archive
Every paper Pith has read.
1797 papers in cs.SE · page 15
-
UniAda attack fools self-driving cars on both steering and speed
UniAda: Universal Adaptive Multi-objective Adversarial Attack for End-to-End Autonomous Driving Systems
-
Local LLMs spot 43-45% of Python bugs in real projects
An Empirical Evaluation of Locally Deployed LLMs for Bug Detection in Python Code
-
Study evaluates 15 test metrics across 1640 DL scenarios
Empirical Insights of Test Selection Metrics under Multiple Testing Objectives and Distribution Shifts
-
LLMs match only 0-60% of real open-source commits
Can LLMs be Effective Code Contributors? A Study on Open-source Projects
-
Agents map 6G intents to pre-validated services via TMF graphs
Towards Agentic Test-Driven Quality Assurance for 6G Networks
-
Knowledge levers raise software project capital by 63.8%
Knowledge Lever Risk Management for Software Engineering: A Stochastic Framework for Mitigating Knowledge Loss
-
AI reviewer in GitHub PRs sustains 33% follow-up activity
AI-Assisted Code Review as a Scaffold for Code Quality and Self-Regulated Learning: An Experience Report
-
Layered procedures and accountability underpin effective fintech ISMS
Operationalising Information Security Management: A Procedural Framework Analysis of ISO/IEC 27001:2022 Implementation in a Financial-Technology Organisation
-
Framework sets up any code repository automatically
RAT: RunAnyThing via Fully Automated Environment Configuration
-
RANalyzer ties performance drops to specific code changes
RANalyzer: Automated Continuous RAN Software Evaluation and Regression Analysis
-
Argumentation resolves multi-agent requirements conflicts with traceability
ArgRE: Formal Argumentation for Conflict Resolution in Multi-Agent Requirements Negotiation
-
Seven modeling challenges limit iFogSim for complex IoT setups
Source-Code Analysis of iFogSim for Simulating Distributed IoT Architectures: Coverage, Challenges, and Enhancements
-
New framework generates scientific code without test cases
No Test Cases, No Problem: Distillation-Driven Code Generation for Scientific Workflows
-
Parallel agents produce readable code quality feedback
Code Broker: A Multi-Agent System for Automated Code Quality Assessment
-
Frontier AI firms should disclose internal deployment details
What Should Frontier AI Developers Disclose About Internal Deployments?
-
Testing documentation correlates with higher test engagement in OSS pull requests
The Impact of Documentation on Test Engagement in Pull Requests in OSS
-
Smartphone photos detect anemia at 96% accuracy
AnemiaVision: Non-Invasive Anemia Detection via Smartphone Imagery Using EfficientNet-B3 with TrivialAugmentWide, Mixup Augmentation, and Persistent Patient History Management
-
Coding agents burn 1000x more tokens than chats or reasoning
How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks
-
Vibe coding hackathon lets all skill levels build apps with AI only
Code for All: Educational Applications of the "Vibe Coding" Hackathon in Programming Education across All Skill Levels
-
Binary analysis infers test equivalence classes from legacy firmware
Inferring Equivalence Classes from Legacy Undocumented Embedded Binaries for ISO 26262-Compliant Testing
-
RealBench shows LLMs lag at full repo code gen even with UML specs
RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices
-
Verifier warnings add no value to comprehensibility prediction models
Verifier Warnings Do Not Improve Comprehensibility Prediction
-
Selective mutation cuts deep learning mutants by over half
Quality-Driven Selective Mutation for Deep Learning
-
LLMs turn natural language into Dafny-verified code at high success rates
From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification
-
ZF requirements show gaps in ROS 2 and AUTOSAR Adaptive
A Comparison of ROS 2 and AUTOSAR Adaptive Platform Against Industry-Elicited Automotive Middleware Requirements
-
Template breaks AI tests into goal
Test Design and Review Argumentation in AI-Assisted Test Generation
-
Game points cut cross-service code contributions
Gamifying Architectural Governance to Reduce Organizational Coupling in Microservice Systems
-
LLM framework boosts traceability F1 by 7.4% with 41.7% fewer tokens
R2Code: A Self-Reflective LLM Framework for Requirements-to-Code Traceability
-
Modular refactor adds use case diagrams to gamified UML tool
Enhancing a gamified tool for UML modeling education
-
Poisoning 10% of code data blocks unauthorized AI training
Train in Vain: Functionality-Preserving Poisoning to Prevent Unauthorized Use of Code Datasets
-
Helpers from high-level features speed HLS verification up to 6x
AutoINV: Automated Invariant Generation Framework for Formal Verification on High-Level Synthesis Designs
-
Reflections boost RAG to 0.78 F1 for predicting SO code edits without training
RAG-Reflect: Agentic Retrieval-Augmented Generation with Reflections for Comment-Driven Code Maintenance on Stack Overflow
-
LLM feedback loop pulls low-level goals at 61% accuracy
Evaluating LLM-Based Goal Extraction in Requirements Engineering: Prompting Strategies and Their Limitations
-
Smart contracts lock fraud detection trails into blockchain
Who Audits the Auditor? Tamper-Proof Fraud Detection with Blockchain-Anchored Explainable ML
-
Ethics testing detects harms in generative AI outputs
Ethics Testing: Proactive Identification of Generative AI System Harms
-
Pipeline rebuilds real crashes in CARLA with exact road maps
TRACE: Topology-aware Reconstruction of Accidents in CARLA for AV Evaluation
-
Call-chain context lifts LLM test coverage
Call-Chain-Aware LLM-Based Test Generation for Java Projects
-
Framework helps universities update rules for student GenAI use
A Systematic AI Adoption Framework for Higher Education: From Student GenAI Usage to Institutional Integration
-
Tests yield 300 correct runtime checkers across four systems
FlyCatcher: Neural Inference of Runtime Checkers from Tests
-
Nominal group interviews enable documentless assessments
Documentless Assessments Using Nominal Group Interviews
-
87% of multi-commit Python vulnerabilities evade per-commit SAST
CrossCommitVuln-Bench: A Dataset of Multi-Commit Python Vulnerabilities Invisible to Per-Commit Static Analysis
-
Models create harder math problems than they can solve in dual tests
MathDuels: Evaluating LLMs as Problem Posers and Solvers
-
Framework eases onboarding for research computing newcomers
Institutionalizing Best Practices in Research Computing: A Framework and Case Study for Improving User Onboarding
-
One test generalizes into full scenario coverage
Generalizing Test Cases for Comprehensive Test Scenario Coverage
-
PrismaDV produces task-aware data unit tests automatically
PrismaDV: Automated Task-Aware Data Unit Test Generation
-
Structured JSON output beats direct and agentic LLM methods for analysis queries
Less Is More: Measuring How LLM Involvement affects Chatbot Accuracy in Static Analysis
-
Grounding document steers AI coding to valid scientific results
Agentic AI-assisted coding offers a unique opportunity to instill epistemic grounding during software development
-
AI code generators include sensitive attributes in 88% of ML pipelines
From If-Statements to ML Pipelines: Revisiting Bias in Code-Generation
-
LLMs answer 98% of ROS2 architecture questions correctly
Can Large Language Models Assist the Comprehension of ROS2 Software Architectures?
-
Provenance data turns ML interpretability into verifiable requirements
Verifying Machine Learning Interpretability Requirements through Provenance