archive

Every paper Pith has read.

1797 papers in cs.SE · page 4

cs.SE 2026-05-15 reviewed

Non-self-fixed ATD lingers longer with many developers' changes
The Dangers of Non-Self-Fixed Architecture Technical Debt and Its Impact on Time-to-Fix

Edi Sutoyo +2
cs.SE 2026-05-15 reviewed

Concept alignment lifts code search accuracy 15x on new data
XSearch: Explainable Code Search via Concept-to-Code Alignment

Yiming Liu +9
cs.SE 2026-05-15 reviewed

Small open LLMs match large ones at grammar-based DSL generation
From Text to DSL: Evaluating Grammar-Based Model Generation Using Open LLMs

Junaid Baber +4
cs.SE 2026-05-15 reviewed

AI agents solve at most 39% of real version upgrade tasks
RoadmapBench: Evaluating Long-Horizon Agentic Software Development Across Version Upgrades

Xinbo Xu +15
cs.SE 2026-05-15 reviewed

BootstrapAgent distills repo setup into reusable contracts
BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge

Sihan Fu +4
cs.SE 2026-05-15 reviewed

Early QA in annotation pipelines cuts costs more than late checks
Position: Early-Stage Quality Assurance in Annotation Pipelines Is More Cost-Effective Than Late-Stage Validation

Sunil Kothari +10
cs.AR 2026-05-15 reviewed

Intra-thread duplication catches 39% more defective servers
ITHICA: Intra-Thread Instruction Checking Approach for Defect-Induced Silent Data Corruptions

Ioanna Vavelidou +5
cs.SE 2026-05-15 reviewed

Bayesian sequential tests cut quantum verification costs
Bayesian Sequential Verification for Budget-Aware Quantum Program Testing

Lei Zhang
cs.CR 2026-05-15 reviewed

Chained mutators mostly interfere but some synergize in LLM jailbreaks
Compositional Jailbreaking: An Empirical Analysis of Mutator Chain Interactions in Aligned LLMs

Reinelle Jan Bugnot +3
cs.CR 2026-05-15 reviewed

LLM agent finds 24 zero-day privilege escalations in microservices
Detecting Privilege Escalation in Polyglot Microservices via Agentic Program Analysis

Penghui Li +3
cs.SE 2026-05-14 reviewed

Runtime structure cuts retry costs in agentic coding by 51.7%
Runtime-Structured Task Decomposition for Agentic Coding Systems

Shubhi Asthana +4
cs.LG 2026-05-14 reviewed

Agent turns I/O examples into code via guided evolutionary search
From I/O to Code with Discovery Agent

Yihong Dong +9
cs.SE 2026-05-14 reviewed

Semantically grounded agents detect memory bugs in binaries
Veritas: A Semantically Grounded Agentic Framework for Memory Corruption Vulnerability Detection in Binaries

Xinran Zheng +4
cs.SE 2026-05-14 reviewed

Viverra adds verified assertions to LLM-generated C code
Viverra: Text-to-Code with Guarantees

Haoze Wu +3
cs.SE 2026-05-14 reviewed

Test generation uncovers 2.56x more privacy leaks in code LLMs
Probing Privacy Leaks in LLM-based Code Generation via Test Generation

Yifei Ge +9
cs.SE 2026-05-14 reviewed

Agentic AI matures fastest where outputs can be tested automatically
Assistance to Autonomy: A Systematic Literature Review of Agentic AI across the Software Development Life Cycle

Spyridon Alvanakis Apostolou +2
cs.SE 2026-05-14 reviewed

Architecture docs let agents migrate eight C repos to Rust
Documentation-Guided Agentic Codebase Migration from C to Rust

Minh Le-Anh +3
cs.SE 2026-05-14 reviewed

ML classifier beats rules at spotting BDD refactoring chances
Mining Subscenario Refactoring Opportunities in Behaviour-Driven Software Test Suites: ML Classifiers and LLM-Judge Baselines

Ali Hassaan Mughal +2
cs.SE 2026-05-14 reviewed

Memory agent keeps repo documentation consistent
Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation

Suyoung Bae +4
cs.SE 2026-05-14 reviewed

Retriever beats generator in RAG for code tasks
Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks

Qiang Ke +4
cs.SE 2026-05-14 reviewed

Stale code snippets make models output outdated helpers
When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context

Haojun Weng +4
cs.CR 2026-05-14 reviewed

Disguised compliance rules let attackers hijack LLM agents
Exploiting LLM Agent Supply Chains via Payload-less Skills

Xinyu Liu +3
cs.SE 2026-05-14 reviewed

Multi-agent system automates full library fuzzing lifecycle
FuzzAgent: Multi-Agent System for Evolutionary Library Fuzzing

Yunlong Lyu +5
cs.SE 2026-05-14 reviewed

Agents resolve 45 percent of chained package upgrades
SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

Man Ho Lam +7
cs.SE 2026-05-14 reviewed

Size filter trims 80 percent of tokens from LLM repo inputs
Correctness-Aware Repository Filtering Under Maximum Effective Context Window Constraints

Shweta Mishra
cs.SE 2026-05-14 reviewed

Valid microservice APIs often fail for AI agents
Making OpenAPI Documentation Agent-Ready: Detecting Documentation and REST Smells with a Multi-Agent LLM System

Rayfran Rocha Lima +2
cs.SE 2026-05-14 reviewed

Hydra cuts LLM code gen latency up to 71% with rollback repairs
Hydra: Efficient, Correct Code Generation via Checkpoint-and-Rollback Support

Alexander Du +3
cs.CR 2026-05-14 reviewed

Web agents should plan before seeing page content
Web Agents Should Adopt the Plan-Then-Execute Paradigm

Julien Piet +7
cs.SE 2026-05-14 reviewed

Failure-guided fuzzing beats random testing for HQC programs
Failure-Guided Fuzzing for Hybrid Quantum-Classical Programs

Lei Zhang
cs.SE 2026-05-13 reviewed

Prompt strategy explains more variation in test diversity than model size when using LLMs…
LLM-Based Robustness Testing of Microservice Applications: An Empirical Study

Hrushitha Goud Tigulla +1
cs.SE 2026-05-13 reviewed

Constrained edits merge checkpoints to lift code agent scores
CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing

Mingzhi Zhu +3
cs.SE 2026-05-13 reviewed

AI agents speed creation of digital music instruments
Case Studies and Reflections on Agentic Software Engineering for Rapid Development of Digital Music Instruments

Matthew John Yee-King
cs.SE 2026-05-13 reviewed

Method-level change-proneness beats class-level for test minimization
Method-level Change-proneness: A Better Metric for Black-box Test Suite Minimization

Md Siam +1
cs.SE 2026-05-13 reviewed

Benchmark shows AI agents recall 42-83 percent of property-based testing bugs
PBT-Bench: Benchmarking AI Agents on Property-Based Testing

Lucas Jing +3
cs.SE 2026-05-13 reviewed

LLM with SMT solver audits natural-language requirements
Neurosymbolic Auditing of Natural-Language Software Requirements

Bethel Hall +1
cs.SE 2026-05-13 reviewed

LLMs reach only 52% accuracy on HMSC semantic tasks
(How) Do Large Language Models Understand High-Level Message Sequence Charts?

Mohammad Reza Mousavi
cs.RO 2026-05-13 reviewed

CARS attributes AV collisions to driver faults
Learning Responsibility-Attributed Adversarial Scenarios for Testing Autonomous Vehicles

Yizhuo Xiao +7
cs.SE 2026-05-13 reviewed

SkillOps is a plug-in framework that maintains LLM agent skill libraries by representing…
SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems

Hongji Pu +2
cs.SE 2026-05-13 reviewed

Quantifier rewrites and non-alias specs speed GPU verification ninefold
Scalable Deductive Verification of Data-Level Parallel Programs

Lars B. van den Haak +2
cs.AR 2026-05-13 reviewed

AI agents drop 37-58% on hardware vs software tasks
Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench

Qingyun Zou +4
cs.RO 2026-05-13 reviewed

Open standards let one agent model run consistently in three simulators
Integration of an Agent Model into an Open Simulation Architecture for Scenario-Based Testing of Automated Vehicles

Christian Geller +3
cs.SE 2026-05-13 reviewed

Runtime pruning cuts tokens 49% for local LLM fault localization
SieveFL: Hierarchical Runtime-Aware Pruning for Scalable LLM-Based Fault Localization

Mahdi Farzandway +1
cs.SE 2026-05-13 reviewed

Call stack data improves RL game testing agents
CA2: Code-Aware Agent for Automated Game Testing

Valliappan Chidambaram Adaikkappan +3
cs.SE 2026-05-13 reviewed

Runtime harness mediates AI agent actions on code projects
AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents

Hailin Zhong +1
cs.SE 2026-05-13 reviewed

This paper finds that code generated by large language models has overall readability…
The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code

Hengzhi Ye +3
cs.SE 2026-05-13 reviewed

Noise reshapes mutant detection in quantum programs
Robust Mutation Analysis of Quantum Programs Under Noise

Sophie Fortz +4
cs.SE 2026-05-13 reviewed

Readiness metrics show near-zero link to research software execution success
ReproScore: Separating Readiness from Outcome in Research Software Reproducibility Assessment

Sheeba Samuel +4