ADI equips AI debugging agents with function-level interaction via a new execution trace structure, raising SWE-bench Verified resolution to 63.8% at $1.28 per task and delivering 6-18% gains when added to existing agents.
Title resolution pending
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.SE 9roles
baseline 2polarities
baseline 2representative citing papers
DebugRepair improves LLM-based automated program repair by adding test semantic purification, simulated instrumentation, and debugging-driven conversational repair, fixing 224 Defects4J bugs with GPT-3.5 (26.2% above prior SOTA) and 295 with DeepSeek-V3.
DynaFix iteratively feeds execution-level dynamic information such as variable states and control flows into LLM prompts to repair 186 bugs on Defects4J, a 10% gain over baselines including 38 previously unrepaired cases.
GALA uses hierarchical graph alignment between UI screenshots and code structures to achieve state-of-the-art bug localization in multimodal automated program repair on SWE-bench.
REAgent improves LLM patch generation for software issues by 17.4% on average through automated construction, quality checking, and iterative refinement of structured issue-oriented requirements.
More fault localization context does not consistently improve LLM-based program repair; file-level context gives 15-17x gains, optimal around 6-10 files, while line-level context often degrades performance from noise.
Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.
Auto-Diagnose applies LLMs to summarize and diagnose root causes of integration test failures, reporting 90.14% accuracy on 71 manual cases and positive adoption after Google-wide rollout.
citing papers explorer
-
Empowering Autonomous Debugging Agents with Efficient Dynamic Analysis
ADI equips AI debugging agents with function-level interaction via a new execution trace structure, raising SWE-bench Verified resolution to 63.8% at $1.28 per task and delivering 6-18% gains when added to existing agents.
-
DebugRepair: Enhancing LLM-Based Automated Program Repair via Self-Directed Debugging
DebugRepair improves LLM-based automated program repair by adding test semantic purification, simulated instrumentation, and debugging-driven conversational repair, fixing 224 Defects4J bugs with GPT-3.5 (26.2% above prior SOTA) and 295 with DeepSeek-V3.
-
DynaFix: Iterative Automated Program Repair Driven by Execution-Level Dynamic Information
DynaFix iteratively feeds execution-level dynamic information such as variable states and control flows into LLM prompts to repair 186 bugs on Defects4J, a 10% gain over baselines including 38 previously unrepaired cases.
-
GALA: Multimodal Graph Alignment for Bug Localization in Automated Program Repair
GALA uses hierarchical graph alignment between UI screenshots and code structures to achieve state-of-the-art bug localization in multimodal automated program repair on SWE-bench.
-
REAgent: Requirement-Driven LLM Agents for Software Issue Resolution
REAgent improves LLM patch generation for software issues by 17.4% on average through automated construction, quality checking, and iterative refinement of structured issue-oriented requirements.
-
On the Role of Fault Localization Context for LLM-Based Program Repair
More fault localization context does not consistently improve LLM-based program repair; file-level context gives 15-17x gains, optimal around 6-10 files, while line-level context often degrades performance from noise.
-
Beyond Fixed Tests: Repository-Level Issue Resolution as Coevolution of Code and Behavioral Constraints
Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.
-
LLM-Based Automated Diagnosis Of Integration Test Failures At Google
Auto-Diagnose applies LLMs to summarize and diagnose root causes of integration test failures, reporting 90.14% accuracy on 71 manual cases and positive adoption after Google-wide rollout.
- HELO-APR: Enhancing Low-Resource Program Repair through Cross-Lingual Knowledge Transfer