An empirical protocol measures rediscovery costs when coding agents resume interrupted tasks and finds that context-bearing handoffs cut agent events 20-59% and tokens 42-63% versus repository-only handoffs across three models.
Runtime Execution Traces Guided Automated Program Repair with Multi-Agent Debate
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
Automated Program Repair (APR) struggles with complex logic errors and silent failures. Current LLM-based APR methods are mostly static, relying on source code and basic test outputs, which fail to accurately capture complex runtime behaviors and dynamic data dependencies. While incorporating runtime evidence like execution traces exposes concrete state transitions, a single LLM interpreting this in isolation often overfits to specific hypotheses, producing patches that satisfy tests by coincidence rather than correct logic. Therefore, runtime evidence should act as objective constraints rather than mere additional input. We propose TraceRepair, a multi-agent framework that leverages runtime facts as shared constraints for patch validation. A probe agent captures execution snapshots of critical variables to form an objective repair basis. Meanwhile, a committee of specialized agents cross-verifies candidate patches to expose inconsistencies and iteratively refine them. Evaluated on the Defects4J benchmark, TraceRepair correctly fixes 392 defects, substantially outperforming existing LLM-based approaches. Extensive experiments demonstrate improved efficiency and strong generalization on a newly constructed dataset of recent bugs, confirming that performance gains arise from dynamic reasoning rather than memorization.
fields
cs.AI 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
AuditRepairBench supplies a large trace corpus and four screening methods that reduce evaluator-channel ranking instability in agent repair leaderboards by a mean of 62%.
citing papers explorer
-
Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks
An empirical protocol measures rediscovery costs when coding agents resume interrupted tasks and finds that context-bearing handoffs cut agent events 20-59% and tokens 42-63% versus repository-only handoffs across three models.
-
AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair
AuditRepairBench supplies a large trace corpus and four screening methods that reduce evaluator-channel ranking instability in agent repair leaderboards by a mean of 62%.