Runtime Execution Traces Guided Automated Program Repair with Multi-Agent Debate

2026 · cs.SE · arXiv 2604.02647

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Automated Program Repair (APR) struggles with complex logic errors and silent failures. Current LLM-based APR methods are mostly static, relying on source code and basic test outputs, which fail to accurately capture complex runtime behaviors and dynamic data dependencies. While incorporating runtime evidence like execution traces exposes concrete state transitions, a single LLM interpreting this in isolation often overfits to specific hypotheses, producing patches that satisfy tests by coincidence rather than correct logic. Therefore, runtime evidence should act as objective constraints rather than mere additional input. We propose TraceRepair, a multi-agent framework that leverages runtime facts as shared constraints for patch validation. A probe agent captures execution snapshots of critical variables to form an objective repair basis. Meanwhile, a committee of specialized agents cross-verifies candidate patches to expose inconsistencies and iteratively refine them. Evaluated on the Defects4J benchmark, TraceRepair correctly fixes 392 defects, substantially outperforming existing LLM-based approaches. Extensive experiments demonstrate improved efficiency and strong generalization on a newly constructed dataset of recent bugs, confirming that performance gains arise from dynamic reasoning rather than memorization.

representative citing papers

Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks

cs.AI · 2026-06-01 · unverdicted · novelty 7.0

An empirical protocol measures rediscovery costs when coding agents resume interrupted tasks and finds that context-bearing handoffs cut agent events 20-59% and tokens 42-63% versus repository-only handoffs across three models.

AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

AuditRepairBench supplies a large trace corpus and four screening methods that reduce evaluator-channel ranking instability in agent repair leaderboards by a mean of 62%.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks cs.AI · 2026-06-01 · unverdicted · none · ref 28 · internal anchor
AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair cs.AI · 2026-05-06 · unverdicted · none · ref 49 · internal anchor
