Large-scale trajectory analysis of 19 coding agents on 500 tasks finds that LLM choice drives outcomes more than framework design and that context-gathering plus validation behaviors improve success beyond task difficulty predictions.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.SE 4roles
background 1polarities
background 1representative citing papers
ContraFix couples differential runtime evidence from execution variants with reusable repair skills to achieve 84.0% resolution on SEC-Bench and 73.8% on PatchEval using GPT-5-mini, outperforming baselines at lower cost.
Graphectory turns stochastic agent trajectories into analyzable graphs, showing that stronger models and successful fixes follow coherent localization-validation steps while failures are chaotic, and online detection plus rollback improves resolution rates by 6.9-23.5%.
Agent-generated tests mainly act as observational feedback channels and do not meaningfully improve issue resolution success in current LLM software engineering agents.
citing papers explorer
-
Beyond Resolution Rates: Behavioral Drivers of Coding Agent Success and Failure
Large-scale trajectory analysis of 19 coding agents on 500 tasks finds that LLM choice drives outcomes more than framework design and that context-gathering plus validation behaviors improve success beyond task difficulty predictions.
-
ContraFix: Agentic Vulnerability Repair via Differential Runtime Evidence and Skill Reuse
ContraFix couples differential runtime evidence from execution variants with reusable repair skills to achieve 84.0% resolution on SEC-Bench and 73.8% on PatchEval using GPT-5-mini, outperforming baselines at lower cost.
-
Process-Centric Analysis of Agentic Software Systems
Graphectory turns stochastic agent trajectories into analyzable graphs, showing that stronger models and successful fixes follow coherent localization-validation steps while failures are chaotic, and online detection plus rollback improves resolution rates by 6.9-23.5%.
-
Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents
Agent-generated tests mainly act as observational feedback channels and do not meaningfully improve issue resolution success in current LLM software engineering agents.