Empirical study of 3977 agent trajectories finds Python execution errors correlate with lower success rates on GitHub issues, flags challenging errors, and reports three confirmed bugs in the SWE-Bench platform.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.SE 2representative citing papers
AI IDEs with structured guidance can produce functional large-scale code but frequently introduce design flaws such as duplication, complexity, and principle violations that risk long-term maintainability.
citing papers explorer
-
Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios
Empirical study of 3977 agent trajectories finds Python execution errors correlate with lower success rates on GitHub issues, flags challenging errors, and reports three confirmed bugs in the SWE-Bench platform.
-
Beyond Functional Correctness: Design Issues in AI IDE-Generated Large-Scale Projects
AI IDEs with structured guidance can produce functional large-scale code but frequently introduce design flaws such as duplication, complexity, and principle violations that risk long-term maintainability.