LLMs fail most often during strategy formulation and logic synthesis when fixing GitHub issues, but succeed relatively well at localizing faults, according to a taxonomy derived from 243 manual failure cases.
Within each repository, tasks are analyzed in ascending order of total failure count, starting with the tasks that have the fewest failed attempts and progressing to more difficult ones.
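The ordering rule can be sketched in Python as follows. This is a minimal, hypothetical illustration. The `Task` record, its fields, the function name `analysis_order`, and the sample repository names are all assumptions. The source does not say how repositories are ordered relative to one another or how ties between tasks with equal failure counts are broken, so the sketch simply keeps input order for both.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical record; the source does not specify a data format.
    repo: str
    task_id: str
    failures: int  # total failed attempts for this task (counting method assumed)


def analysis_order(tasks: list[Task]) -> dict[str, list[Task]]:
    """Group tasks by repository, then sort each group by ascending failure count.

    Python's sort is stable, so tasks with equal failure counts keep their
    input order. Repositories appear in the order they are first seen.
    """
    by_repo: dict[str, list[Task]] = defaultdict(list)
    for task in tasks:
        by_repo[task.repo].append(task)
    return {
        repo: sorted(group, key=lambda t: t.failures)
        for repo, group in by_repo.items()
    }


# Usage with made-up tasks:
tasks = [
    Task("repo-a", "a-1", 5),
    Task("repo-b", "b-1", 2),
    Task("repo-a", "a-2", 1),
    Task("repo-a", "a-3", 5),
]
for repo, ordered in analysis_order(tasks).items():
    print(repo, [t.task_id for t in ordered])
# repo-a ['a-2', 'a-1', 'a-3']
# repo-b ['b-1']
```

Relying on a stable sort keeps the tie-breaking behavior explicit and predictable; a different rule (for example, sorting ties by task ID) would only require changing the `key` function.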
Cited by 1 Pith paper (field: cs.SE; year: 2026; verdict: unverdicted):
Characterizing the Failure Modes of LLMs in Resolving Real-World GitHub Issues