ProjDevBench: Benchmarking LLM Agents on Full-Scale Software Project Development
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.SE (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
RigorBench: Benchmarking Engineering Process Discipline in Autonomous AI Coding Agents
RigorBench evaluates AI coding agents on process discipline via five pillars and reports 41% higher process scores and 17% better outcome correctness with structured approaches on 30 tasks.