Review history

arxiv: 2606.22678 · 2 revisions

RigorBench: Benchmarking Engineering Process Discipline in Autonomous AI Coding Agents

  1. 2026-07-01 UNVERDICTED LOW v0.9.1-grok novelty 7.0
    50396 ms 5799 in 1155 out 2026-07-01T07:08:41.792113+00:00
  2. 2026-06-26 UNVERDICTED LOW v0.9.1-grok novelty 8.0
    21161 ms 5799 in 1238 out 2026-06-26T09:37:55.518928+00:00