Review history

arxiv: 2512.04111 · 2 revisions

CentaurEval: Benchmarking Human-in-the-Loop Value in Agentic Coding

2026-05-22 UNVERDICTED LOW v0.9.0 novelty 7.0
45534 ms 5768 in 1260 out 2026-05-22T11:55:20.207362+00:00

2026-05-21 UNVERDICTED LOW v0.9.0 novelty 7.0
36068 ms 5880 in 1220 out 2026-05-21T18:05:07.835330+00:00