Review history

arxiv: 2604.17338 · 2 revisions

Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?

  1. 2026-05-20 UNVERDICTED UNKNOWN v0.9.0 novelty 7.0
    28950 ms 5745 in 1220 out 2026-05-20T23:57:20.423767+00:00
  2. 2026-05-10 UNVERDICTED LOW v0.9.0 novelty 7.0
    69801 ms 5514 in 1291 out 2026-05-10T06:02:06.759620+00:00