Review history

arxiv: 2605.15229 · 2 revisions

PBT-Bench: Benchmarking AI Agents on Property-Based Testing

  1. 2026-05-21 UNVERDICTED LOW v0.9.0 novelty 7.0
    57428 ms 5872 in 1428 out 2026-05-21T07:54:20.257433+00:00
  2. 2026-05-19 CONDITIONAL MODERATE v0.9.0 novelty 7.0
    41507 ms 5872 in 1387 out 2026-05-19T17:19:53.915761+00:00