Review history

arxiv: 2604.03244 · 2 revisions

AI Evaluation Should Require Standardized Item-Level Data Releases

  1. 2026-05-25 CONDITIONAL LOW v0.9.0 novelty 5.0
    21360 ms 5765 in 1205 out 2026-05-25T06:44:19.936687+00:00
  2. 2026-05-15 UNVERDICTED LOW v0.9.0 novelty 5.0
    34685 ms 5445 in 1318 out 2026-05-15T19:13:38.408450+00:00