Review history

arxiv: 2604.03244 · 2 revisions

AI Evaluation Should Require Standardized Item-Level Data Releases

2026-05-25 CONDITIONAL LOW v0.9.0 novelty 5.0

21360 ms 5765 in 1205 out 2026-05-25T06:44:19.936687+00:00

2026-05-15 UNVERDICTED LOW v0.9.0 novelty 5.0

34685 ms 5445 in 1318 out 2026-05-15T19:13:38.408450+00:00