Review history

arXiv: 2605.03596 · 5 revisions

Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

  1. 2026-05-15 UNVERDICTED LOW v0.9.0 novelty 8.0
    45842 ms 5606 in 1278 out 2026-05-15T07:04:25.127126+00:00
  2. 2026-05-13 CONDITIONAL LOW v0.9.0 novelty 7.0
    30685 ms 5608 in 973 out 2026-05-13T07:34:41.870601+00:00
  3. 2026-05-12 UNVERDICTED LOW v0.9.0 novelty 8.0
    121634 ms 5608 in 1303 out 2026-05-12T04:02:24.053344+00:00
  4. 2026-05-07 UNVERDICTED LOW v0.9.0 novelty 8.0
    74078 ms 5609 in 1393 out 2026-05-07T16:32:24.172683+00:00
  5. 2026-05-07 UNVERDICTED LOW v0.9.0 novelty 7.0
    27457 ms 5587 in 1103 out 2026-05-07T01:28:24.242567+00:00