Review history

arxiv: 2605.14678 · 2 revisions

$\pi$-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

  1. 2026-05-20 UNVERDICTED LOW v0.9.0 novelty 7.0
    38585 ms 5769 in 1142 out 2026-05-20T21:09:34.280051+00:00
  2. 2026-05-19 UNVERDICTED LOW v0.9.0 novelty 7.0
    56753 ms 5769 in 1222 out 2026-05-19T16:26:00.538981+00:00