pith. machine review for the scientific record.

Review history

arxiv: 2605.08978 · 2 revisions

Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization

  1. 2026-05-13 UNVERDICTED LOW v0.9.0 novelty 5.0
    36147 ms 5481 in 1002 out 2026-05-13T00:54:36.447292+00:00
  2. 2026-05-12 UNVERDICTED LOW v0.9.0 novelty 7.0
    49600 ms 5484 in 1042 out 2026-05-12T02:56:05.744997+00:00