pith. machine review for the scientific record.

Review history

arXiv: 2605.06241 · 2 revisions

Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning

  1. 2026-05-12 UNVERDICTED LOW v0.9.0 novelty 6.0
    83299 ms 5581 in 1355 out 2026-05-12T01:04:38.429615+00:00
  2. 2026-05-08 UNVERDICTED LOW v0.9.0 novelty 7.0
    70584 ms 11123 in 1586 out 2026-05-08T10:27:04.217941+00:00