Review history

arxiv: 2605.06241 · 2 revisions

Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning

2026-05-12 UNVERDICTED LOW v0.9.0 novelty 6.0

83299 ms 5581 in 1355 out 2026-05-12T01:04:38.429615+00:00

2026-05-08 UNVERDICTED LOW v0.9.0 novelty 7.0

70584 ms 11123 in 1586 out 2026-05-08T10:27:04.217941+00:00