Review history
Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
- 2026-05-12 UNVERDICTED
- 2026-05-08 UNVERDICTED