arxiv: 2605.06638 · 3 revisions
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key