TemplateRL extracts interpretable templates via MCTS on seed problems and injects them into RL policy optimization to raise high-quality rollout rates, reporting 99% gain over GRPO on AIME and 41% on AMC.
If rh > p policy h (the per-group suc- cess probability under policy rollouts), tem- plate transfer strictly improves per-group suc- cess
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning
TemplateRL extracts interpretable templates via MCTS on seed problems and injects them into RL policy optimization to raise high-quality rollout rates, reporting 99% gain over GRPO on AIME and 41% on AMC.