TemplateRL extracts interpretable templates via MCTS on seed problems and injects them into RL policy optimization to raise the rate of high-quality rollouts, reporting a 99% gain over GRPO on AIME and a 41% gain on AMC.
The problem involves … final answer is
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning