If r_h > p_h^policy (the per-group success probability under policy rollouts), template transfer strictly improves per-group success.

Per-mini-group: the event that mini-group h contains at least one positive trajectory due to template transfer occurs with probability at least r_h.
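The two statements above combine into a simple sufficient condition: per-group success with template transfer is at least r_h, so whenever r_h exceeds p_h^policy the improvement is at least r_h - p_h^policy. A minimal sketch of that check follows; the function names and the numeric values are hypothetical, chosen only for illustration, and do not come from the cited work.

```python
def transfer_gain_lower_bound(r_h: float, p_policy_h: float) -> float:
    """Lower bound on the gain in per-group success from template transfer.

    Assumption (from the statements above): success with transfer is at
    least r_h, and p_policy_h is the per-group success probability under
    policy rollouts. The bound is therefore r_h - p_policy_h.
    """
    for name, value in (("r_h", r_h), ("p_policy_h", p_policy_h)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")
    return r_h - p_policy_h


def improvement_guaranteed(r_h: float, p_policy_h: float) -> bool:
    """True when the condition r_h > p_policy_h holds.

    False only means the bound gives no guarantee; it does not mean
    that template transfer hurts.
    """
    return transfer_gain_lower_bound(r_h, p_policy_h) > 0.0


if __name__ == "__main__":
    # Illustrative values only, not taken from any experiment.
    print(improvement_guaranteed(r_h=0.40, p_policy_h=0.25))
    print(improvement_guaranteed(r_h=0.20, p_policy_h=0.25))
```

The condition is sufficient rather than necessary: when r_h does not exceed p_h^policy, the lower bound alone says nothing about whether transfer helps.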

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning

cs.CL · 2025-05-21 · unverdicted · novelty 5.0

TemplateRL extracts interpretable templates via MCTS on seed problems and injects them into RL policy optimization to raise high-quality rollout rates, reporting 99% gain over GRPO on AIME and 41% on AMC.


