The problem involves … final answer is

Substitute into the equation for the sum of the highest and lowest scores: … Thus, the lowest score is: … The lowest score is …

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning

cs.CL · 2025-05-21 · unverdicted · novelty 5.0

TemplateRL extracts interpretable templates via MCTS on seed problems and injects them into RL policy optimization to raise the rate of high-quality rollouts, reporting a 99% gain over GRPO on AIME and a 41% gain on AMC.

citing papers explorer

Showing 1 of 1 citing paper.

TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning

cs.CL · 2025-05-21 · unverdicted · none · ref 24
