Good Learners Think Their Thinking: Generative PRM Makes Large Reasoning Model More Efficient Math Learner

Tao He, Rongchuan Mu, Lizi Liao, Yixin Cao, Ming Liu, Bing Qin · 2025 · arXiv 2507.23317

2 Pith papers cite this work.

representative citing papers

LLM Reasoning with Process Rewards for Outcome-Guided Steps

cs.LG · 2026-02-08 · unverdicted · novelty 5.0

PROGRS uses outcome-conditioned centering on PRM scores to safely integrate process rewards into GRPO for improved Pass@1 on math benchmarks.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

citing papers explorer

LLM Reasoning with Process Rewards for Outcome-Guided Steps cs.LG · 2026-02-08 · unverdicted · none · ref 10
A Survey of Reinforcement Learning for Large Reasoning Models cs.CL · 2025-09-10 · accept · none · ref 191
