GRPO-LEAD: A difficulty-aware reinforcement learning approach for concise mathematical reasoning in language models

GRPO-LEAD: A difficulty-aware reinforcement learning approach for concise mathematical reasoning in language models · 2025 · DOI 10.18653/v1/2025.emnlp-main.287

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Process Supervision of Confidence Margin for Calibrated LLM Reasoning

cs.LG · 2026-04-25 · unverdicted · novelty 6.0

RLCM trains LLMs with a margin-enhanced process reward that widens the gap between correct and incorrect reasoning steps, improving calibration on math, code, logic, and science tasks without hurting accuracy.

Beyond Penalizing Mistakes: Stabilizing Efficiency Training in Large Reasoning Models via Adaptive Correct-Only Rewards

cs.AI · 2026-06-21 · unverdicted · novelty 5.0

ACOER applies adaptive correct-only efficiency rewards in GRPO to avoid reward collapse, yielding higher accuracy and over 60% fewer tokens on math reasoning benchmarks.

citing papers explorer


Process Supervision of Confidence Margin for Calibrated LLM Reasoning

cs.LG · 2026-04-25 · unverdicted · none · ref 85

Beyond Penalizing Mistakes: Stabilizing Efficiency Training in Large Reasoning Models via Adaptive Correct-Only Rewards

cs.AI · 2026-06-21 · unverdicted · none · ref 24
