Harder is better: Boosting mathematical reasoning via difficulty-aware GRPO and multi-aspect question reformulation. arXiv preprint arXiv:2601.20614, 2026.
2 Pith papers cite this work.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
  EDAS modulates advantage signals in RLVR to penalize repeated errors more and rare errors less, yielding consistent gains on math benchmarks when added to existing methods.
- D²Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning
  D²Evo mines medium-difficulty anchors from the current model, trains a Questioner to generate matching questions, and jointly optimizes the Solver and the Questioner for progressive gains, outperforming baselines on math reasoning with fewer than 2K real samples.
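The EDAS summary describes its advantage modulation only in outline. The sketch below is a hypothetical illustration of the general idea, not the EDAS formula: it assumes GRPO-style group-normalized advantages, binary verifiable rewards, and that two wrong rollouts count as the same error when their extracted final answers match exactly. The names `group_advantages`, `diversity_weighted_advantages`, and the strength parameter `alpha` are illustrative assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: each reward normalized by the group mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def diversity_weighted_advantages(answers, rewards, alpha=1.0, eps=1e-6):
    """Hypothetical sketch (not the EDAS method): rescale the negative advantages
    of wrong rollouts by how often their final answer recurs among the group's
    wrong rollouts.

    answers: final answers extracted from each rollout (exact-match grouping)
    rewards: verifiable rewards, 1.0 for correct and 0.0 for wrong
    alpha:   modulation strength in [0, 1]; 0 recovers plain group advantages
    """
    base = group_advantages(rewards, eps)
    wrong = [a for a, r in zip(answers, rewards) if r <= 0]
    if not wrong:
        return base
    counts = Counter(wrong)
    n_wrong, n_distinct = len(wrong), len(counts)
    out = []
    for a, r, adv in zip(answers, rewards, base):
        if r <= 0 and adv < 0:
            freq = counts[a] / n_wrong
            # freq * n_distinct is 1 when all distinct errors are equally common,
            # above 1 for a repeated error and below 1 for a rare one.
            weight = 1.0 - alpha + alpha * freq * n_distinct
            out.append(adv * weight)
        else:
            # Correct rollouts (and wrong ones with non-negative advantage) are unchanged.
            out.append(adv)
    return out


# Toy group of six rollouts where "42" is the verified answer.
answers = ["42", "7", "7", "7", "9", "42"]
rewards = [1.0 if a == "42" else 0.0 for a in answers]
print(diversity_weighted_advantages(answers, rewards))
```

With the default `alpha=1.0`, the three rollouts sharing the wrong answer "7" receive 1.5 times their plain group advantage, while the single "9" rollout receives 0.5 times it, so the repeated error is penalized more and the rare one less.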
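The D²Evo summary mentions mining medium-difficulty anchors from the current model. A minimal sketch of that selection step, assuming difficulty is measured as the empirical pass rate over several sampled rollouts per question and that "medium" means a pass rate inside a hypothetical band of [0.2, 0.8]; the helper names and thresholds are illustrative, and the sketch covers only anchor selection, not Questioner training or the joint optimization of Solver and Questioner.

```python
def pass_rate(outcomes):
    """Fraction of sampled rollouts for one question that were verified correct."""
    return sum(outcomes) / len(outcomes)


def mine_medium_anchors(results, low=0.2, high=0.8):
    """Hypothetical sketch: keep questions whose pass rate under the current
    model lies in [low, high], ordered from closest to 0.5 outward.

    results: maps a question id to a list of per-rollout correctness booleans
    """
    # Questions with no sampled rollouts are skipped rather than divided by zero.
    rates = {qid: pass_rate(o) for qid, o in results.items() if o}
    anchors = [qid for qid, p in rates.items() if low <= p <= high]
    # Ties on distance to 0.5 are broken by question id for a stable order.
    return sorted(anchors, key=lambda qid: (abs(rates[qid] - 0.5), qid))


results = {
    "q1": [True] * 8,  # always solved: too easy
    "q2": [False] * 8,  # never solved: too hard
    "q3": [True, False] * 4,  # pass rate 0.5
    "q4": [True] * 3 + [False] * 5,  # pass rate 0.375
}
print(mine_medium_anchors(results))  # → ['q3', 'q4']
```

Ordering by distance from a 0.5 pass rate is one simple design choice: questions the model solves about half the time give the most mixed group outcomes, so group-relative methods such as GRPO receive a non-zero learning signal on them.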