Harder is better: Boosting mathematical reasoning via difficulty-aware GRPO and multi-aspect question reformulation. arXiv preprint arXiv:2601.20614, 2026

Yanqi Dai, Yuxiang Ji, Xiao Zhang, Yong Wang, Xiangxiang Chu, Zhiwu Lu · 2026 · arXiv 2601.20614

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Leveraging Error Diversity in Group Rollouts for Reinforcement Learning

cs.LG · 2026-05-17 · unverdicted · novelty 6.0

EDAS modulates advantage signals in RLVR to penalize repeated errors more and rare errors less, yielding consistent gains on math benchmarks when added to existing methods.

D²Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

D²Evo mines medium-difficulty anchors from the current model, trains a Questioner to generate matching questions, and jointly optimizes Solver and Questioner for progressive gains, outperforming baselines on math reasoning with under 2K real samples.

citing papers explorer


Leveraging Error Diversity in Group Rollouts for Reinforcement Learning · cs.LG · 2026-05-17 · unverdicted · none · ref 8
D²Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning · cs.LG · 2026-05-16 · unverdicted · none · ref 62
