NGRPO: Negative-enhanced group relative policy optimization

Nan, G · 2025 · arXiv 2509.18851

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

PCM uses success-failure action variance to probabilistically select and mask chunks for gradient updates in GRPO, matching standard success rates with 2.38x wall-clock speedup and 60% lower memory on LIBERO benchmarks.

Leveraging Error Diversity in Group Rollouts for Reinforcement Learning

cs.LG · 2026-05-17 · unverdicted · novelty 6.0

EDAS modulates advantage signals in RLVR to penalize repeated errors more and rare errors less, yielding consistent gains on math benchmarks when added to existing methods.

MCPO: Mastery-Consolidated Policy Optimization for Large Reasoning Models

cs.AI · 2026-04-18 · unverdicted · novelty 5.0

MCPO fixes vanishing training signals and shrinking weights in GRPO by using a hinge-KL regularizer on mastered prompts and prioritizing majority-correct prompts, yielding higher pass@1 and pass@k on math tasks.

Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation

cs.LG · 2026-05-20

citing papers explorer

Showing 4 of 4 citing papers.

Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking cs.LG · 2026-05-15 · unverdicted · none · ref 21
PCM uses success-failure action variance to probabilistically select and mask chunks for gradient updates in GRPO, matching standard success rates with 2.38x wall-clock speedup and 60% lower memory on LIBERO benchmarks.
Leveraging Error Diversity in Group Rollouts for Reinforcement Learning cs.LG · 2026-05-17 · unverdicted · none · ref 23
EDAS modulates advantage signals in RLVR to penalize repeated errors more and rare errors less, yielding consistent gains on math benchmarks when added to existing methods.
MCPO: Mastery-Consolidated Policy Optimization for Large Reasoning Models cs.AI · 2026-04-18 · unverdicted · none · ref 29
MCPO fixes vanishing training signals and shrinking weights in GRPO by using a hinge-KL regularizer on mastered prompts and prioritizing majority-correct prompts, yielding higher pass@1 and pass@k on math tasks.
Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation cs.LG · 2026-05-20 · unreviewed · ref 27

NGRPO: Negative-enhanced group relative policy optimization

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer