Why GRPO needs normalization: A local-curvature perspective on adaptive gradients. arXiv preprint arXiv:2601.23135, 2026

Cheng Ge, Caitlyn Heqi Yin, Hao Liang, Jiawei Zhang · 2026 · arXiv 2601.23135

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

GRPO, Dr. GRPO, and DAPO Are Three Operations on One Number: The Group-Standard-Deviation Identity

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

GRPO, Dr. GRPO, and DAPO are three settings of one dial on the group standard deviation of binary rewards, unified by the group-standard-deviation identity where disagreement equals update magnitude.

CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning

cs.LG · 2026-05-23 · unverdicted · novelty 7.0

CurveRL derives a quantile-coordinate reweighting rule from a utility functional on pass rates and shows it outperforms GRPO on reasoning benchmarks.

citing papers explorer


GRPO, Dr. GRPO, and DAPO Are Three Operations on One Number: The Group-Standard-Deviation Identity · cs.LG · 2026-06-30 · unverdicted · none · ref 16
CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning · cs.LG · 2026-05-23 · unverdicted · none · ref 11
