GRPO, Dr. GRPO, and DAPO are three settings of one dial on the group standard deviation of binary rewards, unified by the group-standard-deviation identity where disagreement equals update magnitude.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Test-time sampling improves coverage but stalls at modal and correlation ceilings for answer selection, with the effective number of samples as the practical limit.
citing papers explorer
- GRPO, Dr. GRPO, and DAPO Are Three Operations on One Number: The Group-Standard-Deviation Identity
  GRPO, Dr. GRPO, and DAPO are three settings of one dial on the group standard deviation of binary rewards, unified by the group-standard-deviation identity where disagreement equals update magnitude.
- When More Sampling Hurts: The Modal Ceiling and Correlation Ceiling of Test-Time Scaling
  Test-time sampling improves coverage but stalls at modal and correlation ceilings for answer selection, with the effective number of samples as the practical limit.
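
The group-standard-deviation claim in the first summary can be illustrated with a small sketch. This page does not spell out the identity itself, so the code relies only on commonly described conventions, which are assumptions here: GRPO divides mean-centered rewards by the group standard deviation, Dr. GRPO omits that division, and DAPO's dynamic sampling discards groups whose rewards all agree. The function names `group_std` and `advantages` and the `eps` parameter are hypothetical. The only fact used is that a group of 0/1 rewards with success fraction p has population standard deviation sqrt(p(1 - p)), which is zero exactly when every sample in the group agrees.

```python
import math


def group_std(rewards):
    """Population standard deviation of a group of 0/1 rewards.

    For binary rewards with success fraction p this equals sqrt(p * (1 - p)).
    """
    p = sum(rewards) / len(rewards)
    return math.sqrt(p * (1.0 - p))


def advantages(rewards, mode, eps=1e-6):
    """Per-sample advantages under three conventions (illustrative only).

    mode="grpo":    centered rewards divided by the group std (assumed convention)
    mode="dr_grpo": centered rewards, no std division (assumed convention)
    mode="dapo":    as "grpo", but a zero-std group is dropped (returns None)
    """
    mean = sum(rewards) / len(rewards)
    centered = [r - mean for r in rewards]
    std = group_std(rewards)

    if mode == "grpo":
        return [c / (std + eps) for c in centered]
    if mode == "dr_grpo":
        return centered
    if mode == "dapo":
        if std == 0.0:
            return None  # all samples agree, so the group carries no signal
        return [c / (std + eps) for c in centered]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    group = [1, 0, 1, 0]
    for mode in ("grpo", "dr_grpo", "dapo"):
        print(mode, advantages(group, mode))
```

In this sketch all three modes read the same quantity, `group_std(rewards)`, and differ only in whether they divide by it, ignore it, or filter on it.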