pith:SPDDBVRN
Sharpness-Guided Group Relative Policy Optimization via Probability Shaping
GRPO-SG is a sharpness-guided token-weighted variant of GRPO that downweights high-gradient tokens to stabilize optimization and improve generalization in reinforcement learning with verifiable rewards.
arxiv:2511.00066 v4 · 2025-10-29 · cs.LG
Claims
GRPO-SG, a simple token-weighted variant of GRPO, downweights tokens likely to cause overly large gradients, reducing sharp updates and stabilizing optimization, thereby improving generalization in RLVR.
The claim rests on two premises: that the generalization loss is upper bounded by a combination of the empirical loss and a sharpness surrogate measured by the gradient norm, and that downweighting high-gradient tokens reliably reduces this sharpness in the RLVR setting for LLMs.
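To make the token-weighting idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the weighting rule `token_weight`, the parameter `alpha`, and all function names are hypothetical, and the sketch omits the importance ratio and clipping that a full GRPO objective would include. It uses one fact that holds for any softmax policy: the gradient of a token's log-probability with respect to the logits is the one-hot vector minus the probability vector, so unlikely tokens produce larger gradients.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_grad_norm(probs, token):
    # Norm of d log p[token] / d logits, which equals onehot(token) - probs.
    return math.sqrt(sum(((1.0 if j == token else 0.0) - p) ** 2
                         for j, p in enumerate(probs)))

def group_relative_advantages(rewards, eps=1e-8):
    # Standardize rewards within one group of sampled responses.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + eps) for r in rewards]

def token_weight(grad_norm, alpha=1.0):
    # Hypothetical shaping rule (an assumption, not taken from the source):
    # the weight shrinks monotonically as the gradient norm grows.
    return 1.0 / (1.0 + alpha * grad_norm)

def weighted_surrogate(group, rewards, alpha=1.0):
    # group: list of responses, each a list of (logits, sampled_token) pairs.
    # In a real training loop the weights would be treated as constants
    # (no gradient flows through them).
    advantages = group_relative_advantages(rewards)
    total, count = 0.0, 0
    for response, adv in zip(group, advantages):
        for logits, token in response:
            probs = softmax(logits)
            w = token_weight(logit_grad_norm(probs, token), alpha)
            total += -w * adv * math.log(probs[token])
            count += 1
    return total / count

# Toy group of two responses over a three-token vocabulary.
group = [
    [([5.0, 0.0, 0.0], 0), ([0.0, 0.0, 0.0], 1)],
    [([5.0, 0.0, 0.0], 1)],
]
loss = weighted_surrogate(group, rewards=[1.0, 0.0])
```

Under this rule, a confidently predicted token keeps a weight close to 1, while a low-probability token, whose log-probability gradient is large, contributes less to the update. Any other monotonically decreasing function of the gradient norm would illustrate the same point.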
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:09:33.672129Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
93c630d62d96fd4ddb40c4a0c63a6390d1fe33a219909cc0a01d9a8d3f0f1d3d
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/SPDDBVRNS36U3W2AYSQMMOTDSD \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 93c630d62d96fd4ddb40c4a0c63a6390d1fe33a219909cc0a01d9a8d3f0f1d3d
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "69ae622ee2d31532a890af88e4480e6416911004d68999429d7b8d6f7b2cc7d7",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-10-29T08:07:47Z",
    "title_canon_sha256": "31f0c3e1f267c875ebb9f805cfd7b65021ac2103745a48b4901fd85355cf81c9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2511.00066",
    "kind": "arxiv",
    "version": 4
  }
}
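The hashing step of the shell pipeline can also be run offline. The sketch below mirrors the serialization options used in the `python3 -c` one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) and applies them to the record as printed on this page. The function name `canonical_sha256` is hypothetical, and the sketch assumes the record shown here is exactly what the endpoint returns under `canonical_record`.

```python
import hashlib
import json

def canonical_sha256(record):
    # Serialize deterministically: sorted keys, no whitespace, raw UTF-8.
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()
    return hashlib.sha256(payload).hexdigest()

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "69ae622ee2d31532a890af88e4480e6416911004d68999429d7b8d6f7b2cc7d7",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-10-29T08:07:47Z",
        "title_canon_sha256": "31f0c3e1f267c875ebb9f805cfd7b65021ac2103745a48b4901fd85355cf81c9",
    },
    "schema_version": "1.0",
    "source": {"id": "2511.00066", "kind": "arxiv", "version": 4},
}

digest = canonical_sha256(record)
print(digest)  # compare by hand with the value listed under "Canonical hash"
```

Because the keys are sorted before hashing, the order in which fields appear in the input does not affect the digest. Only the values and key names do.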