pith. machine review for the scientific record.
Pith Number

pith:SPDDBVRN

pith:2025:SPDDBVRNS36U3W2AYSQMMOTDSD
not attested · not anchored · not stored · refs pending

Sharpness-Guided Group Relative Policy Optimization via Probability Shaping

Linh Ngo Van, Trung Le, Tue Le

GRPO-SG is a sharpness-guided token-weighted variant of GRPO that downweights high-gradient tokens to stabilize optimization and improve generalization in reinforcement learning with verifiable rewards.

arxiv:2511.00066 v4 · 2025-10-29 · cs.LG

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
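
The record does not spell out the merge algorithm, so the following is only a minimal sketch of what "deterministic" can mean here: a mirror that receives the same events in any order, with duplicates, folds them into the same state. The event fields (`at`, `field`, `value`) and the function names are hypothetical, and signature checking is left out entirely.

```python
import hashlib
import json


def event_id(event):
    """Content hash of an event, used for de-duplication and tie-breaking."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge(record, events):
    """Fold events into a copy of the record in an order-independent way.

    Assumption: each event is a dict with hypothetical keys
    'at' (ISO timestamp), 'field', and 'value'. A real bundle would
    also verify each event's signature before applying it.
    """
    unique = {event_id(e): e for e in events}  # drop exact duplicates
    state = dict(record)
    # Sort by (timestamp, content hash) so every mirror applies the same order.
    for eid in sorted(unique, key=lambda k: (unique[k]["at"], k)):
        event = unique[eid]
        state[event["field"]] = event["value"]
    return state


record = {"id": "2511.00066"}
events = [
    {"at": "2026-01-02T00:00:00Z", "field": "citations", "value": 1},
    {"at": "2026-01-01T00:00:00Z", "field": "citations", "value": 0},
]
state_a = merge(record, events)
state_b = merge(record, list(reversed(events)) + events)
```

Sorting on a total order derived from the events themselves, rather than on arrival order, is what lets two mirrors agree without coordinating.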

Claims

C1 strongest claim

GRPO-SG, a simple token-weighted variant of GRPO, downweights tokens likely to cause overly large gradients, reducing sharp updates and stabilizing optimization, thereby improving generalization in RLVR.
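
This record does not give the paper's weighting formula. As a rough illustration of the idea only, the sketch below uses the norm of the log-probability gradient with respect to the logits as a stand-in for "tokens likely to cause overly large gradients" and shrinks their contribution to a group-relative objective. The weighting rule `1 / (1 + g)`, the function names, and the omission of GRPO's importance ratio and clipping are all assumptions made for brevity, not the authors' method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_grad_norm(probs, token):
    """L2 norm of d log p[token] / d logits, which equals onehot(token) - probs."""
    return math.sqrt(
        sum(((1.0 if i == token else 0.0) - p) ** 2 for i, p in enumerate(probs))
    )


def token_weight(grad_norm):
    """Hypothetical weight in (0, 1]: larger gradient norm gives smaller weight."""
    return 1.0 / (1.0 + grad_norm)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def weighted_objective(group_logits, group_tokens, rewards):
    """Mean over tokens of weight * advantage * log-probability.

    In an autodiff framework the weight would be treated as a constant
    (no gradient flows through it); that detail is an assumption here.
    """
    advantages = group_relative_advantages(rewards)
    total, count = 0.0, 0
    for seq_logits, seq_tokens, adv in zip(group_logits, group_tokens, advantages):
        for logits, tok in zip(seq_logits, seq_tokens):
            probs = softmax(logits)
            w = token_weight(logit_grad_norm(probs, tok))
            total += w * adv * math.log(probs[tok])
            count += 1
    return total / count


# Two sampled responses of two tokens each over a 3-token vocabulary.
group_logits = [[[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [[0.0, 3.0, 0.0], [1.0, 1.0, 1.0]]]
group_tokens = [[0, 2], [0, 1]]
objective = weighted_objective(group_logits, group_tokens, rewards=[1.0, 0.0])
```

With a softmax policy, a confidently predicted token has a small logit-gradient norm and keeps a weight near 1, while an unlikely token has a norm above 1 and is weighted below 0.5.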

C2 weakest assumption

That the generalization loss is upper bounded by a combination of the empirical loss and a sharpness surrogate measured by the gradient norm, and that downweighting high-gradient tokens will reliably reduce this sharpness in the RLVR setting for LLMs.
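
Read schematically, the assumption has the following shape. The symbols ($L_{\mathcal{D}}$ for the generalization loss, $\hat{L}_{S}$ for the empirical loss, $\lambda$ for a trade-off constant) are our notation rather than the paper's, and the exact form of the bound, including any constants or additional terms, is not stated in this record.

```latex
L_{\mathcal{D}}(\theta) \;\le\; \hat{L}_{S}(\theta) \;+\; \lambda \,\bigl\lVert \nabla_{\theta} \hat{L}_{S}(\theta) \bigr\rVert
```

Under this reading, downweighting high-gradient tokens is meant to shrink the second term; whether it does so reliably for LLMs in RLVR is the part the assumption leaves open.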

C3 one line summary

GRPO-SG is a sharpness-guided token-weighted variant of GRPO that downweights high-gradient tokens to stabilize optimization and improve generalization in reinforcement learning with verifiable rewards.

Cited by

1 paper in Pith

Receipt and verification
First computed: 2026-05-18T03:09:33.672129Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

93c630d62d96fd4ddb40c4a0c63a6390d1fe33a219909cc0a01d9a8d3f0f1d3d

Aliases

arxiv: 2511.00066 · arxiv_version: 2511.00066v4 · doi: 10.48550/arxiv.2511.00066 · pith_short_12: SPDDBVRNS36U · pith_short_16: SPDDBVRNS36U3W2A · pith_short_8: SPDDBVRN
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/SPDDBVRNS36U3W2AYSQMMOTDSD \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 93c630d62d96fd4ddb40c4a0c63a6390d1fe33a219909cc0a01d9a8d3f0f1d3d
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "69ae622ee2d31532a890af88e4480e6416911004d68999429d7b8d6f7b2cc7d7",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-10-29T08:07:47Z",
    "title_canon_sha256": "31f0c3e1f267c875ebb9f805cfd7b65021ac2103745a48b4901fd85355cf81c9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2511.00066",
    "kind": "arxiv",
    "version": 4
  }
}
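
The shell pipeline above serializes the canonical record with sorted keys and compact separators, then hashes the UTF-8 bytes with SHA-256. The same steps can be run offline against the JSON shown above. This is a minimal sketch: the function name `canonical_hash` is ours, and whether the printed digest matches the published canonical hash depends on the record being copied byte-for-byte, so it is not asserted here.

```python
import hashlib
import json


def canonical_hash(record):
    """SHA-256 of the record serialized as in the one-liner above:
    sorted keys, no whitespace between tokens, non-ASCII kept as-is."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(blob).hexdigest()


# Canonical record copied from the JSON above.
record = {
    "metadata": {
        "abstract_canon_sha256": "69ae622ee2d31532a890af88e4480e6416911004d68999429d7b8d6f7b2cc7d7",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-10-29T08:07:47Z",
        "title_canon_sha256": "31f0c3e1f267c875ebb9f805cfd7b65021ac2103745a48b4901fd85355cf81c9",
    },
    "schema_version": "1.0",
    "source": {"id": "2511.00066", "kind": "arxiv", "version": 4},
}

digest = canonical_hash(record)
print(digest)  # compare by eye with the value under "Canonical hash"
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields were written or parsed.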