Pith Number

pith:SGTFOMT5

pith:2026:SGTFOMT5U2N567NN2QIFNSYAUR
not attested · not anchored · not stored · refs pending

STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens

Bo Zhang, Guojian Zhan, Jiang Wu, Jingliang Duan, Kehua Sheng, Keqiang Li, Letian Tao, Shengbo Eben Li, Shiqi Liu, Yang Guan, Yinuo Wang, Zeyu He, Zhilong Zheng

Silencing gradients from a tiny fraction of spurious tokens stabilizes RL fine-tuning of LLMs and raises math reasoning performance.

arxiv:2602.15620 v5 · 2026-02-17 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{SGTFOMT5U2N567NN2QIFNSYAUR}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Across six mathematical reasoning benchmarks using Qwen 1.7B, 8B, and 14B base models, STAPO consistently demonstrates superior entropy stability and achieves an average performance improvement of 11.49% (ρ_T=1.0, top-p=1.0) and 3.73% (ρ_T=0.7, top-p=0.9) over GRPO, 20-Entropy, and JustRL.

C2 · weakest assumption

The identified spurious tokens (a 0.01% fraction of tokens) are the primary driver of instability, and silencing their gradients neither discards useful reasoning signal nor introduces new biases in the policy update.

C3 · one line summary

STAPO stabilizes RL for LLMs by suppressing gradient updates from rare spurious tokens, yielding an 11.49% average gain (ρ_T=1.0, top-p=1.0) across six math reasoning benchmarks over GRPO, 20-Entropy, and JustRL.
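
The idea of silencing gradients from a few tokens can be illustrated with a minimal sketch. This is not the paper's implementation: the rule STAPO uses to identify spurious tokens is not given on this page, so the `silence` mask below is taken as an input, and the function name `silenced_token_loss` and the choice to normalize by the number of kept tokens are assumptions made for illustration.

```python
from typing import Sequence


def silenced_token_loss(token_losses: Sequence[float],
                        silence: Sequence[bool]) -> float:
    """Average per-token loss, giving zero weight to flagged tokens.

    In an autodiff framework, a token with zero weight in the loss
    contributes no gradient, which is the "silencing" effect sketched here.
    How tokens get flagged is outside this sketch (hypothetical input).
    """
    if len(token_losses) != len(silence):
        raise ValueError("token_losses and silence must have the same length")
    kept = [loss for loss, flagged in zip(token_losses, silence) if not flagged]
    if not kept:
        return 0.0  # every token silenced: nothing to learn from
    # Assumption: normalize by the number of kept tokens, not the total.
    return sum(kept) / len(kept)


# Toy example: the third token carries an outsized loss and is flagged.
losses = [0.2, 0.4, 9.0, 0.6]
mask = [False, False, True, False]
print(silenced_token_loss(losses, mask))
```

With the flagged token excluded, the loss is the mean of the remaining three values rather than being dominated by the single large term.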

Cited by

2 papers in Pith

Receipt and verification
First computed: 2026-05-26T02:04:06.525769Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

91a657327da69bdf7dadd41056cb00a44df0832c7bc88a360b85393174efae65
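
For the values shown on this page, the 26-character Pith Number matches the first 26 characters of the standard base32 encoding of the canonical hash. The sketch below checks that relationship. It is an observation about this record, not a documented rule of the scheme, and the helper name `base32_prefix` is hypothetical.

```python
import base64

CANONICAL_HASH = "91a657327da69bdf7dadd41056cb00a44df0832c7bc88a360b85393174efae65"
PITH_NUMBER = "SGTFOMT5U2N567NN2QIFNSYAUR"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep a prefix."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


print(base32_prefix(CANONICAL_HASH))                 # SGTFOMT5U2N567NN2QIFNSYAUR
print(base32_prefix(CANONICAL_HASH) == PITH_NUMBER)  # True
```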

Aliases

arxiv: 2602.15620 · arxiv_version: 2602.15620v5 · doi: 10.48550/arxiv.2602.15620 · pith_short_12: SGTFOMT5U2N5 · pith_short_16: SGTFOMT5U2N567NN · pith_short_8: SGTFOMT5
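
The three `pith_short_*` aliases listed above are prefixes of the full Pith Number. A small sketch, assuming that prefix rule holds in general (the page only shows it for this record) and using a hypothetical helper name `short_aliases`:

```python
def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Return the short aliases as prefixes of the full Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


aliases = short_aliases("SGTFOMT5U2N567NN2QIFNSYAUR")
print(aliases["pith_short_8"])   # SGTFOMT5
print(aliases["pith_short_12"])  # SGTFOMT5U2N5
print(aliases["pith_short_16"])  # SGTFOMT5U2N567NN
```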
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/SGTFOMT5U2N567NN2QIFNSYAUR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 91a657327da69bdf7dadd41056cb00a44df0832c7bc88a360b85393174efae65
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "34b2f2138427fe72e67dea67a30b5e4808a9e4c70f93b18271adfbbecc62bd0b",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-02-17T14:46:48Z",
    "title_canon_sha256": "b046897e425ba8f1ea06345a01ddc9783375117ae852d676dc9c4e1e7e9e70f1"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.15620",
    "kind": "arxiv",
    "version": 5
  }
}
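
The verification one-liner above can also be written as a short standalone script. The sketch below applies the same canonicalization (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) to the record shown on this page and prints the SHA-256 digest for comparison with the canonical hash. The function name `canonical_hash` is hypothetical, and the script assumes the JSON displayed here is byte-for-byte what the API returns in `canonical_record`.

```python
import hashlib
import json

# Canonical record as displayed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "34b2f2138427fe72e67dea67a30b5e4808a9e4c70f93b18271adfbbecc62bd0b",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-02-17T14:46:48Z",
        "title_canon_sha256": "b046897e425ba8f1ea06345a01ddc9783375117ae852d676dc9c4e1e7e9e70f1",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.15620", "kind": "arxiv", "version": 5},
}


def canonical_hash(record: dict) -> str:
    """SHA-256 over the record serialized with sorted keys and no whitespace."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Compare the printed digest with the canonical hash listed on this page.
print(canonical_hash(RECORD))
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the JSON fields were written or parsed.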