pith:SGTFOMT5
STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens
Silencing gradients from a tiny fraction of spurious tokens stabilizes RL fine-tuning of LLMs and raises math reasoning performance.
arxiv:2602.15620 v5 · 2026-02-17 · cs.CL · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{SGTFOMT5U2N567NN2QIFNSYAUR}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Across six mathematical reasoning benchmarks using Qwen 1.7B, 8B, and 14B base models, STAPO consistently demonstrates superior entropy stability and achieves an average performance improvement of 11.49% (ρ_T=1.0, top-p=1.0) and 3.73% (ρ_T=0.7, top-p=0.9) over GRPO, 20-Entropy, and JustRL.
The claims assume that the identified spurious tokens (about 0.01% of all tokens) are the primary driver of instability, and that silencing their gradients neither discards useful reasoning signal nor introduces new biases into the policy update.
STAPO stabilizes RL for LLMs by suppressing gradient updates from rare spurious tokens, yielding an average gain of 11.49% on math benchmarks over GRPO and related baselines (at ρ_T=1.0, top-p=1.0).
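To make "silencing gradients" concrete, here is a minimal Python sketch of a token-level policy-gradient loss in which a boolean mask zeroes the contribution of selected tokens. It assumes a GRPO-style group-normalized advantage and a plain REINFORCE-style surrogate (no PPO ratio clipping, no KL term). How STAPO decides which tokens are spurious is not described on this page, so the mask is simply taken as an input; the names `group_advantages` and `masked_pg_loss` are hypothetical and this is not the authors' implementation.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: each reward normalized against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def masked_pg_loss(token_logprobs, advantage, silence_mask):
    """Token-level policy-gradient loss for one sampled response.

    token_logprobs: log pi(y_t | prefix) for each generated token.
    advantage: scalar advantage shared by every token of the response.
    silence_mask: True marks a token whose gradient is silenced. How such
        tokens are identified is assumed to be decided elsewhere (hypothetical input).

    Returns (loss, grads), where grads[t] = d loss / d token_logprobs[t].
    """
    kept = [not s for s in silence_mask]
    n_kept = sum(kept)
    if n_kept == 0:
        # Every token silenced: this response contributes no learning signal.
        return 0.0, [0.0] * len(token_logprobs)
    # Silenced tokens are excluded from the sum, so they receive zero gradient.
    loss = -sum(advantage * lp for lp, k in zip(token_logprobs, kept) if k) / n_kept
    grads = [(-advantage / n_kept) if k else 0.0 for k in kept]
    return loss, grads


if __name__ == "__main__":
    adv = group_advantages([1.0, 0.0, 0.0, 1.0])  # four samples for one prompt
    loss, grads = masked_pg_loss([-0.1, -2.0, -0.3], adv[0], [False, True, False])
    print(loss, grads)
```

With the middle token silenced, its gradient coefficient is exactly zero and changing its log-probability leaves the loss unchanged, while the remaining tokens share the normalization. Dividing by the number of kept tokens rather than all tokens is one design choice among several; the normalization the paper actually uses is not stated here.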
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-26T02:04:06.525769Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
91a657327da69bdf7dadd41056cb00a44df0832c7bc88a360b85393174efae65
Verify this Pith Number yourself
```sh
curl -sH 'Accept: application/ld+json' https://pith.science/pith/SGTFOMT5U2N567NN2QIFNSYAUR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 91a657327da69bdf7dadd41056cb00a44df0832c7bc88a360b85393174efae65
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "34b2f2138427fe72e67dea67a30b5e4808a9e4c70f93b18271adfbbecc62bd0b",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-02-17T14:46:48Z",
    "title_canon_sha256": "b046897e425ba8f1ea06345a01ddc9783375117ae852d676dc9c4e1e7e9e70f1"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.15620",
    "kind": "arxiv",
    "version": 5
  }
}
```
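The shell pipeline above hashes a canonical serialization of the record: keys sorted, no insignificant whitespace, non-ASCII characters kept as UTF-8. The following minimal Python sketch applies the same canonicalization offline to the record JSON shown on this page. It assumes the shown JSON is the complete `canonical_record` returned by the API; if it is, the digest should equal the published canonical hash, and if it is not, only the live pipeline is authoritative. `canonical_sha256` is a hypothetical helper name.

```python
import hashlib
import json

# Published canonical hash from this page.
PUBLISHED_HASH = "91a657327da69bdf7dadd41056cb00a44df0832c7bc88a360b85393174efae65"

# Record as printed on this page (assumed to be the full canonical_record).
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "34b2f2138427fe72e67dea67a30b5e4808a9e4c70f93b18271adfbbecc62bd0b",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-02-17T14:46:48Z",
        "title_canon_sha256": "b046897e425ba8f1ea06345a01ddc9783375117ae852d676dc9c4e1e7e9e70f1",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.15620", "kind": "arxiv", "version": 5},
}


def canonical_sha256(record):
    """SHA-256 of the record serialized the same way as the shell one-liner."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(CANONICAL_RECORD)
    print(digest)
    print("matches published hash:", digest == PUBLISHED_HASH)
```

Sorting keys and fixing the separators makes the digest independent of how the server orders or indents fields, which is what lets anyone reproduce the hash from the same record.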