pith:QAEGPR6G
Soft Adaptive Policy Optimization
A smooth temperature-controlled gate replaces hard clipping to stabilize reinforcement learning updates for language models.
arxiv:2511.20347 v2 · 2025-11-25 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{QAEGPR6GQVUOWUPBXPUKW3PMDG}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Empirical results on mathematical reasoning benchmarks indicate that SAPO exhibits improved training stability and higher Pass@1 performance under comparable training budgets. Moreover, we employ SAPO to train the Qwen3-VL model series, demonstrating that SAPO yields consistent performance gains across diverse tasks and different model sizes.
Underlying assumption: the smooth temperature-controlled gate selectively attenuates only harmful off-policy signals, without suppressing useful learning gradients or introducing new instabilities that hard clipping avoided.
SAPO introduces smooth adaptive gating to replace hard clipping in token- and sequence-level policy optimization for more stable LLM reinforcement learning.
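To make the contrast between hard clipping and a smooth gate concrete, here is a minimal illustrative sketch. It is not taken from the paper: the sigmoid-shaped gate, the function names, and the parameters `eps` and `tau` are assumptions chosen only to show how a temperature can control how quickly the gradient weight decays as the importance ratio moves away from 1.

```python
import math


def hard_clip_weight(ratio: float, advantage: float = 1.0, eps: float = 0.2) -> float:
    """Gradient weight under PPO-style hard clipping (illustrative).

    The token contributes fully while the ratio stays inside the trust
    region and contributes nothing once the clip becomes active.
    """
    if advantage > 0 and ratio > 1.0 + eps:
        return 0.0
    if advantage < 0 and ratio < 1.0 - eps:
        return 0.0
    return 1.0


def soft_gate_weight(ratio: float, tau: float = 1.0) -> float:
    """Hypothetical smooth gate: 4 * s * (1 - s) with s = sigmoid(tau * (ratio - 1)).

    Uses the identity 4 * s * (1 - s) = 1 - tanh(x / 2) ** 2, which is
    numerically stable. The weight is 1 on-policy (ratio == 1) and decays
    smoothly; a larger tau (temperature parameter) makes it decay faster.
    """
    x = tau * (ratio - 1.0)
    return 1.0 - math.tanh(0.5 * x) ** 2


if __name__ == "__main__":
    for r in (0.5, 0.9, 1.0, 1.1, 1.5, 2.0):
        print(r, hard_clip_weight(r), round(soft_gate_weight(r, tau=5.0), 4))
```

Under these assumptions the hard-clipped weight jumps from 1 to 0 at the clip boundary, while the soft gate shrinks continuously, so mildly off-policy tokens still contribute a reduced signal.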
Receipt and verification
| First computed | 2026-05-17T23:38:53.164767Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
800867c7c68568eb51e1bbe8ab6dec19ad280eec8e7f7d09d5ee95de3cb8e144
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/QAEGPR6GQVUOWUPBXPUKW3PMDG \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 800867c7c68568eb51e1bbe8ab6dec19ad280eec8e7f7d09d5ee95de3cb8e144
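The same check can be sketched as a small standalone Python script, which may be easier to reuse than the shell one-liner. It mirrors the canonicalization used in the command above (sorted keys, compact separators, non-ASCII characters preserved, UTF-8 encoding). The function names `canonical_sha256` and `matches` are hypothetical, and fetching the record from the API is left out.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record after serializing it in canonical form.

    Canonical form here means: keys sorted, no whitespace between
    tokens, non-ASCII characters kept as-is, encoded as UTF-8.
    """
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def matches(record: dict, expected_hex: str) -> bool:
    """Return True if the record's canonical hash equals the expected digest."""
    return canonical_sha256(record) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Paste the canonical record JSON into record.json, then compare the
    # printed digest with the canonical hash listed on this page.
    with open("record.json", encoding="utf-8") as fh:
        print(canonical_sha256(json.load(fh)))
```

Because the keys are sorted before hashing, the digest does not depend on key order or indentation in the JSON you paste in.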
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "87ba5eef28f6f15dd14bb0c369fff2172ee7a06436f5bd193fe0f7ecba7898a4",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-11-25T14:25:19Z",
    "title_canon_sha256": "9dbd102b340aee7e9177a9024622d6d374cd500709201ceaa0298a3715c1ed8d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2511.20347",
    "kind": "arxiv",
    "version": 2
  }
}