Pith Number

pith:QAEGPR6G

pith:2025:QAEGPR6GQVUOWUPBXPUKW3PMDG

not attested · not anchored · not stored · refs resolved

Soft Adaptive Policy Optimization

An Yang, Bowen Yu, Chang Gao, Chujie Zheng, Jingren Zhou, Junyang Lin, Kai Dang, Shixuan Liu, Shuai Bai, Xiong-Hui Chen

A smooth temperature-controlled gate replaces hard clipping to stabilize reinforcement learning updates for language models.

arxiv:2511.20347 v2 · 2025-11-25 · cs.LG · cs.AI · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{QAEGPR6GQVUOWUPBXPUKW3PMDG}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
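This page does not describe Pith's merge algorithm. As a hypothetical illustration of what "deterministic" buys a mirror, the sketch below folds events over the canonical record in one total order (timestamp, then event id), so any mirror holding the same set of events reaches the same state regardless of arrival order or duplicates. The Event shape, the last-writer-wins rule, the field names and the example values are all assumptions, and signature verification is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    """Hypothetical signed-event shape; Pith's real event schema is not shown here."""
    timestamp: str  # fixed-width ISO-8601 UTC, so string order matches time order
    event_id: str   # unique id that breaks timestamp ties deterministically
    key: str        # field of the current state this event sets (hypothetical)
    value: str      # new value for that field


def merge(canonical: Dict[str, str], events: Iterable[Event]) -> Dict[str, str]:
    """Fold events over the canonical record in one total order (last writer wins).

    A real mirror would verify each event's signature first; that step is omitted.
    """
    state = dict(canonical)  # never mutate the canonical record itself
    # set() drops exact duplicates; sorting by (timestamp, event_id) gives every
    # mirror the same order no matter how the events arrived.
    for ev in sorted(set(events), key=lambda e: (e.timestamp, e.event_id)):
        state[ev.key] = ev.value
    return state


# Two mirrors receive the same events in different orders, one with a duplicate.
base = {"author_claim": "open"}
e1 = Event("2026-05-18T00:00:00Z", "e1", "author_claim", "claimed")
e2 = Event("2026-05-19T00:00:00Z", "e2", "author_claim", "verified")
mirror_a = merge(base, [e1, e2])
mirror_b = merge(base, [e2, e1, e1])
print(mirror_a == mirror_b)  # True
```

Sorting on a total order, rather than applying events as they arrive, is what makes the result independent of network delivery order.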

Claims

C1 strongest claim

Empirical results on mathematical reasoning benchmarks indicate that SAPO exhibits improved training stability and higher Pass@1 performance under comparable training budgets. Moreover, we employ SAPO to train the Qwen3-VL model series, demonstrating that SAPO yields consistent performance gains across diverse tasks and different model sizes.

C2 weakest assumption

That the smooth temperature-controlled gate selectively attenuates only harmful off-policy signals without suppressing useful learning gradients or introducing new instabilities that hard clipping avoided.
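The assumption in C2 is easiest to see by comparing per-token gradient weights. The sketch below contrasts PPO-style hard clipping, whose gradient drops to exactly zero once the importance ratio r leaves [1 - eps, 1 + eps] on the relevant side, with a sigmoid gate f(r) = (4 / tau) * sigmoid(tau * (r - 1)), whose derivative equals 1 at r = 1 and decays smoothly. The gate form, eps = 0.2 and tau = 10 are illustrative assumptions, not values taken from this record; the paper gives SAPO's exact definition. The sketch only shows that the gate keeps some gradient from off-policy tokens; whether that residual signal helps or hurts is exactly what C2 assumes and what this code does not test.

```python
import math


def hard_clip_grad_weight(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-token gradient weight of the PPO-style clipped surrogate
    min(r * A, clip(r, 1 - eps, 1 + eps) * A), so that d/dr equals weight * A."""
    if advantage >= 0:
        # Positive advantage: once r exceeds 1 + eps the clipped branch wins and
        # the objective is flat, so the gradient is exactly zero.
        return 1.0 if ratio <= 1 + eps else 0.0
    # Negative advantage: the cut-off is at r < 1 - eps instead.
    return 1.0 if ratio >= 1 - eps else 0.0


def soft_gate_grad_weight(ratio: float, tau: float = 10.0) -> float:
    """d/dr of an assumed sigmoid gate f(r) = (4 / tau) * sigmoid(tau * (r - 1)).

    The derivative is 4 * s * (1 - s) with s = sigmoid(tau * (r - 1)): exactly 1 at
    r = 1 and decaying smoothly as r moves away, instead of dropping to zero.
    """
    x = tau * (ratio - 1.0)
    # 4 * s * (1 - s) == 4 * e / (1 + e) ** 2 with e = exp(-|x|); this form cannot overflow.
    e = math.exp(-abs(x))
    return 4.0 * e / (1.0 + e) ** 2


if __name__ == "__main__":
    for r in (0.7, 0.9, 1.0, 1.1, 1.3):
        print(f"r={r:.1f}  hard(A>0)={hard_clip_grad_weight(r, 1.0):.0f}  "
              f"hard(A<0)={hard_clip_grad_weight(r, -1.0):.0f}  "
              f"soft={soft_gate_grad_weight(r):.3f}")
```

A larger tau makes the gate decay faster and so behave more like a hard cut-off; a smaller tau lets more off-policy signal through.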

C3 one-line summary

SAPO introduces smooth adaptive gating to replace hard clipping in token- and sequence-level policy optimization for more stable LLM reinforcement learning.
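C3 distinguishes token-level from sequence-level policy optimization. The sketch below, an illustration rather than the paper's code, computes both kinds of importance ratio from log-probabilities. The sequence-level ratio is assumed to be the length-normalized geometric mean of the token ratios (the definition used by GSPO); the function names and the example log-probabilities are hypothetical. A single off-policy token pushes its own token ratio far from 1, while the sequence ratio moves much less, and that difference is what a gate applied at either level would see.

```python
import math
from typing import List, Sequence


def token_ratios(logp_new: Sequence[float], logp_old: Sequence[float]) -> List[float]:
    """Token-level importance ratios r_t = pi_new(y_t) / pi_old(y_t), from log-probs."""
    if len(logp_new) != len(logp_old):
        raise ValueError("log-prob sequences must have the same length")
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]


def sequence_ratio(logp_new: Sequence[float], logp_old: Sequence[float]) -> float:
    """Sequence-level ratio, assumed length-normalized (GSPO-style): the geometric
    mean of the token ratios, exp(mean_t(log pi_new - log pi_old))."""
    if len(logp_new) != len(logp_old) or not logp_new:
        raise ValueError("need two non-empty log-prob sequences of equal length")
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


# Illustrative four-token response in which only the last token is off-policy:
# the new policy puts 3x the old probability on it.
old = [-1.0, -2.0, -0.5, -3.0]
new = [-1.0, -2.0, -0.5, -3.0 + math.log(3.0)]
print(token_ratios(new, old))    # three ratios of 1.0, the last close to 3
print(sequence_ratio(new, old))  # close to 3 ** 0.25, about 1.32
```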

References

12 extracted · 12 resolved · 5 Pith anchors

[1] AIME problems and solutions · 2025

[2] The sufficiency of off-policyness and soft clipping: PPO is still insufficient according to an off-policy measure · 2023

[3] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning · 2025 · arXiv:2501.12948

[4] HMMT 2025 · https://www.hmmt.org · 2025

[5] LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code · 2024 · arXiv:2403.07974

Formal links

2 machine-checked theorem links

Cited by

40 papers in Pith

Reinforcement Learning from Human Feedback

A Brief Overview: On-Policy Self-Distillation In Large Language Models

Clipping Bottleneck: Stabilizing RLVR via Stochastic Recovery of Near-Boundary Signals

Rethinking the Design Space of Reinforcement Learning for Diffusion Models: On the Importance of Likelihood Estimation Beyond Loss Design

Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards

Receipt and verification

First computed	2026-05-17T23:38:53.164767Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

800867c7c68568eb51e1bbe8ab6dec19ad280eec8e7f7d09d5ee95de3cb8e144

Aliases

arxiv: 2511.20347 · arxiv_version: 2511.20347v2 · doi: 10.48550/arxiv.2511.20347 · pith_short_12: QAEGPR6GQVUO · pith_short_16: QAEGPR6GQVUOWUPB · pith_short_8: QAEGPR6G
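For this record, the Pith Number and the canonical hash are related in a checkable way: the 26-character Pith Number equals the first 26 characters (130 bits) of the RFC 4648 base32 encoding of the SHA-256 canonical hash, and the pith_short_* aliases are prefixes of the full number. This relationship is inferred from this one record, not from a published specification, and the function names below are hypothetical.

```python
import base64

CANONICAL_HASH = "800867c7c68568eb51e1bbe8ab6dec19ad280eec8e7f7d09d5ee95de3cb8e144"
PITH_NUMBER = "QAEGPR6GQVUOWUPBXPUKW3PMDG"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical derivation: the first `length` characters of the base32
    encoding of the raw digest bytes (inferred from this record, not a spec)."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


def short_aliases(pith: str) -> dict:
    """The pith_short_* aliases listed above are prefixes of the full number."""
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}


print(pith_number_from_hash(CANONICAL_HASH) == PITH_NUMBER)  # True
print(short_aliases(PITH_NUMBER))
```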

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/QAEGPR6GQVUOWUPBXPUKW3PMDG \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 800867c7c68568eb51e1bbe8ab6dec19ad280eec8e7f7d09d5ee95de3cb8e144

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "87ba5eef28f6f15dd14bb0c369fff2172ee7a06436f5bd193fe0f7ecba7898a4",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-11-25T14:25:19Z",
    "title_canon_sha256": "9dbd102b340aee7e9177a9024622d6d374cd500709201ceaa0298a3715c1ed8d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2511.20347",
    "kind": "arxiv",
    "version": 2
  }
}
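The verify command above fetches the record over the network. As a sketch, assuming you already hold the canonical record shown here, the same serialization it uses (sorted keys, compact separators, ensure_ascii=False, UTF-8 bytes) can be hashed locally with Python's standard library; the function name canonical_sha256 is hypothetical. Compare the printed digest with the canonical hash listed above.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 over the serialization used by the verify command: sorted keys,
    compact separators, non-ASCII characters kept, encoded as UTF-8."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Canonical record copied from the JSON above.
record = {
    "metadata": {
        "abstract_canon_sha256": "87ba5eef28f6f15dd14bb0c369fff2172ee7a06436f5bd193fe0f7ecba7898a4",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-11-25T14:25:19Z",
        "title_canon_sha256": "9dbd102b340aee7e9177a9024622d6d374cd500709201ceaa0298a3715c1ed8d",
    },
    "schema_version": "1.0",
    "source": {"id": "2511.20347", "kind": "arxiv", "version": 2},
}

print(canonical_sha256(record))
```

Because keys are sorted before hashing, the digest does not depend on the key order in which a mirror happens to store the record.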