Pith Number

pith:JNRSKZFD

pith:2026:JNRSKZFDN464UNAJYUJBLMC43F

not attested · not anchored · not stored · refs pending

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

Abhinav Gullapalli, Chenlu Ye, Hao Chen, Jing Huang, Tong Zhang, Xuanchang Zhang, Yifan Hao, Zhou Yu, Ziji Zhang

Injecting learnable perturbations into hidden states of each layer stabilizes LLM reinforcement learning by flattening policy distributions and reducing importance ratio tails.

arxiv:2603.19470 v3 · 2026-03-19 · cs.LG · cs.AI


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{JNRSKZFDN464UNAJYUJBLMC43F}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
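One way a mirror could recompute the same state regardless of the order in which events arrive is sketched below. This is only an illustration of the general idea of a deterministic merge, not Pith's actual algorithm: the event fields (`timestamp`, `kind`, `value`), the last-writer-wins rule, and the helper names `event_id` and `merge` are all hypothetical, and signature checking is omitted.

```python
import hashlib
import json


def event_id(event):
    """Content hash of an event, used for de-duplication and tie-breaking."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge(canonical_record, events):
    """Fold events into a state in an order that ignores arrival order."""
    unique = {event_id(e): e for e in events}  # drop exact duplicates
    # Total order: timestamp first, content hash as a deterministic tie-breaker.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["timestamp"], kv[0]))
    state = {"record": canonical_record, "status": {}}
    for _, event in ordered:
        state["status"][event["kind"]] = event["value"]  # last writer wins
    return state


# Hypothetical events; the field names and values are made up for illustration.
record = {"schema_version": "1.0"}
events = [
    {"timestamp": "2026-05-20T00:00:00Z", "kind": "citations", "value": "open"},
    {"timestamp": "2026-05-21T00:00:00Z", "kind": "citations", "value": "updated"},
]
a = merge(record, events)
b = merge(record, list(reversed(events)) + events)  # shuffled, with duplicates
print(a == b)  # True
```

Sorting by a total order before folding is what makes the result independent of how a mirror received the events; any real implementation would also need to verify each event's signature before applying it.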

Claims

C1 · strongest claim

ALP prevents the updated policy from deviating too sharply from the inference policy and enlarges the policy family to cover inference-time mismatch noise, thereby maintaining training stability and improving performance on math and tool tasks.

C2 · weakest assumption

That small learnable perturbations added to intermediate hidden states will reliably flatten the policy distribution and reduce importance-ratio tails without introducing new instabilities or degrading the quality of the learned policy.

C3 · one line summary

ALP adds learnable perturbations to layer hidden states to flatten policy distributions and stabilize off-policy RL training for LLMs.
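To make the mechanism in these claims concrete, here is a toy sketch of adding a per-layer perturbation to hidden states and comparing the perturbed training policy with the unperturbed inference policy. It is not the paper's implementation: the tiny tanh network, the Gaussian noise form, the fixed per-layer scales in `sigmas` (the paper describes them as learnable), and every function name are assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, h):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * hi for wi, hi in zip(row, h)) for row in w]


def policy(x, layers, head, sigmas, rng):
    """Toy policy: after each layer, add Gaussian noise scaled by that layer's sigma."""
    h = x
    for w, sigma in zip(layers, sigmas):
        h = [math.tanh(v) for v in matvec(w, h)]
        h = [v + sigma * rng.gauss(0.0, 1.0) for v in h]  # assumed perturbation form
    return softmax(matvec(head, h))


rng = random.Random(0)
dim, vocab = 4, 5
layers = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)] for _ in range(2)]
head = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab)]
x = [rng.uniform(-1, 1) for _ in range(dim)]

infer_probs = policy(x, layers, head, sigmas=[0.0, 0.0], rng=rng)  # no perturbation
train_probs = policy(x, layers, head, sigmas=[0.1, 0.1], rng=rng)  # perturbed
ratios = [p / q for p, q in zip(train_probs, infer_probs)]  # per-token importance ratios
print(ratios)
```

With all scales set to zero the two policies coincide and every ratio is exactly 1; how far the ratios spread as the scales grow, and whether learned scales reduce the tails, is what the paper's claims concern and is not demonstrated by this toy.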

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-20T00:02:10.767538Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

4b632564a36f3dca3409c51215b05cd967f49fd3f4f9ff48a7a86de0491adef1

Aliases

arxiv: 2603.19470 · arxiv_version: 2603.19470v3 · doi: 10.48550/arxiv.2603.19470 · pith_short_12: JNRSKZFDN464 · pith_short_16: JNRSKZFDN464UNAJ · pith_short_8: JNRSKZFD
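In this record, the 26-character identifier coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the short aliases are prefixes of that identifier. The sketch below reproduces that relationship; that this is how Pith officially derives the number is an assumption based only on the values shown on this page, and `base32_prefix` is a hypothetical helper name.

```python
import base64

CANONICAL_HASH = "4b632564a36f3dca3409c51215b05cd967f49fd3f4f9ff48a7a86de0491adef1"


def base32_prefix(hex_digest, length=26):
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


pith = base32_prefix(CANONICAL_HASH)
# Short aliases observed on this page are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}

print(pith)     # JNRSKZFDN464UNAJYUJBLMC43F
print(aliases)
```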

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

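# Fetch the record, extract canonical_record, re-serialize it with sorted keys and compact separators, then SHA-256 the bytes.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JNRSKZFDN464UNAJYUJBLMC43F \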
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4b632564a36f3dca3409c51215b05cd967f49fd3f4f9ff48a7a86de0491adef1

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "6c69edb49c9a928a5773c5cc22ec3547f0d62b4ade877e63880c6c3555b5745a",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-19T21:04:17Z",
    "title_canon_sha256": "64b4efc2583088e55a949ca5f995dfda42d304439bbcd5b1fb46018925ba2b3a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.19470",
    "kind": "arxiv",
    "version": 3
  }
}
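
The canonicalization used by the verification one-liner can also be run offline against the record shown above. The sketch below applies the same `json.dumps` settings (sorted keys, compact separators, `ensure_ascii=False`) and compares the digest with the published canonical hash. It assumes the JSON printed on this page is byte-for-byte equivalent to what the resolver returns, which has not been checked here; `canonical_sha256` is a hypothetical helper name.

```python
import hashlib
import json

EXPECTED = "4b632564a36f3dca3409c51215b05cd967f49fd3f4f9ff48a7a86de0491adef1"


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(blob).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "6c69edb49c9a928a5773c5cc22ec3547f0d62b4ade877e63880c6c3555b5745a",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-03-19T21:04:17Z",
        "title_canon_sha256": "64b4efc2583088e55a949ca5f995dfda42d304439bbcd5b1fb46018925ba2b3a",
    },
    "schema_version": "1.0",
    "source": {"id": "2603.19470", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which a mirror stores or transmits the fields does not affect the digest.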