pith.
Pith Number

pith:JNRSKZFD

pith:2026:JNRSKZFDN464UNAJYUJBLMC43F
not attested · not anchored · not stored · refs pending

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

Abhinav Gullapalli, Chenlu Ye, Hao Chen, Jing Huang, Tong Zhang, Xuanchang Zhang, Yifan Hao, Zhou Yu, Ziji Zhang

Injecting learnable perturbations into hidden states of each layer stabilizes LLM reinforcement learning by flattening policy distributions and reducing importance ratio tails.

arxiv:2603.19470 v3 · 2026-03-19 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JNRSKZFDN464UNAJYUJBLMC43F}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
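The page does not describe the merge algorithm itself. As a purely hypothetical sketch of why a deterministic merge lets any mirror reach the same state, the code below sorts events on a total order before folding them (last writer wins per field). The `Event` shape, its fields, and the tie-break on `event_id` are assumptions for illustration, not Pith's actual schema or algorithm, and signature checking is omitted.

```python
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    # Hypothetical event shape; the real signed-event schema is not shown on this page.
    event_id: str
    timestamp: str  # ISO-8601 UTC with fixed precision, so string order matches time order
    field: str
    value: str


def merge(events: List[Event]) -> Dict[str, str]:
    """Fold events into a state dict, last writer wins per field.

    Sorting on (timestamp, event_id) before folding gives every mirror the same
    result regardless of the order in which it received the events.
    """
    state: Dict[str, str] = {}
    seen = set()
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        if ev.event_id in seen:  # the same event fetched from two mirrors is applied once
            continue
        seen.add(ev.event_id)
        state[ev.field] = ev.value
    return state
```

Because the fold only ever sees events in the sorted order, shuffling or duplicating the input list cannot change the resulting state.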

Claims

C1 · strongest claim

ALP (Adaptive Layerwise Perturbation) prevents the updated policy from deviating too sharply from the inference policy and enlarges the policy family to cover inference-time mismatch noise, thereby maintaining training stability and improving performance on math and tool-use tasks.

C2 · weakest assumption

That small learnable perturbations added to intermediate hidden states will reliably flatten the policy distribution and reduce importance-ratio tails without introducing new instabilities or degrading the quality of the learned policy.
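To make "importance-ratio tails" concrete, the sketch below computes the per-token ratio between the training policy and the inference policy from their log-probabilities and reports what share of tokens fall outside a band [1/c, c]. The band, its default width `threshold=2.0`, and both function names are illustrative assumptions; the paper's own tail metric is not stated on this page.

```python
import math
from typing import List, Sequence


def importance_ratios(train_logprobs: Sequence[float],
                      infer_logprobs: Sequence[float]) -> List[float]:
    """Per-token ratio pi_train(a_t | s_t) / mu_infer(a_t | s_t), from log-probs."""
    if len(train_logprobs) != len(infer_logprobs):
        raise ValueError("log-prob sequences must have equal length")
    # Subtract in log space, then exponentiate, instead of dividing tiny probabilities.
    return [math.exp(lp - lq) for lp, lq in zip(train_logprobs, infer_logprobs)]


def tail_fraction(ratios: Sequence[float], threshold: float = 2.0) -> float:
    """Share of tokens whose ratio is above `threshold` or below 1 / `threshold`."""
    if not ratios:
        return 0.0
    outliers = sum(1 for r in ratios if r > threshold or r < 1.0 / threshold)
    return outliers / len(ratios)
```

A smaller tail fraction means fewer tokens whose updates are dominated by extreme ratios, which is the quantity the assumption above expects the perturbations to shrink.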

C3 · one-line summary

ALP adds learnable perturbations to layer hidden states to flatten policy distributions and stabilize off-policy RL training for LLMs.
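A toy sketch of the mechanism named in the summary: after every layer of a small network, a perturbation is added to the hidden state, scaled by a per-layer learnable parameter. Everything concrete here is an assumption for illustration (a tanh MLP in pure Python, Gaussian noise, a softplus parameterization of the scale, and the names `forward_with_layer_perturbation` and `raw_scales`); how ALP chooses, adapts, or regularizes the perturbation is not described on this page, and training of `raw_scales` is not shown.

```python
import math
import random
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def softplus(x: float) -> float:
    # Numerically stable log(1 + e^x); keeps each layer's noise scale positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def forward_with_layer_perturbation(
    x: Sequence[float],
    weights: Sequence[Matrix],
    raw_scales: Sequence[float],
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Toy forward pass that adds sigma_l * eps to the hidden state after each layer.

    raw_scales holds one trainable scalar per layer, with sigma_l = softplus(raw).
    Passing rng=None disables the perturbation, giving the clean forward pass.
    """
    if len(weights) != len(raw_scales):
        raise ValueError("need one noise scale per layer")
    h = list(x)
    for W, raw in zip(weights, raw_scales):
        # Dense layer followed by a tanh nonlinearity.
        h = [math.tanh(sum(w * v for w, v in zip(row, h))) for row in W]
        if rng is not None:
            sigma = softplus(raw)
            h = [v + sigma * rng.gauss(0.0, 1.0) for v in h]
    return h
```

Parameterizing the scale through softplus keeps it strictly positive while letting an optimizer push it toward zero on layers where perturbation does not help.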

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-20T00:02:10.767538Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

4b632564a36f3dca3409c51215b05cd967f49fd3f4f9ff48a7a86de0491adef1

Aliases

arxiv: 2603.19470 · arxiv_version: 2603.19470v3 · doi: 10.48550/arxiv.2603.19470 · pith_short_12: JNRSKZFDN464 · pith_short_16: JNRSKZFDN464UNAJ · pith_short_8: JNRSKZFD
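Checking the values on this page, the full Pith Number equals the first 26 characters of the RFC 4648 base32 encoding of the canonical hash (26 characters carry the first 130 bits of the SHA-256 digest), and each `pith_short_N` alias is the first N characters of that number. This is an observation from this one record rather than documented Pith behavior; the function names below are hypothetical.

```python
import base64
from typing import Dict


def pith_number_from_hash(canonical_sha256_hex: str, length: int = 26) -> str:
    """Leading `length` characters of the base32 encoding of the digest."""
    digest = bytes.fromhex(canonical_sha256_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


def short_aliases(pith_number: str) -> Dict[str, str]:
    # The pith_short_N aliases listed above are prefixes of the full number.
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}
```

Applied to the canonical hash above, `pith_number_from_hash` reproduces JNRSKZFDN464UNAJYUJBLMC43F, and `short_aliases` reproduces the three short forms.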
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JNRSKZFDN464UNAJYUJBLMC43F \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4b632564a36f3dca3409c51215b05cd967f49fd3f4f9ff48a7a86de0491adef1
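The same check can be done in Python alone, without `curl` or `jq`. This sketch mirrors the one-liner's canonicalization (sorted keys, no whitespace, non-ASCII kept as UTF-8) and assumes, as the `jq` filter implies, that the JSON-LD response has a top-level `canonical_record` key. The function names are illustrative.

```python
import hashlib
import json
import urllib.request

PITH_URL = "https://pith.science/pith/JNRSKZFDN464UNAJYUJBLMC43F"


def canonical_bytes(record) -> bytes:
    # Same settings as the shell one-liner: sorted keys, compact separators, raw UTF-8.
    return json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def canonical_hash(record) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def fetch_canonical_record(url: str = PITH_URL, timeout: float = 30.0):
    # Request the JSON-LD view, as the curl command does with its Accept header.
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["canonical_record"]
```

Running `print(canonical_hash(fetch_canonical_record()))` should print the same expected hash as the shell pipeline. Sorting keys makes the digest independent of the key order the server happens to emit.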
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "6c69edb49c9a928a5773c5cc22ec3547f0d62b4ade877e63880c6c3555b5745a",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-19T21:04:17Z",
    "title_canon_sha256": "64b4efc2583088e55a949ca5f995dfda42d304439bbcd5b1fb46018925ba2b3a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.19470",
    "kind": "arxiv",
    "version": 3
  }
}