pith:JNRSKZFD
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL
Injecting learnable perturbations into hidden states of each layer stabilizes LLM reinforcement learning by flattening policy distributions and reducing importance ratio tails.
arxiv:2603.19470 v3 · 2026-03-19 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JNRSKZFDN464UNAJYUJBLMC43F}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
ALP prevents the updated policy from deviating too sharply from the inference policy and enlarges the policy family to cover inference-time mismatch noise, thereby maintaining training stability and improving performance on math and tool tasks.
Underlying assumption: small learnable perturbations added to intermediate hidden states will reliably flatten the policy distribution and reduce importance-ratio tails without introducing new instabilities or degrading the quality of the learned policy.
ALP adds learnable perturbations to layer hidden states to flatten policy distributions and stabilize off-policy RL training for LLMs.
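As a rough illustration of the idea in the claims above, the following toy sketch adds a perturbation to each hidden state of a small network and computes an importance ratio between the perturbed and unperturbed policies. Everything in it is an assumption made for illustration, not the paper's implementation: the tanh MLP, the additive Gaussian form of the noise, the fixed per-layer scales `sigmas` (the paper describes them as learnable; they are not trained here), and the labeling of the perturbed pass as the training policy and the clean pass as the inference policy.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def policy(layers, x, sigmas=None, rng=None):
    """Toy tanh MLP policy over actions.

    layers[:-1] are hidden layers; layers[-1] maps to action logits.
    sigmas (hypothetical) holds one noise scale per hidden layer; when given,
    Gaussian noise scaled by sigmas[i] is added to hidden state i.
    """
    h = x
    for i, weights in enumerate(layers[:-1]):
        h = [math.tanh(v) for v in matvec(weights, h)]
        if sigmas is not None:
            h = [v + sigmas[i] * rng.gauss(0.0, 1.0) for v in h]
    return softmax(matvec(layers[-1], h))


def importance_ratio(p_train, p_infer, action):
    """Ratio of training-policy to inference-policy probability for one action."""
    return p_train[action] / p_infer[action]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
layers = [
    [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)],  # 3 inputs -> 4 hidden
    [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)],  # 4 hidden -> 4 hidden
    [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)],  # 4 hidden -> 5 actions
]
x = [0.5, -0.2, 0.1]

p_infer = policy(layers, x)                                # unperturbed pass
p_train = policy(layers, x, sigmas=[0.05, 0.05], rng=rng)  # perturbed pass
ratio = importance_ratio(p_train, p_infer, action=2)
print(ratio)
```

With all scales set to zero the two passes coincide and the ratio is exactly 1, which is a convenient sanity check; whether nonzero scales actually flatten the distribution or shrink ratio tails in a real model is the paper's empirical claim and is not demonstrated by this toy.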
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:02:10.767538Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
4b632564a36f3dca3409c51215b05cd967f49fd3f4f9ff48a7a86de0491adef1
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JNRSKZFDN464UNAJYUJBLMC43F \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4b632564a36f3dca3409c51215b05cd967f49fd3f4f9ff48a7a86de0491adef1
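The canonicalization step in the one-liner above can also be written as a small standalone function, which may be easier to reuse or inspect. This is a sketch that mirrors only what the command shows (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding, SHA-256); the function name `canonical_sha256` and the sample record are hypothetical, and fetching the record from the site is left out.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a JSON-compatible object using the serialization from the command above."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


# Hypothetical sample record, used only to show that key order is irrelevant.
sample_a = {"source": {"kind": "arxiv", "version": 3}, "schema_version": "1.0"}
sample_b = {"schema_version": "1.0", "source": {"version": 3, "kind": "arxiv"}}
print(canonical_sha256(sample_a) == canonical_sha256(sample_b))
```

Passing the parsed canonical record to this function should reproduce the canonical hash listed on this page, provided the record is unchanged.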
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "6c69edb49c9a928a5773c5cc22ec3547f0d62b4ade877e63880c6c3555b5745a",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-19T21:04:17Z",
    "title_canon_sha256": "64b4efc2583088e55a949ca5f995dfda42d304439bbcd5b1fb46018925ba2b3a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.19470",
    "kind": "arxiv",
    "version": 3
  }
}