pith:T7VQIBNL
RL's Razor: Why Online Reinforcement Learning Forgets Less
On-policy RL forgets less than SFT because it selects the minimal-KL solution to new tasks among many possibilities.
arxiv:2509.04259 v1 · 2025-09-04 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{T7VQIBNLLTE2TBPWOMMR5JEKYR}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
On-policy RL is implicitly biased toward KL-minimal solutions among the many that solve the new task, whereas SFT can converge to distributions arbitrarily far from the base model.
The observed degree of forgetting is determined by the KL divergence between the fine-tuned and base policies, evaluated on the new task.
Online RL fine-tuning forgets less than SFT because it is implicitly biased toward KL-minimal solutions among all policies that solve the new task.
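The claims above can be illustrated with a toy calculation. The sketch below is not taken from the paper: the three-action base policy, the two candidate policies, and the choice of KL(π‖π₀) as the direction of the divergence are all assumptions made for illustration. Both candidates place all their mass on the actions assumed to solve the new task, yet they sit at very different distances from the base policy.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical base policy over three actions (illustrative numbers only).
base = [0.5, 0.3, 0.2]

# Assumption: actions 0 and 1 both solve the new task, action 2 does not.
# Candidate 1: the base policy renormalized onto the correct actions.
renormalized = [0.625, 0.375, 0.0]
# Candidate 2: all mass collapsed onto one correct action.
collapsed = [0.0, 1.0, 0.0]

kl_renormalized = kl(renormalized, base)
kl_collapsed = kl(collapsed, base)

print("KL(renormalized || base):", kl_renormalized)
print("KL(collapsed || base):   ", kl_collapsed)
```

Under these assumptions the renormalized policy is the closer of the two to the base model, which is the kind of solution the claims say on-policy RL is biased toward, while nothing prevents a supervised target from resembling the collapsed one.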
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:15.049025Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
9feb0405ab5cc9a985f673191ea48ac45049de8d784429319118892e7c71d92e
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/T7VQIBNLLTE2TBPWOMMR5JEKYR \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9feb0405ab5cc9a985f673191ea48ac45049de8d784429319118892e7c71d92e
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "06de02bce6ff42333c41e0a972d2169fd6b955543aa8d13f5ef008acba3af663",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-09-04T14:38:08Z",
    "title_canon_sha256": "9b5cf8274c7ae4584cabcb5cbf3d95444013f1dba694d01c57bdbdf3abbf1241"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2509.04259",
    "kind": "arxiv",
    "version": 1
  }
}
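The verification one-liner above can also be run offline against the record as printed here. The following is a minimal sketch that mirrors the same canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding) before hashing with SHA-256. The function name `canonical_sha256` is hypothetical, and the record literal is copied from this page rather than fetched, so the printed digest should be compared against the canonical hash listed above rather than assumed to match.

```python
import hashlib
import json

def canonical_sha256(record: dict) -> str:
    """Hash a record using the same canonical JSON form as the shell one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # compact form, no whitespace
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "06de02bce6ff42333c41e0a972d2169fd6b955543aa8d13f5ef008acba3af663",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-09-04T14:38:08Z",
        "title_canon_sha256": "9b5cf8274c7ae4584cabcb5cbf3d95444013f1dba694d01c57bdbdf3abbf1241",
    },
    "schema_version": "1.0",
    "source": {"id": "2509.04259", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the fields appear in the JSON text, only on their names and values.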