pith:YWC3IJHJ
The Art of Scaling Reinforcement Learning Compute for LLMs
RL training for LLMs follows predictable sigmoidal scaling curves that enable extrapolation from small-scale runs.
arxiv:2510.13786 v1 · 2025-10-15 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{YWC3IJHJ47CJKCAFEB2EW4RBR2}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Stable, scalable recipes follow predictable scaling trajectories, enabling extrapolation from smaller-scale runs. We demonstrate its effectiveness by successfully scaling and predicting validation performance on a single RL run scaled up to 100,000 GPU-hours.
This rests on the assumption that the sigmoidal functional form fitted to smaller-scale runs continues to hold and allows accurate extrapolation at scales an order of magnitude larger, and that the ablated design choices capture the dominant factors that determine asymptotic performance versus efficiency.
A 400k+ GPU-hour study shows RL scaling in LLMs follows predictable sigmoidal trajectories, with most design choices affecting efficiency rather than the performance asymptote, enabling accurate large-scale predictions via the ScaleRL recipe.
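To make the extrapolation idea concrete, here is a minimal sketch of fitting a saturating sigmoid to small-compute points and reading off a prediction at a larger budget. The record only says the curves are sigmoidal, so the parameterization used here (`r0 + (asymptote - r0) / (1 + (c_mid / compute) ** exponent)`), the names `sigmoid_curve` and `fit_grid`, the coarse grid search, and all numbers are illustrative assumptions, not the paper's fitting procedure or its data.

```python
import itertools


def sigmoid_curve(compute, asymptote, exponent, c_mid, r0=0.0):
    """Assumed saturating form: starts near r0, reaches the midpoint at
    compute == c_mid, and approaches `asymptote` as compute grows."""
    return r0 + (asymptote - r0) / (1.0 + (c_mid / compute) ** exponent)


def fit_grid(points, asymptotes, exponents, c_mids, r0=0.0):
    """Pick the (asymptote, exponent, c_mid) triple from the candidate grids
    that minimizes squared error on (compute, performance) points."""
    def sse(params):
        a, b, m = params
        return sum((sigmoid_curve(c, a, b, m, r0) - y) ** 2 for c, y in points)

    return min(itertools.product(asymptotes, exponents, c_mids), key=sse)


# Synthetic, noise-free "small-scale runs" generated from made-up parameters.
true_params = (0.6, 1.5, 2000.0)  # (asymptote, exponent, c_mid), hypothetical
small_runs = [
    (c, sigmoid_curve(c, *true_params))
    for c in (250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0)
]

fitted = fit_grid(
    small_runs,
    asymptotes=[0.5, 0.55, 0.6, 0.65],
    exponents=[1.0, 1.5, 2.0],
    c_mids=[1000.0, 2000.0, 4000.0],
)

# Extrapolate to a budget well beyond the largest fitted point.
predicted = sigmoid_curve(100_000.0, *fitted)
print(fitted, predicted)
```

In this form the asymptote sets the performance ceiling, while the exponent and midpoint control how quickly a run approaches it, which is one way to read the claim that most design choices shift efficiency rather than the ceiling. A real fit would use noisy measurements and a proper optimizer rather than a grid.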
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:47.304966Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
c585b424e9e7c495080520744b72218e966160a63d15d1223b48ca4c80d67e12
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YWC3IJHJ47CJKCAFEB2EW4RBR2 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c585b424e9e7c495080520744b72218e966160a63d15d1223b48ca4c80d67e12
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "713ae47eea08fff4bed2b11c38746e1499694d17cccc4516db60778642b19026",
"cross_cats_sorted": [
"cs.AI"
],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.LG",
"submitted_at": "2025-10-15T17:43:03Z",
"title_canon_sha256": "9487e005a66954e91c32149adfded5424cdd518509be36aa2df7a2394e1bcee8"
},
"schema_version": "1.0",
"source": {
"id": "2510.13786",
"kind": "arxiv",
"version": 1
}
}
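The shell one-liner above can also be written as a small offline script. The sketch below applies the same serialization (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the record as printed on this page and hashes it with SHA-256. The function name `canonical_hash` is an illustrative choice, and the digest should only be expected to match the listed canonical hash if the record shown here is exactly what the API returns.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(canonical).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "713ae47eea08fff4bed2b11c38746e1499694d17cccc4516db60778642b19026",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-10-15T17:43:03Z",
        "title_canon_sha256": "9487e005a66954e91c32149adfded5424cdd518509be36aa2df7a2394e1bcee8",
    },
    "schema_version": "1.0",
    "source": {"id": "2510.13786", "kind": "arxiv", "version": 1},
}

digest = canonical_hash(record)
print(digest)  # compare against the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields were written, so two clients that parse the same record differently still agree on the hash.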