Pith Number

pith:YWC3IJHJ

pith:2025:YWC3IJHJ47CJKCAFEB2EW4RBR2

not attested · not anchored · not stored · refs resolved

The Art of Scaling Reinforcement Learning Compute for LLMs

David Brandfonbrener, Devvrit Khatri, Inderjit S. Dhillon, Lovish Madaan, Manzil Zaheer, Rachit Bansal, Rishabh Agarwal, Rishabh Tiwari, Sai Surya Duvvuri

RL training for LLMs follows predictable sigmoidal scaling curves that enable extrapolation from small-scale runs.

arxiv:2510.13786 v1 · 2025-10-15 · cs.LG · cs.AI


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{YWC3IJHJ47CJKCAFEB2EW4RBR2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Stable, scalable recipes follow predictable scaling trajectories, enabling extrapolation from smaller-scale runs. We demonstrate the effectiveness of the ScaleRL recipe by successfully scaling and predicting validation performance on a single RL run scaled up to 100,000 GPU-hours.

C2 · weakest assumption

That the sigmoidal functional form fitted to smaller-scale runs will continue to hold and allow accurate extrapolation at scales an order of magnitude larger, and that the ablated design choices capture the dominant factors that determine asymptotic performance versus efficiency.

C3 · one line summary

A 400k+ GPU-hour study shows RL scaling in LLMs follows predictable sigmoidal trajectories, with most design choices affecting efficiency rather than the performance asymptote, enabling accurate large-scale predictions via the ScaleRL recipe.
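
The claims above turn on fitting a saturating sigmoid to performance as a function of compute and then extrapolating it. This record does not spell out the functional form, so the sketch below assumes one common parameterization, R(C) = R0 + (A - R0) / (1 + (C_mid / C)^B), where A is the asymptote, C_mid the compute at the halfway point, and B the steepness. The function name and every parameter value are hypothetical and chosen only for illustration; they are not fitted values from the paper.

```python
def sigmoid_performance(compute, r0, asymptote, c_mid, slope):
    """Assumed saturating form: rises from r0 toward `asymptote` as compute grows.

    At compute == c_mid the curve sits exactly halfway between r0 and the asymptote.
    """
    return r0 + (asymptote - r0) / (1.0 + (c_mid / compute) ** slope)


# Hypothetical parameters (illustration only, not taken from the paper).
R0, A, C_MID, B = 0.30, 0.60, 2000.0, 1.0

# Two hypothetical recipes with the same asymptote but different efficiency:
# a smaller c_mid reaches the same ceiling with less compute.
for gpu_hours in (500, 2000, 8000, 100_000):
    baseline = sigmoid_performance(gpu_hours, R0, A, C_MID, B)
    efficient = sigmoid_performance(gpu_hours, R0, A, C_MID / 2, B)
    print(f"{gpu_hours:>7} GPU-hours  baseline={baseline:.3f}  efficient={efficient:.3f}")
```

Under this assumed form, a design choice that changes only C_mid or B shifts how quickly the curve rises, while a choice that changes A moves the ceiling itself, which is the efficiency-versus-asymptote distinction drawn in C3.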

References

36 extracted · 36 resolved · 14 Pith anchors

[1] URL https://hkunlp.github.io/blog/2025/Polaris. AoPS. AIME problem set 1983-2025, 2025

[2] CWM: An Open-Weights LLM for Research on Code Generation with World Models

[3] The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models · arXiv:2505.22617

[4] GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning · arXiv:2507.01006

[5] Measuring Mathematical Problem Solving With the MATH Dataset · doi:10.64434/tml.20250910

Formal links

3 machine-checked theorem links

Cited by

27 papers in Pith

Reinforcement Learning from Human Feedback

Toward Training Superintelligent Software Agents through Self-Play SWE-RL

Rethinking the Design Space of Reinforcement Learning for Diffusion Models: On the Importance of Likelihood Estimation Beyond Loss Design

Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Receipt and verification

First computed	2026-05-17T23:38:47.304966Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

c585b424e9e7c495080520744b72218e966160a63d15d1223b48ca4c80d67e12

Aliases

arxiv: 2510.13786 · arxiv_version: 2510.13786v1 · doi: 10.48550/arxiv.2510.13786 · pith_short_12: YWC3IJHJ47CJ · pith_short_16: YWC3IJHJ47CJKCAF · pith_short_8: YWC3IJHJ
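
The identifiers on this page are consistent with a simple derivation: Base32-encode the raw bytes of the canonical SHA-256 hash, keep the first 26 characters as the Pith Number, and take 8-, 12-, and 16-character prefixes as the short aliases. The page does not document this rule, so treat the sketch below as an inference from the values shown here; the function name `pith_id_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "c585b424e9e7c495080520744b72218e966160a63d15d1223b48ca4c80d67e12"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes (RFC 4648 alphabet) and truncate.

    Assumption: the 26-character length is inferred from the ID on this page.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


pith_id = pith_id_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}

print(pith_id)  # YWC3IJHJ47CJKCAFEB2EW4RBR2
for name, value in aliases.items():
    print(name, value)
```

If the inference holds for other records, any short alias can be checked against its canonical hash without a network lookup.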

Agent API

Resolver JSON Graph JSON Events JSON Schema Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/YWC3IJHJ47CJKCAFEB2EW4RBR2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c585b424e9e7c495080520744b72218e966160a63d15d1223b48ca4c80d67e12

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "713ae47eea08fff4bed2b11c38746e1499694d17cccc4516db60778642b19026",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-10-15T17:43:03Z",
    "title_canon_sha256": "9487e005a66954e91c32149adfded5424cdd518509be36aa2df7a2394e1bcee8"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2510.13786",
    "kind": "arxiv",
    "version": 1
  }
}
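
The verification one-liner above can also be run offline against the record as printed here. The sketch below copies the canonicalization settings from that command (sorted keys, compact separators, UTF-8 without ASCII escaping); the helper name `canonical_sha256` is hypothetical. Whether the digest equals the canonical hash listed on this page depends on the served record matching this JSON byte for byte, which was not checked here.

```python
import hashlib
import json

# Canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "713ae47eea08fff4bed2b11c38746e1499694d17cccc4516db60778642b19026",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-10-15T17:43:03Z",
        "title_canon_sha256": "9487e005a66954e91c32149adfded5424cdd518509be36aa2df7a2394e1bcee8",
    },
    "schema_version": "1.0",
    "source": {"id": "2510.13786", "kind": "arxiv", "version": 1},
}


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace, then UTF-8 encode."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(obj) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


print(canonical_sha256(record))
```

Because keys are sorted before hashing, the digest does not depend on the order in which the record's fields were written, so two mirrors can compare hashes even if they store the fields in different orders.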