Pith Number

pith:LEGNLGLF

pith:2025:LEGNLGLF34JZ4ZJ53CLHW7TWJI

not attested · not anchored · not stored · refs resolved

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Binhang Yuan, Chen Zhu, Chuyi He, Guo Wei, Jiashu Wang, Jiaxuan Gao, Jun Mei, Shusheng Xu, Tongkai Yang, Wei Fu, Xujie Shen, Yi Wu, Zhiyu Mei

AReaL decouples generation from training in reinforcement learning to achieve up to 2.77 times faster training for language models on reasoning tasks.

arxiv:2505.24298 v5 · 2025-05-30 · cs.LG · cs.AI

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{LEGNLGLF34JZ4ZJ53CLHW7TWJI}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

AReaL achieves up to 2.77× training speedup compared to synchronous systems with the same number of GPUs and matched or improved final performance.

C2 · weakest assumption

That workload balancing between rollout and training workers, combined with the staleness-enhanced PPO variant, can keep training stable and effective even though it uses outdated samples.

C3 · one-line summary

AReaL decouples generation and training in LLM reinforcement learning to achieve up to 2.77× speedup with matched or better performance on math and code benchmarks.

References

50 extracted · 50 resolved · 10 Pith anchors

[2] Dota 2 with Large Scale Deep Reinforcement Learning 2019 · arXiv:1912.06680

[3] Evaluating Large Language Models Trained on Code 2021 · arXiv:2107.03374

[4] Z. Chen, A. May, R. Svirschevski, Y. Huang, M. Ryabinin, Z. Jia, and B. Chen. Sequoia: Scalable and robust speculative decoding. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tom 2024

[5] Training Verifiers to Solve Math Word Problems 2021 · arXiv:2110.14168

[7] L. Espeholt, H. Soyer, R. Munos, K. Simonyan, V. Mnih, T. Ward, Y. Doron, V. Firoiu, T. Harley, I. Dunning, S. Legg, and K. Kavukcuoglu. IMPALA: scalable distributed deep-rl with importance weigh 2018

Cited by

38 papers in Pith

Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor

Reinforcement Learning from Human Feedback

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment

Receipt and verification

First computed	2026-05-17T23:38:52.357524Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

590cd59965df139e653dd8967b7e764a2f9f4e826de7f9ec26f739013c03a5b5

Aliases

arxiv: 2505.24298 · arxiv_version: 2505.24298v5 · doi: 10.48550/arxiv.2505.24298 · pith_short_12: LEGNLGLF34JZ · pith_short_16: LEGNLGLF34JZ4ZJ5 · pith_short_8: LEGNLGLF
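
In this record, the identifier and its short aliases appear to follow from the canonical hash: Base32-encoding the SHA-256 digest (standard RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number, and each `pith_short_*` alias is a prefix of it. The Python sketch below checks that relationship. The function names are hypothetical, and the derivation rule is inferred from this single record, not taken from the pith-number/v1.0 schema.

```python
import base64


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: this mirrors how the identifier relates to the canonical
    hash in this one record; the schema itself is not quoted here.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_id: str) -> dict:
    """Build the 8-, 12-, and 16-character prefix aliases."""
    return {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}


canonical_hash = "590cd59965df139e653dd8967b7e764a2f9f4e826de7f9ec26f739013c03a5b5"
pith_id = pith_from_hash(canonical_hash)
print(pith_id)  # LEGNLGLF34JZ4ZJ53CLHW7TWJI
print(short_aliases(pith_id))
```

Because the aliases are plain prefixes, a shorter alias can identify the record only while no other record shares the same leading characters.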

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/LEGNLGLF34JZ4ZJ53CLHW7TWJI \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 590cd59965df139e653dd8967b7e764a2f9f4e826de7f9ec26f739013c03a5b5

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "02100390d6c07329c4b3e7edcd670ebbfc27e4c0175403a6bccdd413babab26e",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-05-30T07:18:25Z",
    "title_canon_sha256": "ff58ae21a309efe1d147bd9a0bcb0a0041e651f751a57c5a6b49ee3abc1e73de"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2505.24298",
    "kind": "arxiv",
    "version": 5
  }
}
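
The shell pipeline under "Verify this Pith Number yourself" can be mirrored offline in Python. This is a minimal sketch, assuming canonicalization is exactly what that one-liner does (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes). `canonical_sha256` is a hypothetical helper name, and the record literal is copied from the JSON above.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the verification one-liner."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # compact form, no whitespace
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "02100390d6c07329c4b3e7edcd670ebbfc27e4c0175403a6bccdd413babab26e",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-05-30T07:18:25Z",
        "title_canon_sha256": "ff58ae21a309efe1d147bd9a0bcb0a0041e651f751a57c5a6b49ee3abc1e73de",
    },
    "schema_version": "1.0",
    "source": {"id": "2505.24298", "kind": "arxiv", "version": 5},
}

expected = "590cd59965df139e653dd8967b7e764a2f9f4e826de7f9ec26f739013c03a5b5"
digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == expected)
```

Sorting keys before hashing is what lets independent mirrors agree on the digest regardless of how their JSON parser orders object members.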