Pith Number

pith:LEGNLGLF

pith:2025:LEGNLGLF34JZ4ZJ53CLHW7TWJI
not attested · not anchored · not stored · refs resolved

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Binhang Yuan, Chen Zhu, Chuyi He, Guo Wei, Jiashu Wang, Jiaxuan Gao, Jun Mei, Shusheng Xu, Tongkai Yang, Wei Fu, Xujie Shen, Yi Wu, Zhiyu Mei

AReaL decouples generation from training in reinforcement learning to achieve up to 2.77 times faster training for language models on reasoning tasks.

arxiv:2505.24298 v5 · 2025-05-30 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{LEGNLGLF34JZ4ZJ53CLHW7TWJI}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

AReaL achieves up to 2.77× training speedup compared to synchronous systems with the same number of GPUs and matched or improved final performance.

C2 · weakest assumption

Workload balancing between rollout and training workers, combined with the staleness-enhanced PPO variant, can keep training stable and effective despite the use of outdated samples.

C3 · one line summary

AReaL decouples generation and training in LLM reinforcement learning to achieve up to 2.77× speedup with matched or better performance on math and code benchmarks.
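
To make the "outdated samples" in C2 concrete, here is a minimal sketch of a staleness bound on rollout data. It is an illustration only: `Rollout`, `policy_version`, `usable_samples`, and `max_staleness` are hypothetical names, and nothing on this page specifies how AReaL itself measures or limits staleness.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    prompt_id: int
    policy_version: int  # hypothetical: version of the weights that generated this sample


def usable_samples(samples, current_version, max_staleness):
    """Keep samples whose generating policy lags the trainer by at most max_staleness versions."""
    return [s for s in samples if current_version - s.policy_version <= max_staleness]


# Rollout workers keep generating while the trainer advances, so a batch mixes versions.
batch = [Rollout(0, 3), Rollout(1, 5), Rollout(2, 7)]
fresh = usable_samples(batch, current_version=7, max_staleness=2)
print([s.prompt_id for s in fresh])
```

A larger `max_staleness` lets rollout and training overlap more freely but trains on older data; whether training stays stable under that trade-off is what C2 flags as the weakest assumption.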

References

50 extracted · 50 resolved · 10 Pith anchors

[2] Dota 2 with Large Scale Deep Reinforcement Learning 2019 · arXiv:1912.06680
[3] Evaluating Large Language Models Trained on Code 2021 · arXiv:2107.03374
[4] Z. Chen, A. May, R. Svirschevski, Y. Huang, M. Ryabinin, Z. Jia, and B. Chen. Sequoia: Scalable and robust speculative decoding. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tom 2024
[5] Training Verifiers to Solve Math Word Problems 2021 · arXiv:2110.14168
[7] L. Espeholt, H. Soyer, R. Munos, K. Simonyan, V. Mnih, T. Ward, Y. Doron, V. Firoiu, T. Harley, I. Dunning, S. Legg, and K. Kavukcuoglu. IMPALA: scalable distributed deep-rl with importance weigh 2018

Cited by

38 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:52.357524Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

590cd59965df139e653dd8967b7e764a2f9f4e826de7f9ec26f739013c03a5b5

Aliases

arxiv: 2505.24298 · arxiv_version: 2505.24298v5 · doi: 10.48550/arxiv.2505.24298 · pith_short_12: LEGNLGLF34JZ · pith_short_16: LEGNLGLF34JZ4ZJ5 · pith_short_8: LEGNLGLF
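
The identifiers on this page look related in a simple way. The full Pith Number matches the unpadded base32 encoding of the first 16 bytes of the canonical hash, and each `pith_short_N` alias is its first N characters. The sketch below checks that observation for this record only. The page does not state this as the defining rule, and the function names are hypothetical.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str) -> str:
    """Base32-encode the first 16 bytes of the SHA-256 digest and drop '=' padding."""
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict:
    """Build the 8-, 12-, and 16-character prefixes listed under Aliases."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


canonical_hash = "590cd59965df139e653dd8967b7e764a2f9f4e826de7f9ec26f739013c03a5b5"
number = pith_number_from_hash(canonical_hash)
print(number)                 # LEGNLGLF34JZ4ZJ53CLHW7TWJI
print(short_aliases(number))
```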
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LEGNLGLF34JZ4ZJ53CLHW7TWJI \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 590cd59965df139e653dd8967b7e764a2f9f4e826de7f9ec26f739013c03a5b5
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "02100390d6c07329c4b3e7edcd670ebbfc27e4c0175403a6bccdd413babab26e",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-05-30T07:18:25Z",
    "title_canon_sha256": "ff58ae21a309efe1d147bd9a0bcb0a0041e651f751a57c5a6b49ee3abc1e73de"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2505.24298",
    "kind": "arxiv",
    "version": 5
  }
}
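
The shell one-liner above can also be run offline against a saved copy of the record. This sketch applies the same canonicalization as the command (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and then hashes with SHA-256. The function name `canonical_sha256` is an assumption for illustration, and the record literal is copied from this page.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "02100390d6c07329c4b3e7edcd670ebbfc27e4c0175403a6bccdd413babab26e",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-05-30T07:18:25Z",
        "title_canon_sha256": "ff58ae21a309efe1d147bd9a0bcb0a0041e651f751a57c5a6b49ee3abc1e73de",
    },
    "schema_version": "1.0",
    "source": {"id": "2505.24298", "kind": "arxiv", "version": 5},
}

# Key order in the input must not matter, because serialization sorts the keys.
reordered = {
    "source": record["source"],
    "schema_version": record["schema_version"],
    "metadata": record["metadata"],
}

digest = canonical_sha256(record)
print(digest)
print(digest == canonical_sha256(reordered))
```

If the saved record is unmodified, the printed digest should equal the canonical hash listed on this page. That equality is what the page's own check expects and is not independently re-derived here.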