Pith Number

pith:MRDWRL3O

pith:2025:MRDWRL3OM5EXWVYIYD7TZKZIZG
not attested · not anchored · not stored · refs pending

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Chengsong Huang, Dong Yu, Haitao Mi, Hongming Zhang, Jiaxin Huang, Ruosen Li, Wenhao Yu, Xiaoyang Wang, Zongxia Li

R-Zero lets a base LLM create its own reasoning tasks by co-evolving a Challenger that proposes hard problems and a Solver that learns to solve them, with no human data or labels required.

arxiv:2508.05004 v4 · 2025-08-07 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MRDWRL3OM5EXWVYIYD7TZKZIZG}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting the Qwen3-4B-Base by +6.49 on math-reasoning benchmarks and +7.54 on general-domain reasoning benchmarks.

C2 · weakest assumption

That the reward signals for the Challenger (proposing tasks near the edge of Solver capability) and Solver (solving those tasks) can be defined and optimized without any external human data or labels while still producing genuine capability gains rather than reward hacking or mode collapse.
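
To make "near the edge of Solver capability" concrete, here is a toy sketch of one possible label-free Challenger reward. It is an illustration only and is not taken from this record: the function names, the use of majority agreement among sampled Solver answers as a stand-in for correctness, and the choice to peak the reward at 50% agreement are all assumptions.

```python
from collections import Counter
from typing import Sequence


def majority_agreement(answers: Sequence[str]) -> float:
    """Fraction of sampled Solver answers that match the most common answer.

    Hypothetical proxy for Solver accuracy when no ground-truth label exists.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    _, count = Counter(answers).most_common(1)[0]
    return count / len(answers)


def challenger_reward(answers: Sequence[str]) -> float:
    """Assumed reward shape: highest when agreement is 0.5, zero when it is 1.0.

    A question every sample agrees on is treated as too easy; a question with
    about 50% agreement is treated as sitting at the edge of Solver capability.
    """
    p = majority_agreement(answers)
    return 1.0 - 2.0 * abs(p - 0.5)


if __name__ == "__main__":
    print(challenger_reward(["4", "4", "5", "6"]))  # half the samples agree
    print(challenger_reward(["4", "4", "4", "4"]))  # all samples agree
```

A proxy like this also shows why the assumption is fragile: agreement among samples is not correctness, so a Challenger could in principle be rewarded for ambiguous or ill-posed questions.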

C3 · one line summary

R-Zero lets a base LLM bootstrap its own reasoning curriculum by pitting a Challenger model against a Solver model that co-evolve through autonomous task generation and solution.

Formal links

2 machine-checked theorem links

Cited by

42 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:39:22.078871Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

644768af6e67497b5708c0ff3cab28c98ff9cc5e4125a68c2b8b7f77bb4af1f2

Aliases

arxiv: 2508.05004 · arxiv_version: 2508.05004v4 · doi: 10.48550/arxiv.2508.05004 · pith_short_12: MRDWRL3OM5EX · pith_short_16: MRDWRL3OM5EXWVYI · pith_short_8: MRDWRL3O
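
The identifiers on this page appear to be related in a simple way: each pith_short_N alias is the first N characters of the full Pith Number, and the Pith Number itself matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. This is an observation about this one record, not a documented scheme, and the helper names below are hypothetical. A minimal Python sketch that checks both relationships:

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "644768af6e67497b5708c0ff3cab28c98ff9cc5e4125a68c2b8b7f77bb4af1f2"
PITH_NUMBER = "MRDWRL3OM5EXWVYIYD7TZKZIZG"


def base32_prefix(hex_digest: str, length: int) -> str:
    """First `length` characters of the Base32 encoding of a hex digest."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Assumed derivation: each short alias is a prefix of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


if __name__ == "__main__":
    print(base32_prefix(CANONICAL_HASH, 26) == PITH_NUMBER)
    print(short_aliases(PITH_NUMBER))
```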
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MRDWRL3OM5EXWVYIYD7TZKZIZG \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 644768af6e67497b5708c0ff3cab28c98ff9cc5e4125a68c2b8b7f77bb4af1f2
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "b9e978157adc5a2629f1b85bdce9cf6c30705ad48fc37a0df5b12d4f8f7994f1",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-08-07T03:38:16Z",
    "title_canon_sha256": "28e6dc030c05c9b1bd153d4c38293c8d96f6bfbd03c940830afe4162def8544e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2508.05004",
    "kind": "arxiv",
    "version": 4
  }
}
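
The same check can be run offline against the record shown above. The sketch below mirrors the canonicalization used in the one-liner (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes) and compares the digest with the published canonical hash. It assumes the JSON printed on this page is exactly what the API returns under canonical_record; if the served record differs in any field, the digests will not match. The names canonical_sha256, RECORD, and EXPECTED are illustrative.

```python
import hashlib
import json

EXPECTED = "644768af6e67497b5708c0ff3cab28c98ff9cc5e4125a68c2b8b7f77bb4af1f2"

# Canonical record as printed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "b9e978157adc5a2629f1b85bdce9cf6c30705ad48fc37a0df5b12d4f8f7994f1",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-08-07T03:38:16Z",
        "title_canon_sha256": "28e6dc030c05c9b1bd153d4c38293c8d96f6bfbd03c940830afe4162def8544e",
    },
    "schema_version": "1.0",
    "source": {"id": "2508.05004", "kind": "arxiv", "version": 4},
}


def canonical_sha256(record: dict) -> str:
    """SHA-256 over the compact, key-sorted JSON serialization of a record."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which fields appear in the source JSON, which is what lets an independent mirror reproduce it.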