pith:MRDWRL3O
R-Zero: Self-Evolving Reasoning LLM from Zero Data
R-Zero lets a base LLM create its own reasoning tasks by co-evolving a Challenger that proposes hard problems and a Solver that learns to solve them, with no human data or labels required.
arxiv:2508.05004 v4 · 2025-08-07 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MRDWRL3OM5EXWVYIYD7TZKZIZG}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting the Qwen3-4B-Base by +6.49 on math-reasoning benchmarks and +7.54 on general-domain reasoning benchmarks.
Key assumption: the reward signals for the Challenger (proposing tasks near the edge of the Solver's capability) and the Solver (solving those tasks) can be defined and optimized without any external human data or labels, while still producing genuine capability gains rather than reward hacking or mode collapse.
R-Zero lets a base LLM bootstrap its own reasoning curriculum by pitting a Challenger model against a Solver model that co-evolve through autonomous task generation and solution.
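The claims above describe a Challenger rewarded for proposing tasks "near the edge of Solver capability" without labels. This record does not state the actual reward functions, so the following is only a toy sketch of one way such a label-free signal could be built: sample several Solver answers, take the majority answer as a pseudo-label, and reward the Challenger most when the Solver agrees with itself about half the time. The function names (`majority_answer`, `agreement_rate`, `challenger_reward`) and the exact formula are illustrative assumptions, not taken from this record.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most frequent answer; it serves as a pseudo-label (assumption)."""
    return Counter(answers).most_common(1)[0][0]


def agreement_rate(answers):
    """Fraction of sampled Solver answers that match the majority answer."""
    top = majority_answer(answers)
    return sum(a == top for a in answers) / len(answers)


def challenger_reward(answers):
    """Hypothetical reward: 1.0 when agreement is 50%, 0.0 when it is 100%.

    Tasks the Solver always answers consistently are treated as too easy;
    tasks with split answers are treated as sitting at the capability edge.
    """
    p = agreement_rate(answers)
    return 1.0 - 2.0 * abs(p - 0.5)
```

Under this sketch, a task that yields identical answers on every sample earns the Challenger nothing, which is one simple way to discourage trivially easy proposals. It does not by itself rule out the reward hacking or mode collapse named in the key assumption.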
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:39:22.078871Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
644768af6e67497b5708c0ff3cab28c98ff9cc5e4125a68c2b8b7f77bb4af1f2
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MRDWRL3OM5EXWVYIYD7TZKZIZG \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 644768af6e67497b5708c0ff3cab28c98ff9cc5e4125a68c2b8b7f77bb4af1f2
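The one-liner above canonicalizes the record by serializing it with sorted keys, compact separators, and unescaped non-ASCII characters, then hashes the UTF-8 bytes with SHA-256. The same step can be sketched as a small reusable Python module. The function names `canonical_sha256` and `matches` are hypothetical helpers introduced here for illustration; only the `json.dumps` arguments and the SHA-256 step come from the command above.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same canonical form as the shell one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


def matches(record: dict, expected_hex: str) -> bool:
    """Compare a record's canonical hash with an expected hex digest."""
    return canonical_sha256(record) == expected_hex.lower()
```

To check a record, load the `canonical_record` object from the API response with `json.loads` and pass it to `matches` together with the canonical hash shown on this page. Because keys are sorted before hashing, two records that differ only in key order produce the same digest.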
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "b9e978157adc5a2629f1b85bdce9cf6c30705ad48fc37a0df5b12d4f8f7994f1",
"cross_cats_sorted": [
"cs.AI",
"cs.CL"
],
"license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
"primary_cat": "cs.LG",
"submitted_at": "2025-08-07T03:38:16Z",
"title_canon_sha256": "28e6dc030c05c9b1bd153d4c38293c8d96f6bfbd03c940830afe4162def8544e"
},
"schema_version": "1.0",
"source": {
"id": "2508.05004",
"kind": "arxiv",
"version": 4
}
}