{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:MRDWRL3OM5EXWVYIYD7TZKZIZG","short_pith_number":"pith:MRDWRL3O","schema_version":"1.0","canonical_sha256":"644768af6e67497b5708c0ff3cab28c98ff9cc5e4125a68c2b8b7f77bb4af1f2","source":{"kind":"arxiv","id":"2508.05004","version":4},"attestation_state":"computed","paper":{"title":"R-Zero: Self-Evolving Reasoning LLM from Zero Data","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"R-Zero lets a base LLM create its own reasoning tasks by co-evolving a Challenger that proposes hard problems and a Solver that learns to solve them, with no human data or labels required.","cross_cats":["cs.AI","cs.CL"],"primary_cat":"cs.LG","authors_text":"Chengsong Huang, Dong Yu, Haitao Mi, Hongming Zhang, Jiaxin Huang, Ruosen Li, Wenhao Yu, Xiaoyang Wang, Zongxia Li","submitted_at":"2025-08-07T03:38:16Z","abstract_excerpt":"Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence. To overcome this limitation, we introduce R-Zero, a fully autonomous framework that generates its own training data from scratch. 
Starting from a single base L"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":true},"canonical_record":{"source":{"id":"2508.05004","kind":"arxiv","version":4},"metadata":{"license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","primary_cat":"cs.LG","submitted_at":"2025-08-07T03:38:16Z","cross_cats_sorted":["cs.AI","cs.CL"],"title_canon_sha256":"28e6dc030c05c9b1bd153d4c38293c8d96f6bfbd03c940830afe4162def8544e","abstract_canon_sha256":"b9e978157adc5a2629f1b85bdce9cf6c30705ad48fc37a0df5b12d4f8f7994f1"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:39:22.079613Z","signature_b64":"gizSNUrT1kDzCzibQaHg2/yUYKTJxH7GdbsNK4+qQXJDxWHTneVULYkPE+o0qdYHtzdsYVFcV5fwbjnx/GbnDw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"644768af6e67497b5708c0ff3cab28c98ff9cc5e4125a68c2b8b7f77bb4af1f2","last_reissued_at":"2026-05-17T23:39:22.078871Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:39:22.078871Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"R-Zero: Self-Evolving Reasoning LLM from Zero Data","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"R-Zero lets a base LLM create its own reasoning tasks by co-evolving a Challenger that proposes hard problems and a Solver that learns to solve them, with no human data or labels required.","cross_cats":["cs.AI","cs.CL"],"primary_cat":"cs.LG","authors_text":"Chengsong Huang, Dong Yu, Haitao Mi, Hongming Zhang, Jiaxin Huang, Ruosen Li, Wenhao Yu, Xiaoyang Wang, Zongxia 
Li","submitted_at":"2025-08-07T03:38:16Z","abstract_excerpt":"Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence. To overcome this limitation, we introduce R-Zero, a fully autonomous framework that generates its own training data from scratch. Starting from a single base L"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting the Qwen3-4B-Base by +6.49 on math-reasoning benchmarks and +7.54 on general-domain reasoning benchmarks.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the reward signals for the Challenger (proposing tasks near the edge of Solver capability) and Solver (solving those tasks) can be defined and optimized without any external human data or labels while still producing genuine capability gains rather than reward hacking or mode collapse.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"R-Zero lets a base LLM bootstrap its own reasoning curriculum by pitting a Challenger model against a Solver model that co-evolve through autonomous task generation and solution.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"R-Zero lets a base LLM create its own reasoning tasks by co-evolving a Challenger that proposes hard problems and a Solver that learns to solve them, with no human data 
or labels required.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"385ae95a7795a1eee837ecb45abb954633c92cd8dd88eaf68f8e4802c9c33ebd"},"source":{"id":"2508.05004","kind":"arxiv","version":4},"verdict":{"id":"0534890d-5e2b-4a9f-9b72-43daf99cbc19","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:18:34.136554Z","strongest_claim":"R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting the Qwen3-4B-Base by +6.49 on math-reasoning benchmarks and +7.54 on general-domain reasoning benchmarks.","one_line_summary":"R-Zero lets a base LLM bootstrap its own reasoning curriculum by pitting a Challenger model against a Solver model that co-evolve through autonomous task generation and solution.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the reward signals for the Challenger (proposing tasks near the edge of Solver capability) and Solver (solving those tasks) can be defined and optimized without any external human data or labels while still producing genuine capability gains rather than reward hacking or mode collapse.","pith_extraction_headline":"R-Zero lets a base LLM create its own reasoning tasks by co-evolving a Challenger that proposes hard problems and a Solver that learns to solve them, with no human data or labels 
required."},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"b969f89b64b4b0ad4e2cd2162fe02902c4e8f98463deba6f8f4c9d78314e8a6e"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2508.05004","created_at":"2026-05-17T23:39:22.078997+00:00"},{"alias_kind":"arxiv_version","alias_value":"2508.05004v4","created_at":"2026-05-17T23:39:22.078997+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2508.05004","created_at":"2026-05-17T23:39:22.078997+00:00"},{"alias_kind":"pith_short_12","alias_value":"MRDWRL3OM5EX","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"MRDWRL3OM5EXWVYI","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"MRDWRL3O","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":42,"internal_anchor_count":42,"sample":[{"citing_arxiv_id":"2605.22905","citing_title":"EVE-Agent: Evidence-Verifiable Self-Evolving Agents","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2603.08403","citing_title":"SPIRAL: Self-Evolving Action-Conditioned Video Generation via Reflective Planning Agents","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21931","citing_title":"EvoVid: Temporal-Centric Self-Evolution for Video Large Language Models","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2512.18552","citing_title":"Toward Training Superintelligent Software Agents through Self-Play SWE-RL","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20189","citing_title":"SOLAR: A Self-Optimizing Open-Ended Autonomous Agent 
for Lifelong Learning and Continual Adaptation","ref_index":42,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20914","citing_title":"RISE: Reliable Improvement in Self-Evolving Vision-Language Models","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06638","citing_title":"Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key","ref_index":74,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16727","citing_title":"PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play","ref_index":60,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17037","citing_title":"D$^2$Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2509.02547","citing_title":"The Landscape of Agentic Reinforcement Learning for LLMs: A Survey","ref_index":171,"is_internal_anchor":true},{"citing_arxiv_id":"2509.14274","citing_title":"Discovering New Theorems via LLMs with In-Context Proof Learning in Lean","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2509.08827","citing_title":"A Survey of Reinforcement Learning for Large Reasoning Models","ref_index":209,"is_internal_anchor":true},{"citing_arxiv_id":"2511.09907","citing_title":"Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2508.07407","citing_title":"A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16322","citing_title":"Steerable Instruction Following Coding Data Synthesis with Actor-Parametric Schema Co-Evolution","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13880","citing_title":"PREPING: Building Agent Memory without Tasks","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13369","citing_title":"Query-Conditioned Test-Time 
Self-Training for Large Language Models","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09423","citing_title":"SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13803","citing_title":"EvoGround: Self-Evolving Video Agents for Video Temporal Grounding","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13775","citing_title":"RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data","ref_index":65,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03472","citing_title":"Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11636","citing_title":"Seir\\^enes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11679","citing_title":"Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional 
Expansion","ref_index":62,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/MRDWRL3OM5EXWVYIYD7TZKZIZG","json":"https://pith.science/pith/MRDWRL3OM5EXWVYIYD7TZKZIZG.json","graph_json":"https://pith.science/api/pith-number/MRDWRL3OM5EXWVYIYD7TZKZIZG/graph.json","events_json":"https://pith.science/api/pith-number/MRDWRL3OM5EXWVYIYD7TZKZIZG/events.json","paper":"https://pith.science/paper/MRDWRL3O"},"agent_actions":{"view_html":"https://pith.science/pith/MRDWRL3OM5EXWVYIYD7TZKZIZG","download_json":"https://pith.science/pith/MRDWRL3OM5EXWVYIYD7TZKZIZG.json","view_paper":"https://pith.science/paper/MRDWRL3O","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2508.05004&json=true","fetch_graph":"https://pith.science/api/pith-number/MRDWRL3OM5EXWVYIYD7TZKZIZG/graph.json","fetch_events":"https://pith.science/api/pith-number/MRDWRL3OM5EXWVYIYD7TZKZIZG/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/MRDWRL3OM5EXWVYIYD7TZKZIZG/action/timestamp_anchor","attest_storage":"https://pith.science/pith/MRDWRL3OM5EXWVYIYD7TZKZIZG/action/storage_attestation","attest_author":"https://pith.science/pith/MRDWRL3OM5EXWVYIYD7TZKZIZG/action/author_attestation","sign_citation":"https://pith.science/pith/MRDWRL3OM5EXWVYIYD7TZKZIZG/action/citation_signature","submit_replication":"https://pith.science/pith/MRDWRL3OM5EXWVYIYD7TZKZIZG/action/replication_record"}},"created_at":"2026-05-17T23:39:22.078997+00:00","updated_at":"2026-05-17T23:39:22.078997+00:00"}