{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:ZYQFF5SARDW4SGOCW2R7236AAQ","short_pith_number":"pith:ZYQFF5SA","schema_version":"1.0","canonical_sha256":"ce2052f64088edc919c2b6a3fd6fc00436fd02c0c7b01a2e6d496d72e771c351","source":{"kind":"arxiv","id":"2505.04588","version":2},"attestation_state":"computed","paper":{"title":"ZeroSearch: Incentivize the Search Capability of LLMs without Searching","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A fine-tuned retrieval module with degrading document quality trains LLMs to match or beat real search engines via RL without live API calls.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Fei Huang, Hao Sun, Jiayan Guo, Jingren Zhou, Pengjun Xie, Xuanbo Fan, Yan Zhang, Yingyan Hou, Yong Jiang, Zile Qiao","submitted_at":"2025-05-07T17:30:22Z","abstract_excerpt":"Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities by interacting with live search engines in real-world environments. While these approaches show promising results, they face two major challenges: (1) Uncontrolled Document Quality: The quality of documents returned by search engines is often unpredictable, introducing noise and instability into the training process. (2) Prohibitively High API Costs: RL training r"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2505.04588","kind":"arxiv","version":2},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2025-05-07T17:30:22Z","cross_cats_sorted":[],"title_canon_sha256":"da79235b764bba152c0a246fa0c89a521fcd544162d904d39cd880bd9dfefac2","abstract_canon_sha256":"24ee6e81d6ad32332790766812db89532d98c9ec58455280211a714f2719674f"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:13.493633Z","signature_b64":"ZKS3gN5euFI8wZoamQWccgnW9YHVAcTd4dH5rMEY2q/S9iOPWpITHNHtklmw2rwD3PY6+H1QwPBe/dZwCuMuAw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"ce2052f64088edc919c2b6a3fd6fc00436fd02c0c7b01a2e6d496d72e771c351","last_reissued_at":"2026-05-17T23:38:13.492969Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:13.492969Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"ZeroSearch: Incentivize the Search Capability of LLMs without Searching","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A fine-tuned retrieval module with degrading document quality trains LLMs to match or beat real search engines via RL without live API calls.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Fei Huang, Hao Sun, Jiayan Guo, Jingren Zhou, Pengjun Xie, Xuanbo Fan, Yan Zhang, Yingyan Hou, Yong Jiang, Zile Qiao","submitted_at":"2025-05-07T17:30:22Z","abstract_excerpt":"Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities by interacting with live search engines in real-world environments. While these approaches show promising results, they face two major challenges: (1) Uncontrolled Document Quality: The quality of documents returned by search engines is often unpredictable, introducing noise and instability into the training process. (2) Prohibitively High API Costs: RL training r"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"a 7B retrieval module achieves comparable performance to the real search engine, while a 14B retrieval module even surpasses it.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That progressively degrading the quality of documents generated by the fine-tuned retrieval module during curriculum rollouts will reliably elicit and improve the main model's reasoning ability in a manner that transfers to real search engine use.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ZeroSearch simulates search engine interactions via supervised fine-tuning of a retrieval module and curriculum-based RL degradation of document quality, achieving comparable or superior performance to real search engines with 7B and 14B modules.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A fine-tuned retrieval module with degrading document quality trains LLMs to match or beat real search engines via RL without live API calls.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"63add1840a8bfb4e65103ed3716ca65c32a188ef6dac7d9a221af46be07a96d6"},"source":{"id":"2505.04588","kind":"arxiv","version":2},"verdict":{"id":"46b1b71d-e3fc-488f-a232-b6ac842ccf38","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T17:38:01.454336Z","strongest_claim":"a 7B retrieval module achieves comparable performance to the real search engine, while a 14B retrieval module even surpasses it.","one_line_summary":"ZeroSearch simulates search engine interactions via supervised fine-tuning of a retrieval module and curriculum-based RL degradation of document quality, achieving comparable or superior performance to real search engines with 7B and 14B modules.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That progressively degrading the quality of documents generated by the fine-tuned retrieval module during curriculum rollouts will reliably elicit and improve the main model's reasoning ability in a manner that transfers to real search engine use.","pith_extraction_headline":"A fine-tuned retrieval module with degrading document quality trains LLMs to match or beat real search engines via RL without live API calls."},"references":{"count":49,"sample":[{"doi":"","year":2023,"title":"A. Asai, Z. Wu, Y . Wang, A. Sil, and H. Hajishirzi. Self-rag: Learning to retrieve, generate, and critique through self-reflection. In The Twelfth International Conference on Learning Representations","work_id":"996ab5c8-ba86-4ab8-b5a4-d75515b5b93c","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"https://arxiv.org/abs/2212.08037","work_id":"3d95359c-7270-48aa-a841-070fb1259b87","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"PaLM: Scaling Language Modeling with Pathways","work_id":"a94f3ef7-2c49-4445-93fe-6ec16aafd966","ref_index":3,"cited_arxiv_id":"2204.02311","is_internal_anchor":true},{"doi":"","year":2024,"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","ref_index":4,"cited_arxiv_id":"2407.21783","is_internal_anchor":true},{"doi":"","year":2025,"title":"W. Feng, C. Hao, Y . Zhang, J. Song, and H. Wang. Airrag: Activating intrinsic reasoning for retrieval augmented generation via tree-based search. arXiv preprint arXiv:2501.10053, 2025","work_id":"8ef0adb9-684b-4d73-928f-bdf4b8717433","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":49,"snapshot_sha256":"3465e1606eaf41ddc0a675c0597744397e0d3f77a78bdcaa7e9116d26706f4b3","internal_anchors":17},"formal_canon":{"evidence_count":2,"snapshot_sha256":"de7721bd6c3a6a2054a47c0f129ad8be4f90be2cb4a7846abd161fbd1ac5bbeb"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2505.04588","created_at":"2026-05-17T23:38:13.493095+00:00"},{"alias_kind":"arxiv_version","alias_value":"2505.04588v2","created_at":"2026-05-17T23:38:13.493095+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2505.04588","created_at":"2026-05-17T23:38:13.493095+00:00"},{"alias_kind":"pith_short_12","alias_value":"ZYQFF5SARDW4","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"ZYQFF5SARDW4SGOC","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"ZYQFF5SA","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":26,"internal_anchor_count":26,"sample":[{"citing_arxiv_id":"2505.17086","citing_title":"Advancing Multi-Agent RAG Systems with Minimalist Reinforcement Learning","ref_index":71,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22511","citing_title":"Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17734","citing_title":"Harnessing LLM Agents with Skill Programs","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08401","citing_title":"AIPO: Learning to Reason from Active Interaction","ref_index":61,"is_internal_anchor":true},{"citing_arxiv_id":"2509.02547","citing_title":"The Landscape of Agentic Reinforcement Learning for LLMs: A Survey","ref_index":291,"is_internal_anchor":true},{"citing_arxiv_id":"2510.00861","citing_title":"Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs","ref_index":36,"is_internal_anchor":true},{"citing_arxiv_id":"2511.02805","citing_title":"MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2601.12538","citing_title":"Agentic Reasoning for Large Language Models","ref_index":240,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09038","citing_title":"SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks","ref_index":28,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11611","citing_title":"CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12975","citing_title":"Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation","ref_index":37,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13534","citing_title":"Scaling Retrieval-Augmented Reasoning with Parallel Search and Explicit Merging","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09287","citing_title":"PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12039","citing_title":"SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11611","citing_title":"CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08401","citing_title":"AIPO: Learning to Reason from Active Interaction","ref_index":61,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09038","citing_title":"SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks","ref_index":28,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09287","citing_title":"PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06285","citing_title":"LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG","ref_index":78,"is_internal_anchor":true},{"citing_arxiv_id":"2505.10978","citing_title":"Group-in-Group Policy Optimization for LLM Agent Training","ref_index":59,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08013","citing_title":"Learning CLI Agents with Structured Action Credit under Selective Observation","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2507.20534","citing_title":"Kimi K2: Open Agentic Intelligence","ref_index":70,"is_internal_anchor":true},{"citing_arxiv_id":"2604.14054","citing_title":"$\\pi$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2604.17739","citing_title":"Democratizing Tool Learning with Environments Fully Simulated by a Free 8B Language Model","ref_index":36,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18235","citing_title":"Negative Advantage Is a Double-Edged Sword: Calibrating Advantage in GRPO for Deep Search","ref_index":1,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/ZYQFF5SARDW4SGOCW2R7236AAQ","json":"https://pith.science/pith/ZYQFF5SARDW4SGOCW2R7236AAQ.json","graph_json":"https://pith.science/api/pith-number/ZYQFF5SARDW4SGOCW2R7236AAQ/graph.json","events_json":"https://pith.science/api/pith-number/ZYQFF5SARDW4SGOCW2R7236AAQ/events.json","paper":"https://pith.science/paper/ZYQFF5SA"},"agent_actions":{"view_html":"https://pith.science/pith/ZYQFF5SARDW4SGOCW2R7236AAQ","download_json":"https://pith.science/pith/ZYQFF5SARDW4SGOCW2R7236AAQ.json","view_paper":"https://pith.science/paper/ZYQFF5SA","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2505.04588&json=true","fetch_graph":"https://pith.science/api/pith-number/ZYQFF5SARDW4SGOCW2R7236AAQ/graph.json","fetch_events":"https://pith.science/api/pith-number/ZYQFF5SARDW4SGOCW2R7236AAQ/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/ZYQFF5SARDW4SGOCW2R7236AAQ/action/timestamp_anchor","attest_storage":"https://pith.science/pith/ZYQFF5SARDW4SGOCW2R7236AAQ/action/storage_attestation","attest_author":"https://pith.science/pith/ZYQFF5SARDW4SGOCW2R7236AAQ/action/author_attestation","sign_citation":"https://pith.science/pith/ZYQFF5SARDW4SGOCW2R7236AAQ/action/citation_signature","submit_replication":"https://pith.science/pith/ZYQFF5SARDW4SGOCW2R7236AAQ/action/replication_record"}},"created_at":"2026-05-17T23:38:13.493095+00:00","updated_at":"2026-05-17T23:38:13.493095+00:00"}