{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:OXUCBDMX5AF4NE7IMWMG3FXJXV","short_pith_number":"pith:OXUCBDMX","schema_version":"1.0","canonical_sha256":"75e8208d97e80bc693e865986d96e9bd647ab816316f887d843c2e1aa1b764c9","source":{"kind":"arxiv","id":"2504.21801","version":2},"attestation_state":"computed","paper":{"title":"DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"DeepSeek-Prover-V2 trains a 671B model on subgoal-decomposed proofs to reach 88.9% success on formal theorem proving benchmarks.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Chong Ruan, Daya Guo, Dejian Yang, Haocheng Wang, Hongxuan Tang, Huajian Xin, Junxiao Song, Liyue Zhang, Qihao Zhu, Shirong Ma, Wanjia Zhao, Wenjun Gao, Yuxuan Liu, Z.F. Wu, Zhe Fu, Zhibin Gou, Zhihong Shao, Z.Z. Ren","submitted_at":"2025-04-30T16:57:48Z","abstract_excerpt":"We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal ma"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2504.21801","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CL","submitted_at":"2025-04-30T16:57:48Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"bf5d910ecaf0d586c06fa829f87927eb46db812956b8276ad6a26c0214b828b9","abstract_canon_sha256":"314813eb5810c27c4a0fd5ea06f6793a726b2401782d3db55c566eedb7593c9c"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:52.892819Z","signature_b64":"fEyu51bqdiL0eOSNVV9L9aGJJAT7dgF2lNQS7QtYGTF0VpPG1ab7S4TV33EdygRXDZcobAd+Im27M7O7P18gDA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"75e8208d97e80bc693e865986d96e9bd647ab816316f887d843c2e1aa1b764c9","last_reissued_at":"2026-05-17T23:38:52.892126Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:52.892126Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"DeepSeek-Prover-V2 trains a 671B model on subgoal-decomposed proofs to reach 88.9% success on formal theorem proving benchmarks.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Chong Ruan, Daya Guo, Dejian Yang, Haocheng Wang, Hongxuan Tang, Huajian Xin, Junxiao Song, Liyue Zhang, Qihao Zhu, Shirong Ma, Wanjia Zhao, Wenjun Gao, Yuxuan Liu, Z.F. Wu, Zhe Fu, Zhibin Gou, Zhihong Shao, Z.Z. Ren","submitted_at":"2025-04-30T16:57:48Z","abstract_excerpt":"We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal ma"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. Further evaluation on these 15 AIME problems shows that the model successfully solves 6 of them.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The cold-start data generated by prompting DeepSeek-V3 to decompose problems into subgoals and synthesize proofs produces high-quality, error-free formal reasoning traces that reinforcement learning can reliably improve without systematic biases or invalid steps.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"DeepSeek-Prover-V2-671B reaches 88.9% on MiniF2F-test and solves 49 PutnamBench problems plus 6 of 15 recent AIME problems by training on subgoal-decomposed proofs collected via DeepSeek-V3.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"DeepSeek-Prover-V2 trains a 671B model on subgoal-decomposed proofs to reach 88.9% success on formal theorem proving benchmarks.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a6f9cf98a011ffbb04df5961cc8174a46d357a7e81bede592be9d0f7be98e9f5"},"source":{"id":"2504.21801","kind":"arxiv","version":2},"verdict":{"id":"9152bb00-de7a-495f-b496-6e8e8fd8a72f","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T09:28:06.880518Z","strongest_claim":"The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. Further evaluation on these 15 AIME problems shows that the model successfully solves 6 of them.","one_line_summary":"DeepSeek-Prover-V2-671B reaches 88.9% on MiniF2F-test and solves 49 PutnamBench problems plus 6 of 15 recent AIME problems by training on subgoal-decomposed proofs collected via DeepSeek-V3.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The cold-start data generated by prompting DeepSeek-V3 to decompose problems into subgoals and synthesize proofs produces high-quality, error-free formal reasoning traces that reinforcement learning can reliably improve without systematic biases or invalid steps.","pith_extraction_headline":"DeepSeek-Prover-V2 trains a 671B model on subgoal-decomposed proofs to reach 88.9% success on formal theorem proving benchmarks."},"references":{"count":36,"sample":[{"doi":"","year":2025,"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","ref_index":1,"cited_arxiv_id":"2501.12948","is_internal_anchor":true},{"doi":"","year":null,"title":"− Note that 𝑘 is a positive integer since all terms are positive","work_id":"83b830c3-d4da-4038-bd1f-48c66e69ac54","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"− Alternatively, note that for 𝑝, 𝑞, 𝑟 in the given range, 𝑘 ⩽ 3 is natural, as larger 𝑘 would make the right side too large","work_id":"1e07868d-93a5-41eb-8108-17b3b66ea174","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"− This has no solutions since 𝑝, 𝑞, 𝑟 ⩾ 2, making the left side much larger than the right","work_id":"bd94dcb5-e42a-47dd-a5eb-72a363080a08","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"− For 𝑝 = 2, no solution","work_id":"eb276d23-0d26-44fd-a2c2-1804d4aaf33e","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":36,"snapshot_sha256":"c18a743c39f2c7ed0c14b8d4e3b7a713347667046da9d92946fc7f747cfa9d7f","internal_anchors":1},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2504.21801","created_at":"2026-05-17T23:38:52.892246+00:00"},{"alias_kind":"arxiv_version","alias_value":"2504.21801v2","created_at":"2026-05-17T23:38:52.892246+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2504.21801","created_at":"2026-05-17T23:38:52.892246+00:00"},{"alias_kind":"pith_short_12","alias_value":"OXUCBDMX5AF4","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"OXUCBDMX5AF4NE7I","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"OXUCBDMX","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":39,"internal_anchor_count":39,"sample":[{"citing_arxiv_id":"2510.12787","citing_title":"Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics","ref_index":55,"is_internal_anchor":true},{"citing_arxiv_id":"2605.23109","citing_title":"Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems","ref_index":40,"is_internal_anchor":true},{"citing_arxiv_id":"2605.23772","citing_title":"Agentic Proving for Program Verification","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22257","citing_title":"What are the Right Symmetries for Formal Theorem Proving?","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20244","citing_title":"Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2604.23135","citing_title":"Characterizing Paraphrase-Induced Failures in Lean 4 Autoformalization","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16379","citing_title":"An Information-Theoretic Criterion for Efficient Data Synthesis","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17283","citing_title":"OProver: A Unified Framework for Agentic Formal Theorem Proving","ref_index":153,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17255","citing_title":"CAM-Bench: A Benchmark for Computational and Applied Mathematics in Lean","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18747","citing_title":"Code as Agent Harness","ref_index":87,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27859","citing_title":"Rethinking Agentic Reinforcement Learning In Large Language Models","ref_index":73,"is_internal_anchor":true},{"citing_arxiv_id":"2509.07966","citing_title":"Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"2509.17677","citing_title":"EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving","ref_index":36,"is_internal_anchor":true},{"citing_arxiv_id":"2601.14004","citing_title":"Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models","ref_index":259,"is_internal_anchor":true},{"citing_arxiv_id":"2602.24273","citing_title":"A Minimal Agent for Automated Theorem Proving","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16347","citing_title":"Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2510.01346","citing_title":"Aristotle: IMO-level Automated Theorem Proving","ref_index":37,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13137","citing_title":"LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06651","citing_title":"AI co-mathematician: Accelerating mathematicians with agentic AI","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13137","citing_title":"LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03071","citing_title":"Automatic Textbook Formalization","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11905","citing_title":"Rethinking Supervision Granularity: Segment-Level Learning for LLM-Based Theorem Proving","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27859","citing_title":"Rethinking Agentic Reinforcement Learning In Large Language Models","ref_index":73,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27859","citing_title":"Rethinking Agentic Reinforcement Learning In Large Language Models","ref_index":73,"is_internal_anchor":true},{"citing_arxiv_id":"2604.23712","citing_title":"OptProver: Bridging Olympiad and Optimization through Continual Training in Formal Theorem Proving","ref_index":23,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/OXUCBDMX5AF4NE7IMWMG3FXJXV","json":"https://pith.science/pith/OXUCBDMX5AF4NE7IMWMG3FXJXV.json","graph_json":"https://pith.science/api/pith-number/OXUCBDMX5AF4NE7IMWMG3FXJXV/graph.json","events_json":"https://pith.science/api/pith-number/OXUCBDMX5AF4NE7IMWMG3FXJXV/events.json","paper":"https://pith.science/paper/OXUCBDMX"},"agent_actions":{"view_html":"https://pith.science/pith/OXUCBDMX5AF4NE7IMWMG3FXJXV","download_json":"https://pith.science/pith/OXUCBDMX5AF4NE7IMWMG3FXJXV.json","view_paper":"https://pith.science/paper/OXUCBDMX","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2504.21801&json=true","fetch_graph":"https://pith.science/api/pith-number/OXUCBDMX5AF4NE7IMWMG3FXJXV/graph.json","fetch_events":"https://pith.science/api/pith-number/OXUCBDMX5AF4NE7IMWMG3FXJXV/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/OXUCBDMX5AF4NE7IMWMG3FXJXV/action/timestamp_anchor","attest_storage":"https://pith.science/pith/OXUCBDMX5AF4NE7IMWMG3FXJXV/action/storage_attestation","attest_author":"https://pith.science/pith/OXUCBDMX5AF4NE7IMWMG3FXJXV/action/author_attestation","sign_citation":"https://pith.science/pith/OXUCBDMX5AF4NE7IMWMG3FXJXV/action/citation_signature","submit_replication":"https://pith.science/pith/OXUCBDMX5AF4NE7IMWMG3FXJXV/action/replication_record"}},"created_at":"2026-05-17T23:38:52.892246+00:00","updated_at":"2026-05-17T23:38:52.892246+00:00"}