Formal Conjectures is a Lean 4 benchmark containing 2615 formalized problems with 1029 open conjectures, designed to evaluate automated mathematical reasoning and proof discovery.
hub Mixed citations
Numina-lean-agent: An open and general agentic reasoning system for formal mathematics
Mixed citation behavior. Most common role is background (67%).
hub tools
citation-role summary
citation-polarity summary
years
2026 11representative citing papers
LeanSearch v2 recovers 46.1% of ground-truth premise groups for research-level Lean 4 theorems within 10 candidates and raises fixed-loop proof success to 20%.
LLM proofs for hard math problems show large differences in quality metrics like conciseness and cognitive simplicity that correctness-only tests miss, along with trade-offs between quality and correctness.
An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.
Multi-agent AI system formalizes entire 500-page graduate algebraic combinatorics textbook into Lean, creating 130K lines of code in one week at human-expert cost.
Explorable theorems ground written proofs in Lean formalizations to enable step-by-step execution, custom example testing, and dependency tracing, with a user study showing improved comprehension.
Heavy supervised fine-tuning on formal math suppresses tool-calling in Goedel-Prover-V2 from 89.4% to near 0%, but 100 Lean agentic traces restore it to 83.8% on the Berkeley Function Calling Leaderboard with in-domain gains on ProofNet.
An AI framework combining informal reasoning and formal verification resolves an open commutative algebra problem and produces a Lean 4-checked proof with minimal human input.
Agentic Claude reaches 98.8% valid specs, 87.5% implementation certification, and 98.1% end-to-end success on CLEVER, revealing a mismatch between benchmark difficulty and current prover performance.
AI agents exploring Platonic mathematical structures via proof hypergraphs may reveal the overall architecture of formal mathematics and what makes parts of it human-accessible.
AI for math combines task-specific architectures and general foundation models to support research and advance AI reasoning capabilities.
citing papers explorer
-
Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics
Formal Conjectures is a Lean 4 benchmark containing 2615 formalized problems with 1029 open conjectures, designed to evaluate automated mathematical reasoning and proof discovery.
-
LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving
LeanSearch v2 recovers 46.1% of ground-truth premise groups for research-level Lean 4 theorems within 10 candidates and raises fixed-loop proof success to 20%.
-
Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness
LLM proofs for hard math problems show large differences in quality metrics like conciseness and cognitive simplicity that correctness-only tests miss, along with trade-offs between quality and correctness.
-
AI co-mathematician: Accelerating mathematicians with agentic AI
An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.
-
Automatic Textbook Formalization
Multi-agent AI system formalizes entire 500-page graduate algebraic combinatorics textbook into Lean, creating 130K lines of code in one week at human-expert cost.
-
Explorable Theorems: Making Written Theorems Explorable by Grounding Them in Formal Representations
Explorable theorems ground written proofs in Lean formalizations to enable step-by-step execution, custom example testing, and dependency tracing, with a user study showing improved comprehension.
-
Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover
Heavy supervised fine-tuning on formal math suppresses tool-calling in Goedel-Prover-V2 from 89.4% to near 0%, but 100 Lean agentic traces restore it to 83.8% on the Berkeley Function Calling Leaderboard with in-domain gains on ProofNet.
-
Automated Conjecture Resolution with Formal Verification
An AI framework combining informal reasoning and formal verification resolves an open commutative algebra problem and produces a Lean 4-checked proof with minimal human input.
-
Agentic Proving for Program Verification
Agentic Claude reaches 98.8% valid specs, 87.5% implementation certification, and 98.1% end-to-end success on CLEVER, revealing a mismatch between benchmark difficulty and current prover performance.
-
Artificial Intelligence and the Structure of Mathematics
AI agents exploring Platonic mathematical structures via proof hypergraphs may reveal the overall architecture of formal mathematics and what makes parts of it human-accessible.
-
AI for Mathematics: Progress, Challenges, and Prospects
AI for math combines task-specific architectures and general foundation models to support research and advance AI reasoning capabilities.