LLM-based agents in Lean solved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures at a few hundred dollars each.
hub
Hilbert: Recursively building formal proofs with informal reasoning.CoRR, abs/2509.22819
11 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
LLM proofs for hard math problems show large differences in quality metrics like conciseness and cognitive simplicity that correctness-only tests miss, along with trade-offs between quality and correctness.
LeetProof achieves higher rates of fully certified program synthesis from natural language by using a multi-modal verifier in Lean to validate specifications via randomized testing and delegate proofs to AI tools, outperforming single-mode baselines on benchmarks while uncovering defects in prior参考.
Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.
SGS adds self-guidance to LLM self-play for Lean4 theorem proving, surpassing RL baselines and enabling a 7B model to outperform a 671B model after 200 rounds.
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
ContextPilot reduces LLM prefill latency by up to 3x via context indexing, ordering, de-duplication, and succinct annotations that maximize KV-cache reuse while preserving or improving reasoning quality.
Ax-Prover is a tool-using multi-agent LLM system that matches state-of-the-art provers on public math benchmarks and outperforms them on new abstract-algebra and quantum-theory benchmarks while also assisting an expert with a cryptography proof.
STAR-PólyaMath introduces a multi-agent framework with meta-strategic supervision and state-machine orchestration that reports state-of-the-art and perfect scores on eight top math competition benchmarks.
Rule-based annotation generation for ACSL outperforms LLM-based methods in achieving successful formal verification of C programs.
AI for math combines task-specific architectures and general foundation models to support research and advance AI reasoning capabilities.
citing papers explorer
-
Advancing Mathematics Research with AI-Driven Formal Proof Search
LLM-based agents in Lean solved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures at a few hundred dollars each.
-
Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness
LLM proofs for hard math problems show large differences in quality metrics like conciseness and cognitive simplicity that correctness-only tests miss, along with trade-offs between quality and correctness.
-
Certified Program Synthesis with a Multi-Modal Verifier
LeetProof achieves higher rates of fully certified program synthesis from natural language by using a multi-modal verifier in Lean to validate specifications via randomized testing and delegate proofs to AI tools, outperforming single-mode baselines on benchmarks while uncovering defects in prior参考.
-
Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization
Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.
-
Scaling Self-Play with Self-Guidance
SGS adds self-guidance to LLM self-play for Lean4 theorem proving, surpassing RL baselines and enabling a 7B model to outperform a 671B model after 200 rounds.
-
A Minimal Agent for Automated Theorem Proving
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
-
ContextPilot: Fast Long-Context Inference via Context Reuse
ContextPilot reduces LLM prefill latency by up to 3x via context indexing, ordering, de-duplication, and succinct annotations that maximize KV-cache reuse while preserving or improving reasoning quality.
-
Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
Ax-Prover is a tool-using multi-agent LLM system that matches state-of-the-art provers on public math benchmarks and outperforms them on new abstract-algebra and quantum-theory benchmarks while also assisting an expert with a cryptography proof.
-
STAR-P\'olyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision
STAR-PólyaMath introduces a multi-agent framework with meta-strategic supervision and state-machine orchestration that reports state-of-the-art and perfect scores on eight top math competition benchmarks.
-
Evaluating LLM-Generated ACSL Annotations for Formal Verification
Rule-based annotation generation for ACSL outperforms LLM-based methods in achieving successful formal verification of C programs.
-
AI for Mathematics: Progress, Challenges, and Prospects
AI for math combines task-specific architectures and general foundation models to support research and advance AI reasoning capabilities.