LeanMarathon uses four contract-scoped agents on an evolving blueprint coordinated by a two-stage orchestrator to formalize seven theorems from Erdős problems in Lean, proving 258 lemmas with no sorry across three runs.
Advancing Mathematics Research with AI-Driven Formal Proof Search
5 Pith papers cite this work. Polarity classification is still indexing.
abstract
Large language models (LLMs) increasingly excel at mathematical reasoning, but their unreliability limits their utility in mathematics research. A mitigation is using LLMs to generate formal proofs in languages like Lean. We perform the first large-scale evaluation of this method's ability to solve open problems. Our most capable agent autonomously resolved 9 of 353 open Erd\H{o}s problems at the per-problem cost of a few hundred dollars, proved 44/492 OEIS conjectures, and is being deployed in combinatorics, optimization, graph theory, algebraic geometry, and quantum optics research. A basic agent alternating LLM-based generation with Lean-based verification replicated the Erd\H{o}s successes but proved costlier on the hardest problems. These findings demonstrate the power of AI-aided formal proof search and shed light on the agent designs that enable it.
years
2026 5verdicts
UNVERDICTED 5representative citing papers
LEAP enables general-purpose LLMs to reach 100% solve rate on the 2025 Putnam and 70% on a new Lean-IMO-Bench by combining informal reasoning with iterative Lean interaction.
Theoria rewrites solutions into auditable typed state transitions with justifications, certifying 105 of 185 HLE problems at 91.4% precision and outperforming holistic judges on adversarial poisoned proofs by catching hidden premises.
Introduces the Agentic Publication Protocol (APP) as a repository-based standard for publishing papers together with reproducibility artifacts and agent instructions.
A witness-split and window-pruning SAT framework finds no 44-element 3-AP-free subset of [1,212] but leaves two resistant instances unsolved.
citing papers explorer
-
LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization
LeanMarathon uses four contract-scoped agents on an evolving blueprint coordinated by a two-stage orchestrator to formalize seven theorems from Erdős problems in Lean, proving 258 lemmas with no sorry across three runs.
-
LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks
LEAP enables general-purpose LLMs to reach 100% solve rate on the 2025 Putnam and 70% on a new Lean-IMO-Bench by combining informal reasoning with iterative Lean interaction.
-
Theoria: Rewrite-Acceptability Verification over Informal Reasoning States
Theoria rewrites solutions into auditable typed state transitions with justifications, certifying 105 of 185 HLE problems at 91.4% precision and outperforming holistic judges on adversarial poisoned proofs by catching hidden premises.