Advancing Mathematics Research with AI-Driven Formal Proof Search

2026 · cs.AI · arXiv 2605.22763

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

Large language models (LLMs) increasingly excel at mathematical reasoning, but their unreliability limits their utility in mathematics research. A mitigation is using LLMs to generate formal proofs in languages like Lean. We perform the first large-scale evaluation of this method's ability to solve open problems. Our most capable agent autonomously resolved 9 of 353 open Erdős problems at the per-problem cost of a few hundred dollars, proved 44/492 OEIS conjectures, and is being deployed in combinatorics, optimization, graph theory, algebraic geometry, and quantum optics research. A basic agent alternating LLM-based generation with Lean-based verification replicated the Erdős successes but proved costlier on the hardest problems. These findings demonstrate the power of AI-aided formal proof search and shed light on the agent designs that enable it.
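
The "basic agent" in the abstract alternates LLM-based generation with Lean-based verification. The sketch below illustrates only that general loop shape and is not the paper's implementation. The names proof_search, SearchResult, toy_generate, and toy_verify are hypothetical, and the toy functions stand in for a real LLM call and a real Lean check so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical interfaces, assumed for illustration only.
# Generator: (statement, last checker message or None) -> candidate proof text
# Verifier:  (statement, candidate proof text) -> (accepted, checker message)
Generator = Callable[[str, Optional[str]], str]
Verifier = Callable[[str, str], Tuple[bool, str]]


@dataclass
class SearchResult:
    proof: Optional[str]  # accepted proof text, or None if the budget ran out
    attempts: int         # number of candidates that were checked


def proof_search(statement: str, generate: Generator, verify: Verifier,
                 max_attempts: int = 8) -> SearchResult:
    """Alternate generation and verification until a candidate is accepted."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(statement, feedback)
        accepted, message = verify(statement, candidate)
        if accepted:
            return SearchResult(proof=candidate, attempts=attempt)
        # Pass the checker's message to the next generation step.
        feedback = message
    return SearchResult(proof=None, attempts=max_attempts)


# Toy stand-ins (assumptions): no LLM or Lean toolchain is invoked here.
def toy_generate(statement: str, feedback: Optional[str]) -> str:
    # First try is a placeholder; after any feedback, try a different text.
    return "sorry" if feedback is None else "by decide"


def toy_verify(statement: str, candidate: str) -> Tuple[bool, str]:
    # Reject placeholders; a real verifier would run the Lean checker.
    if "sorry" in candidate:
        return False, "proof contains sorry"
    return True, "ok"


result = proof_search("theorem t : 2 + 2 = 4", toy_generate, toy_verify)
print(result)
```

In a real setup, the verifier would compile the candidate with Lean and return its error output. The attempt budget is the knob that trades cost against success on harder problems.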

representative citing papers

LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

LeanMarathon uses four contract-scoped agents on an evolving blueprint coordinated by a two-stage orchestrator to formalize seven theorems from Erdős problems in Lean, proving 258 lemmas with no sorry across three runs.

LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks

cs.AI · 2026-06-02 · unverdicted · novelty 7.0

LEAP enables general-purpose LLMs to reach 100% solve rate on the 2025 Putnam and 70% on a new Lean-IMO-Bench by combining informal reasoning with iterative Lean interaction.

Theoria: Rewrite-Acceptability Verification over Informal Reasoning States

cs.AI · 2026-07-01 · unverdicted · novelty 6.0

Theoria rewrites solutions into auditable typed state transitions with justifications, certifying 105 of 185 HLE problems at 91.4% precision and outperforming holistic judges on adversarial poisoned proofs by catching hidden premises.

Agentic Publication Protocol: An Attempt to Modernize Scientific Publication

cs.DL · 2026-06-15 · unverdicted · novelty 5.0

Introduces the Agentic Publication Protocol (APP) as a repository-based standard for publishing papers together with reproducibility artifacts and agent instructions.

Witness-split + window-cardinality refinement for $r_3(N)$: Architecture, empirical results, and a structural hard pocket

cs.LO · 2026-05-31 · unverdicted · novelty 5.0

A witness-split and window-pruning SAT framework finds no 44-element 3-AP-free subset of [1,212] but leaves two resistant instances unsolved.

