Ax-Prover: A deep reasoning agentic framework for theorem proving in mathematics and quantum physics

Benjamin Breen, Marco Del Tredici, Jacob McCarran, Javier Aspuru Mijares, Weichen Winston Yin, Kfir Sulimany, Jacob M Taylor, Frank HL Koppens, Dirk Englund · 2025 · cs.AI · arXiv 2510.12787

10 Pith papers cite this work. Polarity classification is still indexing.

abstract

We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, Ax-Prover approaches scientific problem solving through formal proof generation, a process that demands both creative reasoning and strict syntactic rigor. Ax-Prover meets this challenge by equipping Large Language Models (LLMs), which provide knowledge and reasoning, with Lean tools via the Model Context Protocol (MCP), which ensure formal correctness. To evaluate its performance as an autonomous prover, we benchmark our approach against frontier LLMs and specialized prover models on two public math benchmarks and on two Lean benchmarks we introduce in the fields of abstract algebra and quantum theory. On public datasets, Ax-Prover is competitive with state-of-the-art provers, while it largely outperforms them on the new benchmarks. This shows that, unlike specialized systems that struggle to generalize, our tool-based agentic theorem prover approach offers a generalizable methodology for formal verification across diverse scientific domains. Furthermore, we demonstrate Ax-Prover's assistant capabilities in a practical use case, showing how it enabled an expert mathematician to formalize the proof of a complex cryptography theorem.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

A Machine-Verified Proof of a Quantum-Optimization Conjecture

quant-ph · 2026-06-29 · accept · novelty 8.0

A Lean 4 machine-verified proof establishes that depth-p QAOA on the ring of disagrees attains approximation ratio (2p+1)/(2p+2) exactly.

Beyond the Library: An Agentic Framework for Autoformalizing Research Mathematics

cs.AI · 2026-06-30 · conditional · novelty 7.0

Agentic LLM framework autoformalizes 32 Putnam problems and main theorems plus proofs from five STOC papers into Lean 4, with two proofs using only kernel axioms.

LAMP: Lean-based Agentic framework with MCP and Proof Repair

cs.LO · 2026-06-27 · conditional · novelty 7.0

LAMP achieves 96.7% success generating verified Lean proofs for 90 Combinatorics on Words theorems by coordinating Planner, Builder, and Verifier agents with a CoW ontology accessed through Model Context Protocol.

Fine-Tuning Small Reasoning Models for Quantum Field Theory

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Small 7B reasoning models were fine-tuned on synthetic and curated QFT problems using RL and SFT, yielding performance gains, error analysis, and public release of data and traces.

Automating Formal Verification with Agent-Guided Tree Search

cs.LO · 2026-05-26 · unverdicted · novelty 6.0

Agent-directed tree search improves LLM performance on Lean formal verification tasks, with context-based orchestration solving more intermediate specs at lower token cost than baseline agents.

NeuroClaw Technical Report

cs.CV · 2026-04-27 · unverdicted · novelty 6.0

NeuroClaw is a domain-specialized multi-agent framework, accompanied by the NeuroBench benchmark, that improves executability and reproducibility for multimodal neuroimaging research.

Automating Formal Verification with Reinforcement Learning and Recursive Inference

cs.LG · 2026-05-29 · unverdicted · novelty 5.0

RLVR training raises verified Dafny pass rates from 9.7% to 31.1% on a filtered benchmark, while a Lean proof scaffold lifts success from 46.2% to 69.2% on a pilot set and solves 7 of 42 previously unsolved tasks.

LLMs with in-context learning for Algorithmic Theoretical Physics

cs.LG · 2026-05-06 · unverdicted · novelty 5.0

Frontier LLMs with in-context learning and CAS integration solve most algorithmic tasks in theoretical physics when supplied with worked examples.

The Topological Dual of a Dataset: A Logic-to-Topology Encoding for AlphaGeometry-Style Data

cs.AI · 2026-04-20

Automated Conjecture Resolution with Formal Verification

cs.LG · 2026-04-04