Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

Benjamin Breen; Dirk Englund; Frank H. L. Koppens; Jacob McCarran; Jacob M. Taylor; Javier Aspuru Mijares; Kfir Sulimany; Marco Del Tredici; Weichen Winston Yin

REVIEW 3 major objections 2 minor 15 cited by

Ax-Prover equips LLMs with Lean tools in a multi-agent setup to prove theorems across math and quantum physics where specialized systems fall short.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric. the ladder, T0–T4 →

Challenge this review Re-run · record.json Download PDF Read on arXiv ↗

T0 review · grok-4.3

2026-05-25 07:44 UTC pith:D2MS2PUP

load-bearing objection Ax-Prover combines multi-agent orchestration with Lean tool access via MCP and adds two new benchmarks, but the generalization story rests on performance numbers and benchmark details that are not shown. the 3 major comments →

arxiv 2510.12787 v4 pith:D2MS2PUP submitted 2025-10-14 cs.AI cs.MA

Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

Benjamin Breen , Marco Del Tredici , Jacob McCarran , Javier Aspuru Mijares , Weichen Winston Yin , Kfir Sulimany , Jacob M. Taylor , Frank H. L. Koppens

show 1 more author

Dirk Englund

This is my paper

classification cs.AI cs.MA

keywords theorem provingLeanmulti-agent systemsformal verificationabstract algebraquantum theorylarge language modelsautomated reasoning

verification ladder T0 review T1 audit T2 compute T3 formal T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Ax-Prover is a multi-agent framework that pairs large language models with Lean theorem-proving tools through the Model Context Protocol. The system generates formal proofs either autonomously or in collaboration with human experts. It was tested on two public math benchmarks plus two new Lean benchmarks in abstract algebra and quantum theory. Performance matches frontier models on established datasets but exceeds them on the new domain benchmarks. The authors present this as evidence that a tool-based agentic method supplies a generalizable route to formal verification across scientific fields, illustrated by its use in formalizing a complex cryptography theorem.

Core claim

The paper claims that a multi-agent system combining LLMs for reasoning with Lean tools via MCP delivers a generalizable methodology for automated theorem proving, remaining competitive with state-of-the-art provers on public math benchmarks while largely outperforming them on newly introduced benchmarks in abstract algebra and quantum theory.

What carries the argument

The multi-agent framework that supplies LLMs with Lean verification tools through the Model Context Protocol to enforce formal correctness during proof generation.

Load-bearing premise

That outperformance on the two new benchmarks demonstrates genuine cross-domain generalization rather than effects from how those benchmarks were constructed or tuned to the system.

What would settle it

An evaluation on a third independently designed benchmark in a different domain, with no changes to Ax-Prover, that shows whether the performance edge disappears.

Watch this falsifier — get emailed when new claim-graph text bears on it.

If this is right

The approach remains competitive with specialized provers on existing public math benchmarks.
It produces higher success rates than those systems on the new abstract algebra and quantum theory benchmarks.
It supports collaborative formalization of complex results such as a cryptography theorem with human experts.
The tool-based agentic design supplies a single methodology usable across multiple scientific domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same architecture could be tested on benchmarks from additional fields such as number theory or condensed-matter physics without domain-specific retraining.
Success here raises the question of whether similar tool integration would improve performance in non-Lean formal systems or in proof search without language models.
If the pattern holds, research groups might shift from maintaining separate provers per subfield toward shared agentic tool layers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

Ax-Prover combines multi-agent orchestration with Lean tool access via MCP and adds two new benchmarks, but the generalization story rests on performance numbers and benchmark details that are not shown.

read the letter

The paper's core contribution is a multi-agent LLM system that gives models direct Lean tool calls through the Model Context Protocol, letting it run either alone or with a human. They test it on two existing math benchmarks where it stays competitive, then introduce their own Lean sets in abstract algebra and quantum theory where they say it largely beats specialized provers. They also show one concrete case where the system helped an expert formalize a cryptography result. That practical example is the clearest sign they have something usable rather than just another prover wrapper. The architecture itself is a straightforward extension of recent agent and tool-use work, but the choice to target quantum and crypto domains with Lean is a reasonable next step. The main weakness is that the abstract gives no numbers, no error bars, no list of baselines, and no description of how the new problems were chosen or checked for leakage. Without those, the claim that the approach generalizes better than specialized systems cannot be checked. The stress-test point about benchmark construction is fair: if the problems reward iterative tool use or human-style collaboration more than single-shot search, the delta does not prove domain-agnostic strength. The paper is aimed at groups already building LLM agents for formal methods who want to see an applied example in physics-adjacent areas. It is coherent on its own terms and shows clear thinking about the tool interface, so it deserves a serious referee who can ask for the missing benchmark details and run checks on the new sets. I would send it to review rather than desk reject.

Referee Report

3 major / 2 minor

Summary. The manuscript presents Ax-Prover, a multi-agent system that integrates LLMs with Lean theorem-proving tools via the Model Context Protocol (MCP) to generate formal proofs. It evaluates the system as an autonomous prover on two existing public math benchmarks (where it is reported competitive with frontier LLMs and specialized provers) and on two newly introduced Lean benchmarks in abstract algebra and quantum theory (where it is reported to largely outperform specialized models). The central claim is that this tool-based agentic architecture provides a generalizable methodology for formal verification across scientific domains, unlike specialized systems. The paper also includes a collaborative demonstration in which Ax-Prover assists an expert in formalizing a complex cryptography theorem.

Significance. If the reported performance deltas on the new benchmarks are shown to be robust and free of construction bias, the work would offer a concrete demonstration that agentic, tool-augmented LLMs can extend formal theorem proving into domains such as abstract algebra and quantum theory where specialized provers have limited coverage. The explicit use of MCP for Lean tool access and the human-in-the-loop cryptography example are concrete strengths that could be built upon by the formal-methods and AI-for-science communities.

major comments (3)

[Abstract] Abstract: the central claim that Ax-Prover 'largely outperforms' specialized models on the two new Lean benchmarks in abstract algebra and quantum theory is stated without any numerical results, baseline scores, error bars, problem counts, or description of how the benchmarks were constructed or filtered. This information is load-bearing for the generalization argument and its absence prevents evaluation of whether the reported gains reflect architectural advantages or benchmark-specific properties.
[Benchmark sections (inferred from abstract)] The description of the new benchmarks (abstract algebra and quantum theory) provides no details on problem selection criteria, difficulty calibration, coverage of Lean tactics, or steps taken to prevent data leakage from training corpora. Without these, the outperformance cannot be taken as evidence that the multi-agent tool-use design generalizes rather than exploiting construction choices that reward iterative agent behavior.
[Evaluation and conclusion sections (inferred from abstract)] The generalization claim ('unlike specialized systems that struggle to generalize, our tool-based agentic theorem prover approach offers a generalizable methodology') rests entirely on the empirical comparison to the two new benchmarks; no ablation isolating the contribution of the multi-agent MCP architecture versus single-pass LLM prompting is reported, leaving the causal link between architecture and cross-domain performance untested.

minor comments (2)

[Abstract] The abstract mentions 'two public math benchmarks' without naming them; explicit citation of the datasets (e.g., miniF2F, ProofNet) would improve reproducibility.
[Use-case section (inferred from abstract)] The cryptography use-case is presented as a practical demonstration but lacks any quantitative measure of human effort saved or proof length/complexity, which would strengthen the assistant-capability claim.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the paper to strengthen the presentation of the benchmarks and evaluation.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that Ax-Prover 'largely outperforms' specialized models on the two new Lean benchmarks in abstract algebra and quantum theory is stated without any numerical results, baseline scores, error bars, problem counts, or description of how the benchmarks were constructed or filtered. This information is load-bearing for the generalization argument and its absence prevents evaluation of whether the reported gains reflect architectural advantages or benchmark-specific properties.

Authors: We agree that the abstract would benefit from greater specificity to support the central claim. In the revised manuscript we have updated the abstract to report problem counts for the new benchmarks, key performance numbers with baselines, and a concise description of construction and filtering. Full tables with error bars appear in the evaluation section. revision: yes
Referee: [Benchmark sections (inferred from abstract)] The description of the new benchmarks (abstract algebra and quantum theory) provides no details on problem selection criteria, difficulty calibration, coverage of Lean tactics, or steps taken to prevent data leakage from training corpora. Without these, the outperformance cannot be taken as evidence that the multi-agent tool-use design generalizes rather than exploiting construction choices that reward iterative agent behavior.

Authors: We acknowledge the need for these details. The revised manuscript adds a new subsection that specifies problem selection criteria (recent publications and textbooks), difficulty calibration via expert review and tactic complexity, Lean tactic coverage, and leakage-prevention steps including post-training-cutoff sourcing and manual originality checks. revision: yes
Referee: [Evaluation and conclusion sections (inferred from abstract)] The generalization claim ('unlike specialized systems that struggle to generalize, our tool-based agentic theorem prover approach offers a generalizable methodology') rests entirely on the empirical comparison to the two new benchmarks; no ablation isolating the contribution of the multi-agent MCP architecture versus single-pass LLM prompting is reported, leaving the causal link between architecture and cross-domain performance untested.

Authors: We agree that a direct ablation would strengthen the causal argument. The original experiments did not include a single-pass versus multi-agent ablation. In the revision we have added an explicit discussion of this limitation in the evaluation section, clarified that the specialized baselines are architecturally distinct from LLM-based agents, and noted that future work will include such an ablation. The current evidence still supports generalization via the observed cross-domain performance gap. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation only

full rationale

The paper presents an empirical multi-agent system evaluated on public benchmarks plus two author-introduced Lean benchmarks in abstract algebra and quantum theory. No mathematical derivation chain, fitted parameters, or first-principles results are described. The generalization claim rests on reported outperformance comparisons rather than any self-referential construction, self-citation load-bearing premise, or ansatz. This matches the default expectation for non-circular empirical work; the central claim has independent content in the form of external benchmark results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied AI-systems paper describing an engineering framework; it introduces no free parameters, mathematical axioms, or postulated physical entities.

pith-pipeline@v0.9.0 · 5785 in / 1144 out tokens · 23660 ms · 2026-05-25T07:44:14.644341+00:00 · methodology

0 comments

read the original abstract

We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, Ax-Prover approaches scientific problem solving through formal proof generation, a process that demands both creative reasoning and strict syntactic rigor. Ax-Prover meets this challenge by equipping Large Language Models (LLMs), which provide knowledge and reasoning, with Lean tools via the Model Context Protocol (MCP), which ensure formal correctness. To evaluate its performance as an autonomous prover, we benchmark our approach against frontier LLMs and specialized prover models on two public math benchmarks and on two Lean benchmarks we introduce in the fields of abstract algebra and quantum theory. On public datasets, Ax-Prover is competitive with state-of-the-art provers, while it largely outperforms them on the new benchmarks. This shows that, unlike specialized systems that struggle to generalize, our tool-based agentic theorem prover approach offers a generalizable methodology for formal verification across diverse scientific domains. Furthermore, we demonstrate Ax-Prover's assistant capabilities in a practical use case, showing how it enabled an expert mathematician to formalize the proof of a complex cryptography theorem.

Figures

Figures reproduced from arXiv: 2510.12787 by Benjamin Breen, Dirk Englund, Frank H. L. Koppens, Jacob McCarran, Jacob M. Taylor, Javier Aspuru Mijares, Kfir Sulimany, Marco Del Tredici, Weichen Winston Yin.

**Figure 2.** Figure 2: The main steps performed by the Prover to prove a theorem. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

discussion (0)

Forward citations

Cited by 15 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Machine-Verified Proof of a Quantum-Optimization Conjecture
quant-ph 2026-06 accept novelty 8.0 full

A Lean 4 machine-verified proof establishes that depth-p QAOA on the ring of disagrees attains approximation ratio (2p+1)/(2p+2) exactly.
Beyond the Library: An Agentic Framework for Autoformalizing Research Mathematics
cs.AI 2026-06 conditional novelty 7.0

Agentic LLM framework autoformalizes 32 Putnam problems and main theorems plus proofs from five STOC papers into Lean 4, with two proofs using only kernel axioms.
Beyond the Library: An Agentic Framework for Autoformalizing Research Mathematics
cs.AI 2026-06 accept novelty 7.0

An orchestrator-driven agentic pipeline using general coding LLMs autoformalizes 32 PutnamBench problems and the main theorems plus proofs from five STOC papers into Lean 4, with two proofs using only the kernel.
LAMP: Lean-based Agentic framework with MCP and Proof Repair
cs.LO 2026-06 conditional novelty 7.0

LAMP achieves 96.7% success generating verified Lean proofs for 90 Combinatorics on Words theorems by coordinating Planner, Builder, and Verifier agents with a CoW ontology accessed through Model Context Protocol.
Hypothesis-Disciplined Multi-Agent Automated Formalization of Asymptotic Statistical Theory
cs.AI 2026-06 unverdicted novelty 7.0

A hypothesis-disciplined multi-agent pipeline in Lean 4 produces axiom-clean, source-faithful formalizations of parametric and semi-parametric asymptotic distribution and efficiency theorems.
Fine-Tuning Small Reasoning Models for Quantum Field Theory
cs.LG 2026-04 unverdicted novelty 7.0

Small 7B reasoning models were fine-tuned on synthetic and curated QFT problems using RL and SFT, yielding performance gains, error analysis, and public release of data and traces.
Automating Formal Verification with Agent-Guided Tree Search
cs.LO 2026-05 unverdicted novelty 6.0

Agent-directed tree search improves LLM performance on Lean formal verification tasks, with context-based orchestration solving more intermediate specs at lower token cost than baseline agents.
NeuroClaw Technical Report
cs.CV 2026-04 unverdicted novelty 6.0

NeuroClaw is a domain-specialized multi-agent framework with NeuroBench benchmark that improves executability and reproducibility for multimodal neuroimaging research.
NeuroClaw Technical Report
cs.CV 2026-04 unverdicted novelty 6.0

NeuroClaw introduces a three-tier multi-agent framework and NeuroBench benchmark that improve executability and reproducibility scores for neuroimaging tasks when used with multimodal LLMs.
The Topological Dual of a Dataset: A Logic-to-Topology Encoding for AlphaGeometry-Style Data
cs.AI 2026-04 unverdicted novelty 6.0

The topological dual of a dataset is introduced as a transformation that encodes logical structures into topological ones to expose invariants in neural latent spaces for AlphaGeometry-style reasoning.
Automated Conjecture Resolution with Formal Verification
cs.LG 2026-04 unverdicted novelty 6.0

An AI framework combining informal reasoning and formal verification resolves an open commutative algebra problem and produces a Lean 4-checked proof with minimal human input.
Automating Formal Verification with Reinforcement Learning and Recursive Inference
cs.LG 2026-05 unverdicted novelty 5.0

RLVR training raises verified Dafny pass rates from 9.7% to 31.1% on a filtered benchmark while a Lean proof scaffold lifts success from 46.2% to 69.2% on a pilot set and solves 7 of 42 prior unsolved tasks.
LLMs with in-context learning for Algorithmic Theoretical Physics
cs.LG 2026-05 unverdicted novelty 5.0

Frontier LLMs with in-context learning and CAS integration solve most algorithmic tasks in theoretical physics when supplied with worked examples.
The Topological Dual of a Dataset: A Logic-to-Topology Encoding for AlphaGeometry-Style Data
cs.AI 2026-04 conditional novelty 5.0

The paper introduces the 'topological dual of a dataset' — translating observable-logic sequents into covering-sieve statements on Grothendieck sites — as a new input representation for neuro-symbolic AI systems like ...
When AI meets quantum information: A comprehensive review
quant-ph 2026-07 unverdicted novelty 2.0

A comprehensive review organizing progress at the AI-quantum information intersection from both directions.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · cited by 12 Pith papers · 9 internal anchors

[1]

arXiv, 2025

Mathematics (arxiv archive). arXiv, 2025. Accessed: 2025-09-24

work page 2025
[2]

Aristotle: IMO-level Automated Theorem Proving

Tudor Achim, Alex Best, Kevin Der, Math¨ıs F´ed´erico, Sergei Gukov, Daniel Halpern-Leister, Kirsten Hennings- gard, Yury Kudryashov, Alexander Meiburg, Martin Michelsen, et al. Aristotle: Imo-level automated theorem proving.arXiv preprint arXiv:2510.01346, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

Claude 3.7 sonnet and claude code.https://www.anthropic.com/news/ claude-3-7-sonnet, 2025

Anthropic. Claude 3.7 sonnet and claude code.https://www.anthropic.com/news/ claude-3-7-sonnet, 2025. Accessed: YYYY-MM-DD

work page 2025
[4]

Claude 4.https://www.anthropic.com/news/claude-4, 2025

Anthropic. Claude 4.https://www.anthropic.com/news/claude-4, 2025. Accessed: 2025-09-16

work page 2025
[5]

Claude documentation, 2025

Anthropic. Claude documentation, 2025. Accessed: 2025-09-19

work page 2025
[6]

Claude opus 4.1.https://www.anthropic.com/claude/opus, 2025

Anthropic. Claude opus 4.1.https://www.anthropic.com/claude/opus, 2025. Accessed: 2025-10- 14

work page 2025
[7]

Introducing claude haiku 4.5, October 2025

Anthropic PBC. Introducing claude haiku 4.5, October 2025. Accessed on November 14, 2025

work page 2025
[8]

Leanlm: Large language models for lean theorem proving.arXiv preprint arXiv:2306.09264, 2023

Ethan Ayers et al. Leanlm: Large language models for lean theorem proving.arXiv preprint arXiv:2306.09264, 2023

work page arXiv 2023
[9]

Formal proving with llms: Lean as a benchmark

Zhenisbek Azerbayev et al. Formal proving with llms: Lean as a benchmark. InAdvances in Neural Information Processing Systems (NeurIPS), 2023. 23

work page 2023
[10]

Prover agent: An agent-based framework for formal mathematical proofs.arXiv preprint arXiv:2506.19923, 2025

Kaito Baba, Chaoran Liu, Shuhei Kurita, and Akiyoshi Sannai. Prover agent: An agent-based framework for formal mathematical proofs.arXiv preprint arXiv:2506.19923, 2025

work page arXiv 2025
[11]

Satisfiability modulo theories: An appe- tizer.Communications of the ACM, 65(6):69–77, 2022

Haniel Barbosa, Clark Barrett, Pascal Fontaine, and Andrew Reynolds. Satisfiability modulo theories: An appe- tizer.Communications of the ACM, 65(6):69–77, 2022

work page 2022
[12]

Computer-aided security proofs for the working cryptographer

Gilles Barthe, Benjamin Gr ´egoire, Sylvain Heraud, and Santiago Zanella B ´eguelin. Computer-aided security proofs for the working cryptographer. InAnnual Cryptology Conference, pages 71–90. Springer, 2011

work page 2011
[13]

Tactician: Lean proof automation with knn

Bart Blaauwbroek et al. Tactician: Lean proof automation with knn. InProceedings of the International Confer- ence on Interactive Theorem Proving (ITP), pages 348–366. Springer, 2020

work page 2020
[14]

A computationally sound mechanized prover for security protocols.IEEE Transactions on Dependable and Secure Computing, 5(4):193–207, 2008

Bruno Blanchet. A computationally sound mechanized prover for security protocols.IEEE Transactions on Dependable and Secure Computing, 5(4):193–207, 2008

work page 2008
[15]

Knudsen, Gregor Leander, Ventzislav Nikov, Christof Paar, Christian Rechberger, Peter Rombouts, Søren S

Julia Borghoff, Anne Canteaut, Tim G ¨uneysu, Elif Bilge Kavun, Miroslav Kne ˇzevi´c, Lars R. Knudsen, Gregor Leander, Ventzislav Nikov, Christof Paar, Christian Rechberger, Peter Rombouts, Søren S. Thomsen, and Tolga Yalc ¸ın. Prince – a low-latency block cipher for pervasive computing applications. In Xiaoyun Wang and Kazue Sako, editors,Advances in Cry...

work page 2012
[16]

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, et al. Seed-prover: Deep and broad reasoning for automated theorem proving.arXiv preprint arXiv:2507.23726, 2025

work page Pith review arXiv 2025
[17]

Evaluating Large Language Models Trained on Code

Mark Chen et al. Evaluating large language models trained on code. InarXiv preprint arXiv:2107.03374, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[18]

Quantum algorithms for lattice problems

Yilei Chen. Quantum algorithms for lattice problems. Technical report, Cryptology ePrint Archive, Report 2024/555, 2024. Updated April 18: algorithm contains an unfixable bug invalidating the main claim (see Section 3.5.9, Page 37)

work page 2024
[19]

Gold-medalist performance in solving olympiad geometry with alphageometry2.arXiv preprint arXiv:2502.03544, 2025

Yuri Chervonyi, Trieu H Trinh, Miroslav Ol ˇs´ak, Xiaomeng Yang, Hoang Nguyen, Marcelo Menegali, Junehyuk Jung, Vikas Verma, Quoc V Le, and Thang Luong. Gold-medalist performance in solving olympiad geometry with alphageometry2.arXiv preprint arXiv:2502.03544, 2025

work page arXiv 2025
[20]

Llms as conversational partners for mathematicians.arXiv preprint arXiv:2305.XXXX, 2023

Emily Collins et al. Llms as conversational partners for mathematicians.arXiv preprint arXiv:2305.XXXX, 2023

work page 2023
[21]

Z3: An efficient smt solver

Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient smt solver. InProceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 337–340. Springer, 2008

work page 2008
[22]

The lean theorem prover (system description)

Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, and Jakob von Raumer. The lean theorem prover (system description). InAutomated Deduction – CADE-25, volume 9195 ofLecture Notes in Computer Science, pages 378–388. Springer, 2015

work page 2015
[23]

Deepseek-prover-v1 dataset.https://huggingface.co/datasets/deepseek-ai/ DeepSeek-Prover-V1, 2024

DeepSeek-AI. Deepseek-prover-v1 dataset.https://huggingface.co/datasets/deepseek-ai/ DeepSeek-Prover-V1, 2024. Accessed: 2025-08-24

work page 2024
[24]

Deepseek-prover-v2-671b, 2025

deepseek ai. Deepseek-prover-v2-671b, 2025. Accessed: 2025-09-24

work page 2025
[25]

Lean-lsp-mcp: Tools for agentic interaction with the lean theorem prover, 3 2025

Oliver Dressler. Lean-lsp-mcp: Tools for agentic interaction with the lean theorem prover, 3 2025. Accessed: 2025-08-24

work page 2025
[26]

Abstract algebra

David S Dummit and Richard M Foote. Abstract algebra. john wile & sons.Inc., Hoboken, NJ, 2004

work page 2004
[27]

Praveen Gauravaram, Lars Knudsen, Krystian Matusiewicz, Florian Mendel, Christian Rechberger, Martin Schl¨affer, and Søren S. Thomsen. Grøstl – a sha-3 candidate. Submission to nist, NIST, September 2008. Available athttp://www.groestl.info/

work page 2008
[28]

Tactictoe: Learning to prove with tactics

Th ´er`ese Gauthier, Cezary Kaliszyk, and Josef Urban. Tactictoe: Learning to prove with tactics. InProceedings of the International Conference on Automated Deduction (CADE), pages 275–294. Springer, 2021. 24

work page 2021
[29]

Towards an AI co-scientist

Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, et al. Towards an ai co-scientist.arXiv preprint arXiv:2502.18864, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

Learning to prove theorems via interacting with proof assistants

Daniel Huang et al. Learning to prove theorems via interacting with proof assistants. InInternational Conference on Machine Learning (ICML), pages 2654–2663, 2019

work page 2019
[31]

Alemi, et al

Geoffrey Irving, Christian Szegedy, Alexander A. Alemi, et al. Deepmath - deep sequence models for premise selection. InAdvances in Neural Information Processing Systems (NeurIPS), pages 2235–2243, 2016

work page 2016
[32]

Another look at” provable security”.Journal of Cryptology, 20(1):3–37, 2007

Neal Koblitz and Alfred J Menezes. Another look at” provable security”.Journal of Cryptology, 20(1):3–37, 2007

work page 2007
[33]

First-order theorem proving and vampire

Laura Kov ´acs and Andrei V oronkov. First-order theorem proving and vampire. InProceedings of the Interna- tional Conference on Computer Aided Verification (CAV), pages 1–35. Springer, 2013

work page 2013
[34]

LeanAgent: Lifelong Learning for Formal Theorem Proving

Adarsh Kumarappan, Mo Tiwari, Peiyang Song, Robert Joseph George, Chaowei Xiao, and Anima Anandkumar. Leanagent: Lifelong learning for formal theorem proving.arXiv preprint arXiv:2410.06209, 2024

work page Pith review arXiv 2024
[35]

Deep reinforcement learning for theorem proving

Guillaume Lample and Franc ¸ois Charton. Deep reinforcement learning for theorem proving. InInternational Conference on Learning Representations (ICLR), 2022

work page 2022
[36]

Mathlib statistics.https://leanprover-community.github.io/ mathlib_stats.html, 2025

Lean Prover Community. Mathlib statistics.https://leanprover-community.github.io/ mathlib_stats.html, 2025. GitHub repository for generating statistics plots for Mathlib; accessed 2025-08-24

work page 2025
[37]

Numinamath.[https://huggingface.co/AI-MO/NuminaMath-1.5](https: //github.com/project-numina/aimo-progress-prize/blob/main/report/numina_ dataset.pdf), 2024

Jia LI, Edward Beeching, Lewis Tunstall, Ben Lipkin, Roman Soletskyi, Shengyi Costa Huang, Kashif Rasul, Longhui Yu, Albert Jiang, Ziju Shen, Zihan Qin, Bin Dong, Li Zhou, Yann Fleureau, Guillaume Lample, and Stanislas Polu. Numinamath.[https://huggingface.co/AI-MO/NuminaMath-1.5](https: //github.com/project-numina/aimo-progress-prize/blob/main/report/num...

work page 2024
[38]

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, et al. Goedel-prover: A frontier model for open-source automated theorem proving.arXiv preprint arXiv:2502.07640, 2025

work page Pith review arXiv 2025
[39]

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, et al. Goedel-prover-v2: Scaling formal theorem proving with scaffolded data synthesis and self-correction.arXiv preprint arXiv:2508.03613, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[40]

Unconditional Security Of Quantum Key Distribution Over Arbitrarily Long Distances

Hoi-Kwong Lo and HF Chau. Unconditional security of quantum key distribution over arbitrarily long distances. arXiv preprint quant-ph/9803006, 1998

work page internal anchor Pith review Pith/arXiv arXiv 1998
[41]

Unconditional security of quantum key distribution over arbitrarily long distances.science, 283(5410):2050–2056, 1999

Hoi-Kwong Lo and Hoi Fung Chau. Unconditional security of quantum key distribution over arbitrarily long distances.science, 283(5410):2050–2056, 1999

work page 2050
[42]

Process-driven autoformalization in lean 4

Zeyuan Lu, Guodong Zhang, Junxiao Chen, Huan Chen, Yilun Chen, Zonglin Li, Yiping Li, Lianmin Wang, Yao Lin, Ce Zhang, and Jie Chen. Process-driven autoformalization in lean 4. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[43]

Introducing gauss, an agent for autoformalization, 2025

Math Inc. Introducing gauss, an agent for autoformalization, 2025. Announcement of autoformalization agent for formal verification in mathematics

work page 2025
[44]

A formalization of the generalized quantum stein’s lemma in lean.arXiv preprint arXiv:2510.08672, 2025

Alex Meiburg, Leonardo A Lessa, and Rodolfo R Soldati. A formalization of the generalized quantum stein’s lemma in lean.arXiv preprint arXiv:2510.08672, 2025

work page arXiv 2025
[45]

P. R. Mishra, Yogesh Kumar, Susanta Samanta, and Atul Gaur. A new algorithm for computing branch number of non-singular matrices over finite fields.arXiv preprint arXiv:2405.07007, 2024. 25

work page arXiv 2024
[46]

What is the model context protocol (mcp)?https://modelcontextprotocol

Model Context Protocol. What is the model context protocol (mcp)?https://modelcontextprotocol. io/docs/getting-started/intro, 2024. Accessed: 2025-10-05

work page 2024
[47]

Robust de-anonymization of large sparse datasets

Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large sparse datasets. In2008 IEEE Symposium on Security and Privacy (sp 2008), pages 111–125. IEEE, 2008

work page 2008
[48]

Advanced encryption standard (aes)

National Institute of Standards and Technology. Advanced encryption standard (aes). Federal Information Processing Standards Publication FIPS 197-upd1, U.S. Department of Commerce, Gaithersburg, MD, 2001. Published November 26, 2001; Updated May 9, 2023

work page 2001
[49]

Cambridge university press, 2010

Michael A Nielsen and Isaac L Chuang.Quantum computation and quantum information. Cambridge university press, 2010

work page 2010
[50]

Numinamath-lean dataset.https://huggingface.co/datasets/AI-MO/ NuminaMath-LEAN, 2025

Numina-Team. Numinamath-lean dataset.https://huggingface.co/datasets/AI-MO/ NuminaMath-LEAN, 2025. Accessed: 2025-08-24

work page 2025
[51]

Openai models documentation, 2025

OpenAI. Openai models documentation, 2025. Accessed: 2025-09-19

work page 2025
[52]

Apollo: Automated llm and lean collaboration for advanced formal reasoning.arXiv preprint arXiv:2505.05758, 2025

Azim Ospanov, Farzan Farnia, and Roozbeh Yousefzadeh. Apollo: Automated llm and lean collaboration for advanced formal reasoning.arXiv preprint arXiv:2505.05758, 2025

work page arXiv 2025
[53]

Formal mathematics statement curriculum learning

Stanislas Polu et al. Formal mathematics statement curriculum learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

work page 2023
[54]

Generative language modeling for automated theorem proving

Stanislas Polu and Ilya Sutskever. Generative language modeling for automated theorem proving. InInterna- tional Conference on Learning Representations (ICLR), 2020

work page 2020
[55]

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

Z.Z. Ren, Zhihong Shao, Wenfeng Liang, et al. Deepseek-prover-v2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition.https://arxiv.org/abs/2504.21801, 2025. Accessed: 2025-08-24

work page internal anchor Pith review Pith/arXiv arXiv 2025
[56]

S ´ebastien Rousseau. Bug discovered in quantum algorithm for lattice-based crypto.https:// sebastienrousseau.com/2024-04-22-bug-discovered-in-quantum-algorithm-for-lattice-based-crypto/ index.html, April 22 2024. Accessed: [add access date here]

work page 2024
[57]

E prover 2.0: Integrating equational and first-order logic

Stephan Schulz, Simon Cruanes, and Petar Vukmirovi´c. E prover 2.0: Integrating equational and first-order logic. InProceedings of the International Conference on Automated Deduction (CADE), pages 523–541. Springer, 2019

work page 2019
[58]

Simple proof of security of the bb84 quantum key distribution protocol.Physical review letters, 85(2):441, 2000

Peter W Shor and John Preskill. Simple proof of security of the bb84 quantum key distribution protocol.Physical review letters, 85(2):441, 2000

work page 2000
[59]

Sequences of games: a tool for taming complexity in security proofs.cryptology eprint archive, 2004

Victor Shoup. Sequences of games: a tool for taming complexity in security proofs.cryptology eprint archive, 2004

work page 2004
[60]

Song et al.,Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean, 2025

Peiyang Song, Kaiyu Yang, and Anima Anandkumar. Lean copilot: Large language models as copilots for theorem proving in lean.arXiv preprint arXiv:2404.12534, 2024

work page arXiv 2024
[61]

k-anonymity: A model for protecting privacy.International journal of uncertainty, fuzziness and knowledge-based systems, 10(05):557–570, 2002

Latanya Sweeney. k-anonymity: A model for protecting privacy.International journal of uncertainty, fuzziness and knowledge-based systems, 10(05):557–570, 2002

work page 2002
[62]

Putnambench leaderboard.https://trishullab.github.io/PutnamBench/ leaderboard.html, 2025

Trishullab. Putnambench leaderboard.https://trishullab.github.io/PutnamBench/ leaderboard.html, 2025. Accessed: 2025-10-07

work page 2025
[63]

Putnambench: Evaluating neural theorem-provers on the putnam mathematical competi- tion.Advances in Neural Information Processing Systems, 37:11545–11569, 2024

George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, and Swarat Chaudhuri. Putnambench: Evaluating neural theorem-provers on the putnam mathematical competi- tion.Advances in Neural Information Processing Systems, 37:11545–11569, 2024

work page 2024
[64]

Machine learning preselected proof steps

Josef Urban, Geoff Sutcliffe, Stefan Petrov, and Josef Vysko ˇcil. Machine learning preselected proof steps. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2046–2051, 2011. 26

work page 2046
[65]

Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025

Sumanth Varambally, Thomas V oice, Yanchao Sun, Zhifeng Chen, Rose Yu, and Ke Ye. Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025

work page arXiv 2025
[66]

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

Haiming Wang et al. Kimina-prover preview: Towards large formal reasoning models with reinforcement learn- ing.https://arxiv.org/abs/2504.11354, 2025. Accessed: 2025-08-24

work page internal anchor Pith review Pith/arXiv arXiv 2025
[67]

Autoformalization in the Era of Large Language Models: A Survey

Qihao Wu, Haotian Zhang, Jialin Chen, Yizhou Li, Xingjian Zhang, Ce Zhang, and Jie Chen. Autoformalization in the era of large language models: A survey.arXiv preprint arXiv:2505.23486, 2025

work page Pith review arXiv 2025
[68]

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, and Xiaodan Liang. Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data.https: //arxiv.org/abs/2405.14333, 2024. Accessed: 2025-08-24

work page Pith review arXiv 2024
[69]

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Huajian Xin, ZZ Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, et al. Deepseek-prover-v1. 5: Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search.arXiv preprint arXiv:2408.08152, 2024

work page Pith review arXiv 2024
[70]

Leandojo: Theorem proving with large language models

Zhangir Xin et al. Leandojo: Theorem proving with large language models. InInternational Conference on Learning Representations (ICLR), 2024

work page 2024
[71]

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The ai scientist-v2: Workshop-level automated scientific discovery via agentic tree search.arXiv preprint arXiv:2504.08066, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[72]

Lean workbook: A large-scale lean problem set formalized from natural language math problems.https://arxiv.org/abs/2406

Huaiyuan Ying, Zijian Wu, Yihan Geng, Jiayu Wang, Dahua Lin, and Kai Chen. Lean workbook: A large-scale lean problem set formalized from natural language math problems.https://arxiv.org/abs/2406. 03847, 2024. Accessed: 2025-08-24

work page 2024
[73]

MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

Kunhao Zheng, Jesse Michael Han, and Stanislas Polu. Minif2f: A cross-system benchmark for formal olympiad- level mathematics.https://arxiv.org/abs/2109.00110, 2021. Accessed: 2025-08-24. 27

work page internal anchor Pith review Pith/arXiv arXiv 2021

[1] [1]

arXiv, 2025

Mathematics (arxiv archive). arXiv, 2025. Accessed: 2025-09-24

work page 2025

[2] [2]

Aristotle: IMO-level Automated Theorem Proving

Tudor Achim, Alex Best, Kevin Der, Math¨ıs F´ed´erico, Sergei Gukov, Daniel Halpern-Leister, Kirsten Hennings- gard, Yury Kudryashov, Alexander Meiburg, Martin Michelsen, et al. Aristotle: Imo-level automated theorem proving.arXiv preprint arXiv:2510.01346, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[3] [3]

Claude 3.7 sonnet and claude code.https://www.anthropic.com/news/ claude-3-7-sonnet, 2025

Anthropic. Claude 3.7 sonnet and claude code.https://www.anthropic.com/news/ claude-3-7-sonnet, 2025. Accessed: YYYY-MM-DD

work page 2025

[4] [4]

Claude 4.https://www.anthropic.com/news/claude-4, 2025

Anthropic. Claude 4.https://www.anthropic.com/news/claude-4, 2025. Accessed: 2025-09-16

work page 2025

[5] [5]

Claude documentation, 2025

Anthropic. Claude documentation, 2025. Accessed: 2025-09-19

work page 2025

[6] [6]

Claude opus 4.1.https://www.anthropic.com/claude/opus, 2025

Anthropic. Claude opus 4.1.https://www.anthropic.com/claude/opus, 2025. Accessed: 2025-10- 14

work page 2025

[7] [7]

Introducing claude haiku 4.5, October 2025

Anthropic PBC. Introducing claude haiku 4.5, October 2025. Accessed on November 14, 2025

work page 2025

[8] [8]

Leanlm: Large language models for lean theorem proving.arXiv preprint arXiv:2306.09264, 2023

Ethan Ayers et al. Leanlm: Large language models for lean theorem proving.arXiv preprint arXiv:2306.09264, 2023

work page arXiv 2023

[9] [9]

Formal proving with llms: Lean as a benchmark

Zhenisbek Azerbayev et al. Formal proving with llms: Lean as a benchmark. InAdvances in Neural Information Processing Systems (NeurIPS), 2023. 23

work page 2023

[10] [10]

Prover agent: An agent-based framework for formal mathematical proofs.arXiv preprint arXiv:2506.19923, 2025

Kaito Baba, Chaoran Liu, Shuhei Kurita, and Akiyoshi Sannai. Prover agent: An agent-based framework for formal mathematical proofs.arXiv preprint arXiv:2506.19923, 2025

work page arXiv 2025

[11] [11]

Satisfiability modulo theories: An appe- tizer.Communications of the ACM, 65(6):69–77, 2022

Haniel Barbosa, Clark Barrett, Pascal Fontaine, and Andrew Reynolds. Satisfiability modulo theories: An appe- tizer.Communications of the ACM, 65(6):69–77, 2022

work page 2022

[12] [12]

Computer-aided security proofs for the working cryptographer

Gilles Barthe, Benjamin Gr ´egoire, Sylvain Heraud, and Santiago Zanella B ´eguelin. Computer-aided security proofs for the working cryptographer. InAnnual Cryptology Conference, pages 71–90. Springer, 2011

work page 2011

[13] [13]

Tactician: Lean proof automation with knn

Bart Blaauwbroek et al. Tactician: Lean proof automation with knn. InProceedings of the International Confer- ence on Interactive Theorem Proving (ITP), pages 348–366. Springer, 2020

work page 2020

[14] [14]

A computationally sound mechanized prover for security protocols.IEEE Transactions on Dependable and Secure Computing, 5(4):193–207, 2008

Bruno Blanchet. A computationally sound mechanized prover for security protocols.IEEE Transactions on Dependable and Secure Computing, 5(4):193–207, 2008

work page 2008

[15] [15]

Knudsen, Gregor Leander, Ventzislav Nikov, Christof Paar, Christian Rechberger, Peter Rombouts, Søren S

Julia Borghoff, Anne Canteaut, Tim G ¨uneysu, Elif Bilge Kavun, Miroslav Kne ˇzevi´c, Lars R. Knudsen, Gregor Leander, Ventzislav Nikov, Christof Paar, Christian Rechberger, Peter Rombouts, Søren S. Thomsen, and Tolga Yalc ¸ın. Prince – a low-latency block cipher for pervasive computing applications. In Xiaoyun Wang and Kazue Sako, editors,Advances in Cry...

work page 2012

[16] [16]

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, et al. Seed-prover: Deep and broad reasoning for automated theorem proving.arXiv preprint arXiv:2507.23726, 2025

work page Pith review arXiv 2025

[17] [17]

Evaluating Large Language Models Trained on Code

Mark Chen et al. Evaluating large language models trained on code. InarXiv preprint arXiv:2107.03374, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[18] [18]

Quantum algorithms for lattice problems

Yilei Chen. Quantum algorithms for lattice problems. Technical report, Cryptology ePrint Archive, Report 2024/555, 2024. Updated April 18: algorithm contains an unfixable bug invalidating the main claim (see Section 3.5.9, Page 37)

work page 2024

[19] [19]

Gold-medalist performance in solving olympiad geometry with alphageometry2.arXiv preprint arXiv:2502.03544, 2025

Yuri Chervonyi, Trieu H Trinh, Miroslav Ol ˇs´ak, Xiaomeng Yang, Hoang Nguyen, Marcelo Menegali, Junehyuk Jung, Vikas Verma, Quoc V Le, and Thang Luong. Gold-medalist performance in solving olympiad geometry with alphageometry2.arXiv preprint arXiv:2502.03544, 2025

work page arXiv 2025

[20] [20]

Llms as conversational partners for mathematicians.arXiv preprint arXiv:2305.XXXX, 2023

Emily Collins et al. Llms as conversational partners for mathematicians.arXiv preprint arXiv:2305.XXXX, 2023

work page 2023

[21] [21]

Z3: An efficient smt solver

Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient smt solver. InProceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 337–340. Springer, 2008

work page 2008

[22] [22]

The lean theorem prover (system description)

Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, and Jakob von Raumer. The lean theorem prover (system description). InAutomated Deduction – CADE-25, volume 9195 ofLecture Notes in Computer Science, pages 378–388. Springer, 2015

work page 2015

[23] [23]

Deepseek-prover-v1 dataset.https://huggingface.co/datasets/deepseek-ai/ DeepSeek-Prover-V1, 2024

DeepSeek-AI. Deepseek-prover-v1 dataset.https://huggingface.co/datasets/deepseek-ai/ DeepSeek-Prover-V1, 2024. Accessed: 2025-08-24

work page 2024

[24] [24]

Deepseek-prover-v2-671b, 2025

deepseek ai. Deepseek-prover-v2-671b, 2025. Accessed: 2025-09-24

work page 2025

[25] [25]

Lean-lsp-mcp: Tools for agentic interaction with the lean theorem prover, 3 2025

Oliver Dressler. Lean-lsp-mcp: Tools for agentic interaction with the lean theorem prover, 3 2025. Accessed: 2025-08-24

work page 2025

[26] [26]

Abstract algebra

David S Dummit and Richard M Foote. Abstract algebra. john wile & sons.Inc., Hoboken, NJ, 2004

work page 2004

[27] [27]

Praveen Gauravaram, Lars Knudsen, Krystian Matusiewicz, Florian Mendel, Christian Rechberger, Martin Schl¨affer, and Søren S. Thomsen. Grøstl – a sha-3 candidate. Submission to nist, NIST, September 2008. Available athttp://www.groestl.info/

work page 2008

[28] [28]

Tactictoe: Learning to prove with tactics

Th ´er`ese Gauthier, Cezary Kaliszyk, and Josef Urban. Tactictoe: Learning to prove with tactics. InProceedings of the International Conference on Automated Deduction (CADE), pages 275–294. Springer, 2021. 24

work page 2021

[29] [29]

Towards an AI co-scientist

Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, et al. Towards an ai co-scientist.arXiv preprint arXiv:2502.18864, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[30] [30]

Learning to prove theorems via interacting with proof assistants

Daniel Huang et al. Learning to prove theorems via interacting with proof assistants. InInternational Conference on Machine Learning (ICML), pages 2654–2663, 2019

work page 2019

[31] [31]

Alemi, et al

Geoffrey Irving, Christian Szegedy, Alexander A. Alemi, et al. Deepmath - deep sequence models for premise selection. InAdvances in Neural Information Processing Systems (NeurIPS), pages 2235–2243, 2016

work page 2016

[32] [32]

Another look at” provable security”.Journal of Cryptology, 20(1):3–37, 2007

Neal Koblitz and Alfred J Menezes. Another look at” provable security”.Journal of Cryptology, 20(1):3–37, 2007

work page 2007

[33] [33]

First-order theorem proving and vampire

Laura Kov ´acs and Andrei V oronkov. First-order theorem proving and vampire. InProceedings of the Interna- tional Conference on Computer Aided Verification (CAV), pages 1–35. Springer, 2013

work page 2013

[34] [34]

LeanAgent: Lifelong Learning for Formal Theorem Proving

Adarsh Kumarappan, Mo Tiwari, Peiyang Song, Robert Joseph George, Chaowei Xiao, and Anima Anandkumar. Leanagent: Lifelong learning for formal theorem proving.arXiv preprint arXiv:2410.06209, 2024

work page Pith review arXiv 2024

[35] [35]

Deep reinforcement learning for theorem proving

Guillaume Lample and Franc ¸ois Charton. Deep reinforcement learning for theorem proving. InInternational Conference on Learning Representations (ICLR), 2022

work page 2022

[36] [36]

Mathlib statistics.https://leanprover-community.github.io/ mathlib_stats.html, 2025

Lean Prover Community. Mathlib statistics.https://leanprover-community.github.io/ mathlib_stats.html, 2025. GitHub repository for generating statistics plots for Mathlib; accessed 2025-08-24

work page 2025

[37] [37]

Numinamath.[https://huggingface.co/AI-MO/NuminaMath-1.5](https: //github.com/project-numina/aimo-progress-prize/blob/main/report/numina_ dataset.pdf), 2024

Jia LI, Edward Beeching, Lewis Tunstall, Ben Lipkin, Roman Soletskyi, Shengyi Costa Huang, Kashif Rasul, Longhui Yu, Albert Jiang, Ziju Shen, Zihan Qin, Bin Dong, Li Zhou, Yann Fleureau, Guillaume Lample, and Stanislas Polu. Numinamath.[https://huggingface.co/AI-MO/NuminaMath-1.5](https: //github.com/project-numina/aimo-progress-prize/blob/main/report/num...

work page 2024

[38] [38]

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, et al. Goedel-prover: A frontier model for open-source automated theorem proving.arXiv preprint arXiv:2502.07640, 2025

work page Pith review arXiv 2025

[39] [39]

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, et al. Goedel-prover-v2: Scaling formal theorem proving with scaffolded data synthesis and self-correction.arXiv preprint arXiv:2508.03613, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[40] [40]

Unconditional Security Of Quantum Key Distribution Over Arbitrarily Long Distances

Hoi-Kwong Lo and HF Chau. Unconditional security of quantum key distribution over arbitrarily long distances. arXiv preprint quant-ph/9803006, 1998

work page internal anchor Pith review Pith/arXiv arXiv 1998

[41] [41]

Unconditional security of quantum key distribution over arbitrarily long distances.science, 283(5410):2050–2056, 1999

Hoi-Kwong Lo and Hoi Fung Chau. Unconditional security of quantum key distribution over arbitrarily long distances.science, 283(5410):2050–2056, 1999

work page 2050

[42] [42]

Process-driven autoformalization in lean 4

Zeyuan Lu, Guodong Zhang, Junxiao Chen, Huan Chen, Yilun Chen, Zonglin Li, Yiping Li, Lianmin Wang, Yao Lin, Ce Zhang, and Jie Chen. Process-driven autoformalization in lean 4. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025

[43] [43]

Introducing gauss, an agent for autoformalization, 2025

Math Inc. Introducing gauss, an agent for autoformalization, 2025. Announcement of autoformalization agent for formal verification in mathematics

work page 2025

[44] [44]

A formalization of the generalized quantum stein’s lemma in lean.arXiv preprint arXiv:2510.08672, 2025

Alex Meiburg, Leonardo A Lessa, and Rodolfo R Soldati. A formalization of the generalized quantum stein’s lemma in lean.arXiv preprint arXiv:2510.08672, 2025

work page arXiv 2025

[45] [45]

P. R. Mishra, Yogesh Kumar, Susanta Samanta, and Atul Gaur. A new algorithm for computing branch number of non-singular matrices over finite fields.arXiv preprint arXiv:2405.07007, 2024. 25

work page arXiv 2024

[46] [46]

What is the model context protocol (mcp)?https://modelcontextprotocol

Model Context Protocol. What is the model context protocol (mcp)?https://modelcontextprotocol. io/docs/getting-started/intro, 2024. Accessed: 2025-10-05

work page 2024

[47] [47]

Robust de-anonymization of large sparse datasets

Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large sparse datasets. In2008 IEEE Symposium on Security and Privacy (sp 2008), pages 111–125. IEEE, 2008

work page 2008

[48] [48]

Advanced encryption standard (aes)

National Institute of Standards and Technology. Advanced encryption standard (aes). Federal Information Processing Standards Publication FIPS 197-upd1, U.S. Department of Commerce, Gaithersburg, MD, 2001. Published November 26, 2001; Updated May 9, 2023

work page 2001

[49] [49]

Cambridge university press, 2010

Michael A Nielsen and Isaac L Chuang.Quantum computation and quantum information. Cambridge university press, 2010

work page 2010

[50] [50]

Numinamath-lean dataset.https://huggingface.co/datasets/AI-MO/ NuminaMath-LEAN, 2025

Numina-Team. Numinamath-lean dataset.https://huggingface.co/datasets/AI-MO/ NuminaMath-LEAN, 2025. Accessed: 2025-08-24

work page 2025

[51] [51]

Openai models documentation, 2025

OpenAI. Openai models documentation, 2025. Accessed: 2025-09-19

work page 2025

[52] [52]

Apollo: Automated llm and lean collaboration for advanced formal reasoning.arXiv preprint arXiv:2505.05758, 2025

Azim Ospanov, Farzan Farnia, and Roozbeh Yousefzadeh. Apollo: Automated llm and lean collaboration for advanced formal reasoning.arXiv preprint arXiv:2505.05758, 2025

work page arXiv 2025

[53] [53]

Formal mathematics statement curriculum learning

Stanislas Polu et al. Formal mathematics statement curriculum learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

work page 2023

[54] [54]

Generative language modeling for automated theorem proving

Stanislas Polu and Ilya Sutskever. Generative language modeling for automated theorem proving. InInterna- tional Conference on Learning Representations (ICLR), 2020

work page 2020

[55] [55]

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

Z.Z. Ren, Zhihong Shao, Wenfeng Liang, et al. Deepseek-prover-v2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition.https://arxiv.org/abs/2504.21801, 2025. Accessed: 2025-08-24

work page internal anchor Pith review Pith/arXiv arXiv 2025

[56] [56]

S ´ebastien Rousseau. Bug discovered in quantum algorithm for lattice-based crypto.https:// sebastienrousseau.com/2024-04-22-bug-discovered-in-quantum-algorithm-for-lattice-based-crypto/ index.html, April 22 2024. Accessed: [add access date here]

work page 2024

[57] [57]

E prover 2.0: Integrating equational and first-order logic

Stephan Schulz, Simon Cruanes, and Petar Vukmirovi´c. E prover 2.0: Integrating equational and first-order logic. InProceedings of the International Conference on Automated Deduction (CADE), pages 523–541. Springer, 2019

work page 2019

[58] [58]

Simple proof of security of the bb84 quantum key distribution protocol.Physical review letters, 85(2):441, 2000

Peter W Shor and John Preskill. Simple proof of security of the bb84 quantum key distribution protocol.Physical review letters, 85(2):441, 2000

work page 2000

[59] [59]

Sequences of games: a tool for taming complexity in security proofs.cryptology eprint archive, 2004

Victor Shoup. Sequences of games: a tool for taming complexity in security proofs.cryptology eprint archive, 2004

work page 2004

[60] [60]

Song et al.,Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean, 2025

Peiyang Song, Kaiyu Yang, and Anima Anandkumar. Lean copilot: Large language models as copilots for theorem proving in lean.arXiv preprint arXiv:2404.12534, 2024

work page arXiv 2024

[61] [61]

k-anonymity: A model for protecting privacy.International journal of uncertainty, fuzziness and knowledge-based systems, 10(05):557–570, 2002

Latanya Sweeney. k-anonymity: A model for protecting privacy.International journal of uncertainty, fuzziness and knowledge-based systems, 10(05):557–570, 2002

work page 2002

[62] [62]

Putnambench leaderboard.https://trishullab.github.io/PutnamBench/ leaderboard.html, 2025

Trishullab. Putnambench leaderboard.https://trishullab.github.io/PutnamBench/ leaderboard.html, 2025. Accessed: 2025-10-07

work page 2025

[63] [63]

Putnambench: Evaluating neural theorem-provers on the putnam mathematical competi- tion.Advances in Neural Information Processing Systems, 37:11545–11569, 2024

George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, and Swarat Chaudhuri. Putnambench: Evaluating neural theorem-provers on the putnam mathematical competi- tion.Advances in Neural Information Processing Systems, 37:11545–11569, 2024

work page 2024

[64] [64]

Machine learning preselected proof steps

Josef Urban, Geoff Sutcliffe, Stefan Petrov, and Josef Vysko ˇcil. Machine learning preselected proof steps. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2046–2051, 2011. 26

work page 2046

[65] [65]

Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025

Sumanth Varambally, Thomas V oice, Yanchao Sun, Zhifeng Chen, Rose Yu, and Ke Ye. Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025

work page arXiv 2025

[66] [66]

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

Haiming Wang et al. Kimina-prover preview: Towards large formal reasoning models with reinforcement learn- ing.https://arxiv.org/abs/2504.11354, 2025. Accessed: 2025-08-24

work page internal anchor Pith review Pith/arXiv arXiv 2025

[67] [67]

Autoformalization in the Era of Large Language Models: A Survey

Qihao Wu, Haotian Zhang, Jialin Chen, Yizhou Li, Xingjian Zhang, Ce Zhang, and Jie Chen. Autoformalization in the era of large language models: A survey.arXiv preprint arXiv:2505.23486, 2025

work page Pith review arXiv 2025

[68] [68]

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, and Xiaodan Liang. Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data.https: //arxiv.org/abs/2405.14333, 2024. Accessed: 2025-08-24

work page Pith review arXiv 2024

[69] [69]

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Huajian Xin, ZZ Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, et al. Deepseek-prover-v1. 5: Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search.arXiv preprint arXiv:2408.08152, 2024

work page Pith review arXiv 2024

[70] [70]

Leandojo: Theorem proving with large language models

Zhangir Xin et al. Leandojo: Theorem proving with large language models. InInternational Conference on Learning Representations (ICLR), 2024

work page 2024

[71] [71]

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The ai scientist-v2: Workshop-level automated scientific discovery via agentic tree search.arXiv preprint arXiv:2504.08066, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[72] [72]

Lean workbook: A large-scale lean problem set formalized from natural language math problems.https://arxiv.org/abs/2406

Huaiyuan Ying, Zijian Wu, Yihan Geng, Jiayu Wang, Dahua Lin, and Kai Chen. Lean workbook: A large-scale lean problem set formalized from natural language math problems.https://arxiv.org/abs/2406. 03847, 2024. Accessed: 2025-08-24

work page 2024

[73] [73]

MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

Kunhao Zheng, Jesse Michael Han, and Stanislas Polu. Minif2f: A cross-system benchmark for formal olympiad- level mathematics.https://arxiv.org/abs/2109.00110, 2021. Accessed: 2025-08-24. 27

work page internal anchor Pith review Pith/arXiv arXiv 2021