Beyond the Library: An Agentic Framework for Autoformalizing Research Mathematics

Arshia Soltani Moakhar; Iman Gholami; Mahdi JafariRaviz; Max Springer; MohammadTaghi Hajiaghayi

arxiv: 2606.31134 · v1 · pith:M5ZHPPSInew · submitted 2026-06-30 · 💻 cs.AI

Beyond the Library: An Agentic Framework for Autoformalizing Research Mathematics

Arshia Soltani Moakhar , Iman Gholami , Max Springer , Mahdi JafariRaviz , MohammadTaghi Hajiaghayi This is my paper

Pith reviewed 2026-07-01 05:52 UTC · model grok-4.3

classification 💻 cs.AI

keywords autoformalizationLean 4agentic frameworkLLMtheorem provingmathematical formalizationSTOCAuxiliary Lemma

0 comments

The pith

An agentic system using general coding LLMs autoformalizes main theorems and proofs from research papers into Lean 4 by dynamically adding and validating new type definitions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework where an orchestrator coordinates multiple agents to translate natural-language research mathematics into Lean 4 code that can be mechanically checked. Because recent papers often introduce concepts absent from existing libraries, the system first extends the library with new type definitions and validates those extensions through a novel Auxiliary Lemma technique before attempting the main theorems. The approach was tested on a sample of PutnamBench problems and five STOC papers across combinatorics, communication complexity, mechanism design, and learning theory, yielding expert-validated formalizations of both statements and proofs, with two of the proofs requiring no axioms beyond Lean's kernel. A sympathetic reader would care because this demonstrates a concrete route to producing machine-checked versions of frontier results without depending on pre-existing formal libraries.

Core claim

The agentic autoformalization framework, powered by general coding LLMs and centered on an orchestrator that manages a multi-agent pipeline, successfully formalizes the main theorems and proofs of five STOC papers into Lean 4 after dynamically extending necessary type definitions and validating them via the Auxiliary Lemma technique; the same system also produces formalizations for 32 PutnamBench problems, with all outputs validated by human experts and two STOC proofs using only Lean's kernel axioms.

What carries the argument

The orchestrator that manages a multi-agent pipeline for autoformalization, using the Auxiliary Lemma technique to validate newly introduced type definitions before formalizing primary theorems.

If this is right

Research-level theorems in combinatorics, communication complexity, mechanism design, and learning theory can be accompanied by machine-checked Lean proofs.
Two of the five STOC formalizations require no axioms beyond Lean's kernel, indicating that minimal-axiom proofs are achievable for some results.
General-purpose coding LLMs can be orchestrated to outperform specialized Lean models on autoformalization of recent papers.
Putnam-level problems can be turned into machine-checked Lean proofs using the same pipeline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the Auxiliary Lemma method scales, future papers could be written with an accompanying formalization step that catches errors before submission.
The framework's success on STOC papers suggests it could be applied to other conference series to build a growing corpus of verified research results.
Combining this approach with interactive theorem-proving interfaces might let authors iteratively refine both informal and formal versions of their arguments in one workflow.

Load-bearing premise

The Auxiliary Lemma technique reliably validates newly introduced type definitions for concepts outside Mathlib without introducing undetected errors that would invalidate the subsequent formalization of the primary theorems.

What would settle it

A STOC paper where a new type definition passes Auxiliary Lemma validation yet produces a Lean formalization that experts later find contains an undetected logical error in the main theorem.

Figures

Figures reproduced from arXiv: 2606.31134 by Arshia Soltani Moakhar, Iman Gholami, Mahdi JafariRaviz, Max Springer, MohammadTaghi Hajiaghayi.

**Figure 2.** Figure 2: Type Formalizer Expanded; This is part of the main pipeline where we formalize and [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Faithfulness Judge: As illustrated, when provided with an informal statement and its [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Lean formalization of the headline theorem of Pham [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Group-level proof-dependency DAGs (box = proof stage, with Lean files [PITH_FULL_IMAGE:figures/full_fig_p019_5.png] view at source ↗

**Figure 7.** Figure 7: Group-level proof-dependency DAGs (box = proof stage, with Lean files [PITH_FULL_IMAGE:figures/full_fig_p021_7.png] view at source ↗

read the original abstract

While Large Language Models (LLMs) have demonstrated exceptional capabilities in mathematical reasoning, they frequently produce subtle errors that evade human detection. Formal mathematical languages like Lean 4 offer mechanical proof checking, strongly motivating the need for autoformalization: the automatic translation of natural language mathematics into verifiable code. Recent trends indicate that general-purpose LLMs, heavily optimized for standard programming, now outperform smaller models explicitly fine-tuned for Lean. Leveraging this shift, we introduce an agentic autoformalization framework powered by general coding LLMs. At the core of our system is an orchestrator that manages a multi-agent pipeline tailored for research-level mathematics. Because cutting-edge research frequently relies on concepts outside the scope of existing libraries like Mathlib, our system dynamically extends necessary type definitions and validates them via a novel Auxiliary Lemma technique before formalizing the primary theorems. We applied our approach to PutnamBench, producing machine-checked Lean proofs for a random sample of 32 problems. Furthermore, we evaluate our system on five papers from the ACM Symposium on Theory of Computing (STOC) spanning combinatorics, communication complexity, mechanism design, and learning theory, successfully formalizing their main theorems and validating the generated formalizations with human experts; for all five we also formalize the proofs alongside the statements, and notably two of them are proved with no axioms beyond Lean's kernel. All of our formalizations are available at https://beyondthelibrary.github.io/formal_arxiv .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows a multi-agent LLM system producing Lean formalizations of Putnam problems and five STOC papers with human validation and some kernel-only proofs, but the Auxiliary Lemma step for new type definitions lacks any independent soundness check or error analysis.

read the letter

The main takeaway is that this agentic pipeline, built around general coding LLMs and an orchestrator, turns natural-language research statements into Lean code that passes the kernel for real STOC theorems, including full proofs in all five cases and two of them with nothing beyond the kernel itself. That is a concrete step past earlier autoformalization work that stayed inside Mathlib.

What stands out is the practical handling of out-of-library concepts through on-the-fly type extension plus the Auxiliary Lemma validation step, plus the external benchmarks (PutnamBench sample and STOC papers) with independent human expert review. The results are not fitted to the same data, and the formalizations are released. Those elements give the work a usable empirical footprint.

The soft spot is exactly where the stress-test note points: the Auxiliary Lemma technique is load-bearing for the STOC results yet receives no separate argument, no exhaustive failure cases, and no quantitative error rates or ablations in the abstract. Without those, it is difficult to judge whether subtle inconsistencies in the new definitions are reliably caught before the main theorems are formalized. The paper would be stronger with even a short section walking through how the technique was tested on known problematic extensions.

This is aimed at groups working on LLM-assisted formalization and AI for mathematics. It is worth sending to peer review because the external validation and released artifacts are real evidence, even if the central validation method needs more scrutiny. A referee could usefully press on the soundness gap without dismissing the empirical results.

Referee Report

2 major / 2 minor

Summary. The paper introduces an agentic multi-LLM framework for autoformalizing research mathematics into Lean 4. Its core contribution is an orchestrator that dynamically extends Mathlib with new type definitions for concepts outside existing libraries and validates them via a novel Auxiliary Lemma technique before formalizing theorems and proofs. The system is evaluated on a random sample of 32 PutnamBench problems and five STOC papers (combinatorics, communication complexity, mechanism design, learning theory), producing machine-checked formalizations of main theorems and proofs; human experts validated the outputs, and two STOC formalizations require no axioms beyond Lean's kernel. All artifacts are released at the provided GitHub site.

Significance. If the central results hold, the work would be a meaningful step toward research-level autoformalization, particularly by addressing the library-extension barrier that has limited prior systems. The machine-checked proofs on external benchmarks (PutnamBench, STOC papers) with independent human review, plus the release of all formalizations, strengthen the contribution relative to purely informal LLM demonstrations.

major comments (2)

[Abstract; Auxiliary Lemma technique section] Abstract (paragraph on dynamic extension) and the section describing the Auxiliary Lemma technique: the claim that this technique reliably validates newly introduced type definitions for research concepts is load-bearing for the five STOC results, yet the manuscript supplies neither an independent soundness argument nor an exhaustive enumeration of potential failure modes (e.g., undetected missing axioms or inconsistent type extensions) that could invalidate subsequent kernel-checked proofs.
[Evaluation on PutnamBench and STOC papers] Evaluation sections on PutnamBench and STOC papers: the reported successes are presented without quantitative error rates, failure-case breakdowns, or ablation results on the multi-agent pipeline components, leaving the strength of empirical support for the framework's reliability moderate despite the external benchmarks and human validation.

minor comments (2)

[System architecture] The description of the orchestrator's agent roles and hand-off protocols would benefit from a single consolidated diagram or pseudocode listing the exact sequence of LLM calls and validation steps.
[Introduction / Method] Minor notation inconsistency: the term "Auxiliary Lemma technique" is introduced without a formal definition or pseudocode in the first occurrence, making the subsequent claims harder to parse on first reading.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major point below, indicating planned revisions where appropriate to strengthen the discussion of our methods and evaluation.

read point-by-point responses

Referee: [Abstract; Auxiliary Lemma technique section] Abstract (paragraph on dynamic extension) and the section describing the Auxiliary Lemma technique: the claim that this technique reliably validates newly introduced type definitions for research concepts is load-bearing for the five STOC results, yet the manuscript supplies neither an independent soundness argument nor an exhaustive enumeration of potential failure modes (e.g., undetected missing axioms or inconsistent type extensions) that could invalidate subsequent kernel-checked proofs.

Authors: The Auxiliary Lemma technique validates new type definitions by requiring the generation and kernel-checked proof of lemmas that capture their intended mathematical properties before they are used in theorem statements. This provides an empirical rather than meta-theoretic guarantee, as a formal soundness argument would necessitate reasoning about LLM behavior that lies beyond the paper's scope. We will revise the Auxiliary Lemma section to include an explicit enumeration of potential failure modes (such as undetected missing axioms or inconsistent extensions) and clarify how human expert validation of the final formalizations serves as an additional safeguard. We will also update the abstract to note that validation is empirical and supported by expert review. revision: partial
Referee: [Evaluation on PutnamBench and STOC papers] Evaluation sections on PutnamBench and STOC papers: the reported successes are presented without quantitative error rates, failure-case breakdowns, or ablation results on the multi-agent pipeline components, leaving the strength of empirical support for the framework's reliability moderate despite the external benchmarks and human validation.

Authors: The evaluation prioritizes demonstrating successful machine-checked formalizations on external research-level benchmarks (PutnamBench sample and STOC papers), each independently reviewed by human experts, rather than exhaustive error statistics. We agree that additional context on reliability would be beneficial and will add a subsection summarizing observed failure modes encountered during development (e.g., challenges in library extension or proof search) along with any quantitative metrics available from our experimental logs. Full component ablations are resource-intensive given the integrated pipeline and high cost per formalization, but we will discuss the contribution of each agent based on iterative testing. The kernel-checked proofs and expert validation remain the primary evidence of correctness. revision: partial

Circularity Check

0 steps flagged

No significant circularity; evaluations rely on external benchmarks and human review

full rationale

The paper presents an agentic framework for autoformalization and evaluates it on independent external benchmarks (PutnamBench sample and five STOC papers) with human expert validation of the generated Lean formalizations. The Auxiliary Lemma technique is introduced as a novel method for validating dynamic type extensions, but the central results (successful formalization of main theorems, including two kernel-only proofs) are not derived by fitting parameters to the evaluation data, nor do they reduce to self-citations or self-definitional steps. The derivation chain consists of system design choices justified by performance on held-out research papers rather than any closed loop where outputs are constructed from the same inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

This is an engineering systems paper; the central contribution is a new pipeline and validation technique rather than a mathematical derivation resting on fitted parameters or unstated axioms.

axioms (1)

domain assumption General-purpose coding LLMs can be orchestrated to perform autoformalization of research mathematics
The framework is built on the observed trend that general LLMs now outperform specialized Lean models.

invented entities (1)

Auxiliary Lemma technique no independent evidence
purpose: Validate dynamically created type definitions for concepts outside existing libraries before formalizing primary theorems
New method introduced by the authors to address out-of-library concepts.

pith-pipeline@v0.9.1-grok · 5814 in / 1328 out tokens · 42211 ms · 2026-07-01T05:52:50.095380+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

57 extracted references · 32 canonical work pages · 7 internal anchors

[1]

Acceler- ating scientific research with gemini: Case studies and common techniques.arXiv preprint arXiv:2602.03837, 2026

David P Woodruff, Vincent Cohen-Addad, Lalit Jain, Jieming Mao, Song Zuo, Mohammad- Hossein Bateni, Simina Branzei, Michael P Brenner, Lin Chen, Ying Feng, et al. Acceler- ating scientific research with gemini: Case studies and common techniques.arXiv preprint arXiv:2602.03837, 2026

work page arXiv 2026
[2]

The dawn after the dark: An empirical study on factuality hallucination in large language models

Junyi Li, Jie Chen, Ruiyang Ren, Xiaoxue Cheng, Wayne Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. The dawn after the dark: An empirical study on factuality hallucination in large language models. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10879–10899, 2024

2024
[3]

The lean 4 theorem prover and programming language (system description)

Leonardo de Moura and Sebastian Ullrich. The lean 4 theorem prover and programming language (system description)
[4]

Numina-lean- agent: An open and general agentic reasoning system for formal mathematics, 2026

Junqi Liu, Zihao Zhou, Zekai Zhu, Marco Dos Santos, Weikun He, Jiawei Liu, Ran Wang, Yunzhou Xie, Junqiao Zhao, Qiufeng Wang, et al. Numina-Lean-Agent: An open and general agentic reasoning system for formal mathematics.arXiv preprint arXiv:2601.14027, 2026

work page arXiv 2026
[5]

A Minimal Agent for Automated Theorem Proving

Borja Requena Pozo, Austin Letson, Krystian Nowakowski, Izan Beltran Ferreiro, and Leopoldo Sarra. A minimal agent for automated theorem proving, 2026. URL https://arxiv.org/ abs/2602.24273

work page internal anchor Pith review Pith/arXiv arXiv 2026
[6]

Agentic proof automation: A case study.arXiv preprint arXiv:2601.03768, 2026

Yichen Xu and Martin Odersky. Agentic proof automation: A case study.arXiv preprint arXiv:2601.03768, 2026

work page arXiv 2026
[7]

MerLean: An agentic framework for autoformalization in quantum computation.arXiv preprint arXiv:2602.16554, 2026

Yuanjie Ren, Jinzheng Li, and Yidi Qi. MerLean: An agentic framework for autoformalization in quantum computation.arXiv preprint arXiv:2602.16554, 2026

work page arXiv 2026
[8]

M2F: Automated formalization of mathematical literature at scale.arXiv preprint arXiv:2602.17016, 2026

Zichen Wang, Wanli Ma, Zhenyu Ming, Gong Zhang, Kun Yuan, and Zaiwen Wen. M2F: Automated formalization of mathematical literature at scale.arXiv preprint arXiv:2602.17016, 2026

work page arXiv 2026
[9]

Seed-prover 1.5: Mastering undergraduate-level theorem proving via learning from experience.arXiv preprint arXiv:2512.17260, 2025

Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Wenlei Shi, et al. Seed-prover 1.5: Mastering undergraduate-level theorem proving via learning from experience.arXiv preprint arXiv:2512.17260, 2025

work page arXiv 2025
[10]

Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025

Sumanth Varambally, Thomas V oice, Yanchao Sun, Zhifeng Chen, Rose Yu, and Ke Ye. Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025

work page arXiv 2025
[11]

Aristotle: IMO-level Automated Theorem Proving

Tudor Achim, Alex Best, Alberto Bietti, Kevin Der, Mathis Fédérico, Sergei Gukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, Martin Michelsen, Riley Patterson, Eric Rodriguez, Laura Scharff, Vikram Shanker, Vladmir Sicca, Hari Sowrirajan, Aidan Swope, Matyas Tamas, Vlad Tenev, Jonathan Thomm, Harold Williams, and La...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[12]

URL http://arxiv.org/abs/ 1910.09336

The lean mathematical library.CoRR, abs/1910.09336, 2019. URL http://arxiv.org/abs/ 1910.09336

work page arXiv 1910
[13]

Pearson Education India, 2011

Ian Sommerville.Software engineering, 9/E. Pearson Education India, 2011

2011
[14]

An empirical study of unit test generation with large language models.arXiv preprint arXiv:2406.18181, 2024

Lin Yang, Chen Yang, Shutao Gao, Weijing Wang, Bo Wang, Qihao Zhu, Xiao Chu, Jianyi Zhou, Guangtai Liang, Qianxiang Wang, et al. An empirical study of unit test generation with large language models.arXiv preprint arXiv:2406.18181, 2024

work page arXiv 2024
[15]

Autoformalizer with tool feedback

Qi Guo, Jianing Wang, Jianfei Zhang, Deyang Kong, Xiangzhou Huang, Xiangyu Xi, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, and Wei Ye. Autoformalizer with tool feedback, 2025. URLhttps://arxiv.org/abs/2510.06857

work page arXiv 2025
[16]

arXiv preprint arXiv:2507.15225 , year =

Yichi Zhou, Jianqiu Zhao, Yongxin Zhang, Bohan Wang, Siran Wang, Luoxin Chen, Jiahui Wang, Haowei Chen, Allan Jie, Xinbo Zhang, Haocheng Wang, Luong Trung, Rong Ye, Phan Nhat Hoang, Huishuai Zhang, Peng Sun, and Hang Li. Solving formal math problems by decomposition and iterative reflection.arXiv preprint arXiv:2507.15225, 2025. 13

work page arXiv 2025
[17]

A sharp version of talagrand’s selector process conjecture and an application to rounding fractional covers

Huy Tuan Pham. A sharp version of talagrand’s selector process conjecture and an application to rounding fractional covers. InProceedings of the 57th Annual ACM Symposium on Theory of Computing, pages 322–328, 2025

2025
[18]

Putnambench: Evaluating neural theorem-provers on the putnam mathematical competition.Advances in Neural Information Processing Systems, 37:11545–11569, 2024

George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, and Swarat Chaudhuri. Putnambench: Evaluating neural theorem-provers on the putnam mathematical competition.Advances in Neural Information Processing Systems, 37:11545–11569, 2024

2024
[19]

Refuting the direct sum conjecture for total functions in deterministic communication complexity

Simon Mackenzie and Abdallah Saffidine. Refuting the direct sum conjecture for total functions in deterministic communication complexity. InProceedings of the 57th Annual ACM Symposium on Theory of Computing, 2025

2025
[20]

Approximation guarantees of median mechanism in Rd

Nikolai Gravin and Jianhao Jia. Approximation guarantees of median mechanism in Rd. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing, pages 495–506, 2025

2025
[21]

A generalized trace reconstruction problem: Recovering a string of probabilities

Joey Rivkin, Gregory Valiant, and Paul Valiant. A generalized trace reconstruction problem: Recovering a string of probabilities. InProceedings of the 57th Annual ACM Symposium on Theory of Computing, pages 1657–1667, 2025

2025
[22]

Efficiently learning mixtures of two gaussians

Adam Tauman Kalai, Ankur Moitra, and Gregory Valiant. Efficiently learning mixtures of two gaussians. InProceedings of the 42nd Annual ACM Symposium on Theory of Computing, pages 553–562, 2010

2010
[23]

Aleph ai solves 99.4 % of PutnamBench, topping the leader- board

Logical Intelligence. Aleph ai solves 99.4 % of PutnamBench, topping the leader- board. Blog post, 2026. URL https://logicalintelligence.com/blog/ aleph-solves-putnambench. Aleph agent powered by GPT-5.2; 668/672 problems solved on PutnamBench

2026
[24]

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

Haiming Wang, Mert Unsal, Xiaohan Lin, Mantas Baksys, Junqi Liu, Marco Dos Santos, Flood Sung, Marina Vinyes, Zhenzhe Ying, Zekai Zhu, et al. Kimina-prover preview: Towards large formal reasoning models with reinforcement learning.arXiv preprint arXiv:2504.11354, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[25]

Indi- mathbench: Autoformalizing mathematical reasoning problems with a human touch.arXiv preprint arXiv:2512.00997, 2025

Param Biyani, Shashank Kirtania, Yasharth Bajpai, Sumit Gulwani, and Ashish Tiwari. Indi- mathbench: Autoformalizing mathematical reasoning problems with a human touch.arXiv preprint arXiv:2512.00997, 2025

work page arXiv 2025
[26]

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

ZZ Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, et al. Deepseek-prover-v2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition.arXiv preprint arXiv:2504.21801, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[27]

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, et al. Goedel-prover-v2: Scaling formal theorem proving with scaffolded data synthesis and self-correction.arXiv preprint arXiv:2508.03613, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[28]

Seed-prover: Deep and broad reasoning for automated theorem proving.arXiv preprint arXiv:2507.23726, 2025

Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, et al. Seed-prover: Deep and broad reasoning for automated theorem proving.arXiv preprint arXiv:2507.23726, 2025

work page arXiv 2025
[29]

Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

Benjamin Breen, Marco Del Tredici, Jacob McCarran, Javier Aspuru Mijares, Weichen Winston Yin, Kfir Sulimany, Jacob M Taylor, Frank HL Koppens, and Dirk Englund. AX-Prover: A deep reasoning agentic framework for theorem proving in mathematics and quantum physics. arXiv preprint arXiv:2510.12787, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

On logarithmic concave measures and functions.Acta Scientiarum Mathe- maticarum (Szeged), 34:335–343, 1973

András Prékopa. On logarithmic concave measures and functions.Acta Scientiarum Mathe- maticarum (Szeged), 34:335–343, 1973

1973
[31]

A remark on stirling’s formula.The American Mathematical Monthly, 62(1): 26–29, 1955

Herbert Robbins. A remark on stirling’s formula.The American Mathematical Monthly, 62(1): 26–29, 1955. 14

1955
[32]

Hummel and Basilis C

Robert A. Hummel and Basilis C. Gidas. Zero crossings and the heat equation. Technical Report 111, Computer Science Division, Courant Institute of Mathematical Sciences, New York University, 1984

1984
[33]

Krantz and Harold R

Steven G. Krantz and Harold R. Parks.The Implicit Function Theorem: History, Theory, and Applications. Birkhäuser, 2002

2002
[34]

Krantz and Harold R

Steven G. Krantz and Harold R. Parks.A Primer of Real Analytic Functions. Birkhäuser, 2 edition, 2002

2002
[35]

F. R. K. Chung, R. L. Graham, P. Frankl, and J. B. Shearer. Some intersection theorems for ordered sets and graphs.Journal of Combinatorial Theory, Series A, 43(1):23–37, 1986

1986
[36]

Cambridge University Press, 1997

Eyal Kushilevitz and Noam Nisan.Communication Complexity. Cambridge University Press, 1997

1997
[37]

Theoremllama: Transforming general-purpose LLMs into lean4 experts, 2024

Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, and Tong Zhang. Theoremllama: Transforming general-purpose LLMs into lean4 experts, 2024. URL https: //arxiv.org/abs/2407.03203

work page arXiv 2024
[38]

Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data.arXiv preprint arXiv:2405.14333, 2024

Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, and Xiaodan Liang. Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data.arXiv preprint arXiv:2405.14333, 2024

work page arXiv 2024
[39]

Goedel-prover: A frontier model for open-source automated theorem proving.arXiv preprint arXiv:2502.07640, 2025

Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, et al. Goedel-prover: A frontier model for open-source automated theorem proving.arXiv preprint arXiv:2502.07640, 2025

work page arXiv 2025
[40]

Leanabell-prover-v2: Verifier-integrated reasoning for formal theorem proving via reinforcement learning.arXiv preprint arXiv:2507.08649, 2025

Xingguang Ji, Yahui Liu, Qi Wang, Jingyuan Zhang, Yang Yue, Rui Shi, Chenxi Sun, Fuzheng Zhang, Guorui Zhou, and Kun Gai. Leanabell-prover-v2: Verifier-integrated reasoning for formal theorem proving via reinforcement learning.arXiv preprint arXiv:2507.08649, 2025

work page arXiv 2025
[41]

Deepseek-prover-v1

Huajian Xin, ZZ Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, et al. Deepseek-prover-v1. 5: Harnessing proof assistant feed- back for reinforcement learning and monte-carlo tree search.arXiv preprint arXiv:2408.08152, 2024

work page arXiv 2024
[42]

Learning to repair Lean proofs from compiler feedback, 2026

Evan Wang, Simon Chess, Daniel Lee, Siyuan Ge, Ajit Mallavarapu, Jarod Alper, and Vasily Ilin. Learning to repair Lean proofs from compiler feedback, 2026. URL https://arxiv. org/abs/2602.02990

work page arXiv 2026
[43]

Leanabell-prover: Posttraining scaling in formal reasoning.arXiv preprint arXiv:2504.06122, 2025

Jingyuan Zhang, Qi Wang, Xingguang Ji, Yahui Liu, Yang Yue, Fuzheng Zhang, Di Zhang, Guorui Zhou, and Kun Gai. Leanabell-prover: Posttraining scaling in formal reasoning.arXiv preprint arXiv:2504.06122, 2025

work page arXiv 2025
[44]

LeanAgent: Lifelong learning for formal theorem proving, 2025

Adarsh Kumarappan, Mo Tiwari, Peiyang Song, Robert Joseph George, Chaowei Xiao, and Anima Anandkumar. LeanAgent: Lifelong learning for formal theorem proving, 2025. URL https://arxiv.org/abs/2410.06209

work page arXiv 2025
[45]

Process-driven autoformalization in lean 4

Jianqiao Lu, Yingjia Wan, Zhengying Liu, Yinya Huang, Jing Xiong, Chengwu Liu, Jianhao Shen, Hui Jin, Jipeng Zhang, Haiming Wang, et al. Process-driven autoformalization in Lean 4. arXiv preprint arXiv:2406.01940, 2024

work page arXiv 2024
[46]

arXiv preprint arXiv:2507.06181 , year =

Zhongyuan Peng, Yifan Yao, Kaijing Ma, Shuyue Guo, Yizhe Li, Yichi Zhang, Chenchen Zhang, Yifan Zhang, Zhouliang Yu, Luming Li, Minghao Liu, Yihang Xia, Jiawei Shen, Yuchen Wu, Yixin Cao, Zhaoxiang Zhang, Wenhao Huang, Jiaheng Liu, and Ge Zhang. CriticLean: Critic-guided reinforcement learning for mathematical formalization, 2025. URL https: //arxiv.org/a...

work page arXiv 2025
[47]

APOLLO: Automated LLM and lean collaboration for advanced formal reasoning.arXiv preprint arXiv:2505.05758, 2025

Azim Ospanov, Farzan Farnia, and Roozbeh Yousefzadeh. APOLLO: Automated LLM and lean collaboration for advanced formal reasoning.arXiv preprint arXiv:2505.05758, 2025

work page arXiv 2025
[48]

Automatic Textbook Formalization

Fabian Gloeckle, Ahmad Rammal, Charles Arnal, Remi Munos, Vivien Cabannes, Gabriel Synnaeve, and Amaury Hayat. Automatic textbook formalization, 2026. URL https://arxiv. org/abs/2604.03071. 15

work page internal anchor Pith review Pith/arXiv arXiv 2026
[49]

ProofBridge: Auto-formalization of natural language proofs in Lean via joint embeddings.arXiv preprint arXiv:2510.15681, 2025

Prithwish Jana, Kaan Kale, Ahmet Ege Tanriverdi, Cruise Song, Sriram Vishwanath, and Vijay Ganesh. ProofBridge: Auto-formalization of natural language proofs in Lean via joint embeddings.arXiv preprint arXiv:2510.15681, 2025

work page arXiv 2025
[50]

Lean for science formalization

Przemyslaw Chojecki. Lean for science formalization. 2026

2026
[51]

Formalizing chemical physics using the lean theorem prover

Maxwell P Bobbin, Samiha Sharlin, Parivash Feyzishendi, An Hong Dang, Catherine M Wraback, and Tyler R Josephson. Formalizing chemical physics using the lean theorem prover. Digital Discovery, 3(2):264–280, 2024

2024
[52]

Veribench: End-to-end formal verification benchmark for AI code generation in lean 4

Brando Miranda, Zhanke Zhou, Allen Nie, Elyas Obbad, Leni Aniva, Kai Fronsdal, Weston Kirk, Dilara Soylu, Andrea Yu, Ying Li, and Sanmi Koyejo. Veribench: End-to-end formal verification benchmark for AI code generation in lean 4. In2nd AI for Math Workshop @ ICML 2025, 2025. URLhttps://openreview.net/forum?id=rWkGFmnSNl. A Limitations • Algorithmic result...

2025
[53]

1968_b2, 1971_a2, 1983_b2, 1992_a2, 2021_a2
[54]

1975_a3, 1994_b3, 1997_a3, 2010_a3, 2022_b3
[55]

1965_b4, 1970_b4, 1971_a4, 1975_a4, 1987_b4, 1992_b4, 1993_a4, 2000_a4, 2005_a4, 2011_b4, 2019_a4
[56]

2002_a5, 2009_a5, 2012_a5
[57]

key property of towers of minimum fragments

1975_b6, 1986_b6, 1989_b6, 2003_b6, 2008_b6, 2014_a6, 2016_b6 During the evaluation, we disabled all internet access to prevent the models from retrieving external resources, existing proofs, or data leaks. We initialized the agents in their workspace repository with the specific Putnam problem and invoked them using the following system command: PutnamBe...

2025

[1] [1]

Acceler- ating scientific research with gemini: Case studies and common techniques.arXiv preprint arXiv:2602.03837, 2026

David P Woodruff, Vincent Cohen-Addad, Lalit Jain, Jieming Mao, Song Zuo, Mohammad- Hossein Bateni, Simina Branzei, Michael P Brenner, Lin Chen, Ying Feng, et al. Acceler- ating scientific research with gemini: Case studies and common techniques.arXiv preprint arXiv:2602.03837, 2026

work page arXiv 2026

[2] [2]

The dawn after the dark: An empirical study on factuality hallucination in large language models

Junyi Li, Jie Chen, Ruiyang Ren, Xiaoxue Cheng, Wayne Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. The dawn after the dark: An empirical study on factuality hallucination in large language models. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10879–10899, 2024

2024

[3] [3]

The lean 4 theorem prover and programming language (system description)

Leonardo de Moura and Sebastian Ullrich. The lean 4 theorem prover and programming language (system description)

[4] [4]

Numina-lean- agent: An open and general agentic reasoning system for formal mathematics, 2026

Junqi Liu, Zihao Zhou, Zekai Zhu, Marco Dos Santos, Weikun He, Jiawei Liu, Ran Wang, Yunzhou Xie, Junqiao Zhao, Qiufeng Wang, et al. Numina-Lean-Agent: An open and general agentic reasoning system for formal mathematics.arXiv preprint arXiv:2601.14027, 2026

work page arXiv 2026

[5] [5]

A Minimal Agent for Automated Theorem Proving

Borja Requena Pozo, Austin Letson, Krystian Nowakowski, Izan Beltran Ferreiro, and Leopoldo Sarra. A minimal agent for automated theorem proving, 2026. URL https://arxiv.org/ abs/2602.24273

work page internal anchor Pith review Pith/arXiv arXiv 2026

[6] [6]

Agentic proof automation: A case study.arXiv preprint arXiv:2601.03768, 2026

Yichen Xu and Martin Odersky. Agentic proof automation: A case study.arXiv preprint arXiv:2601.03768, 2026

work page arXiv 2026

[7] [7]

MerLean: An agentic framework for autoformalization in quantum computation.arXiv preprint arXiv:2602.16554, 2026

Yuanjie Ren, Jinzheng Li, and Yidi Qi. MerLean: An agentic framework for autoformalization in quantum computation.arXiv preprint arXiv:2602.16554, 2026

work page arXiv 2026

[8] [8]

M2F: Automated formalization of mathematical literature at scale.arXiv preprint arXiv:2602.17016, 2026

Zichen Wang, Wanli Ma, Zhenyu Ming, Gong Zhang, Kun Yuan, and Zaiwen Wen. M2F: Automated formalization of mathematical literature at scale.arXiv preprint arXiv:2602.17016, 2026

work page arXiv 2026

[9] [9]

Seed-prover 1.5: Mastering undergraduate-level theorem proving via learning from experience.arXiv preprint arXiv:2512.17260, 2025

Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Wenlei Shi, et al. Seed-prover 1.5: Mastering undergraduate-level theorem proving via learning from experience.arXiv preprint arXiv:2512.17260, 2025

work page arXiv 2025

[10] [10]

Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025

Sumanth Varambally, Thomas V oice, Yanchao Sun, Zhifeng Chen, Rose Yu, and Ke Ye. Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025

work page arXiv 2025

[11] [11]

Aristotle: IMO-level Automated Theorem Proving

Tudor Achim, Alex Best, Alberto Bietti, Kevin Der, Mathis Fédérico, Sergei Gukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, Martin Michelsen, Riley Patterson, Eric Rodriguez, Laura Scharff, Vikram Shanker, Vladmir Sicca, Hari Sowrirajan, Aidan Swope, Matyas Tamas, Vlad Tenev, Jonathan Thomm, Harold Williams, and La...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[12] [12]

URL http://arxiv.org/abs/ 1910.09336

The lean mathematical library.CoRR, abs/1910.09336, 2019. URL http://arxiv.org/abs/ 1910.09336

work page arXiv 1910

[13] [13]

Pearson Education India, 2011

Ian Sommerville.Software engineering, 9/E. Pearson Education India, 2011

2011

[14] [14]

An empirical study of unit test generation with large language models.arXiv preprint arXiv:2406.18181, 2024

Lin Yang, Chen Yang, Shutao Gao, Weijing Wang, Bo Wang, Qihao Zhu, Xiao Chu, Jianyi Zhou, Guangtai Liang, Qianxiang Wang, et al. An empirical study of unit test generation with large language models.arXiv preprint arXiv:2406.18181, 2024

work page arXiv 2024

[15] [15]

Autoformalizer with tool feedback

Qi Guo, Jianing Wang, Jianfei Zhang, Deyang Kong, Xiangzhou Huang, Xiangyu Xi, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, and Wei Ye. Autoformalizer with tool feedback, 2025. URLhttps://arxiv.org/abs/2510.06857

work page arXiv 2025

[16] [16]

arXiv preprint arXiv:2507.15225 , year =

Yichi Zhou, Jianqiu Zhao, Yongxin Zhang, Bohan Wang, Siran Wang, Luoxin Chen, Jiahui Wang, Haowei Chen, Allan Jie, Xinbo Zhang, Haocheng Wang, Luong Trung, Rong Ye, Phan Nhat Hoang, Huishuai Zhang, Peng Sun, and Hang Li. Solving formal math problems by decomposition and iterative reflection.arXiv preprint arXiv:2507.15225, 2025. 13

work page arXiv 2025

[17] [17]

A sharp version of talagrand’s selector process conjecture and an application to rounding fractional covers

Huy Tuan Pham. A sharp version of talagrand’s selector process conjecture and an application to rounding fractional covers. InProceedings of the 57th Annual ACM Symposium on Theory of Computing, pages 322–328, 2025

2025

[18] [18]

Putnambench: Evaluating neural theorem-provers on the putnam mathematical competition.Advances in Neural Information Processing Systems, 37:11545–11569, 2024

George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, and Swarat Chaudhuri. Putnambench: Evaluating neural theorem-provers on the putnam mathematical competition.Advances in Neural Information Processing Systems, 37:11545–11569, 2024

2024

[19] [19]

Refuting the direct sum conjecture for total functions in deterministic communication complexity

Simon Mackenzie and Abdallah Saffidine. Refuting the direct sum conjecture for total functions in deterministic communication complexity. InProceedings of the 57th Annual ACM Symposium on Theory of Computing, 2025

2025

[20] [20]

Approximation guarantees of median mechanism in Rd

Nikolai Gravin and Jianhao Jia. Approximation guarantees of median mechanism in Rd. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing, pages 495–506, 2025

2025

[21] [21]

A generalized trace reconstruction problem: Recovering a string of probabilities

Joey Rivkin, Gregory Valiant, and Paul Valiant. A generalized trace reconstruction problem: Recovering a string of probabilities. InProceedings of the 57th Annual ACM Symposium on Theory of Computing, pages 1657–1667, 2025

2025

[22] [22]

Efficiently learning mixtures of two gaussians

Adam Tauman Kalai, Ankur Moitra, and Gregory Valiant. Efficiently learning mixtures of two gaussians. InProceedings of the 42nd Annual ACM Symposium on Theory of Computing, pages 553–562, 2010

2010

[23] [23]

Aleph ai solves 99.4 % of PutnamBench, topping the leader- board

Logical Intelligence. Aleph ai solves 99.4 % of PutnamBench, topping the leader- board. Blog post, 2026. URL https://logicalintelligence.com/blog/ aleph-solves-putnambench. Aleph agent powered by GPT-5.2; 668/672 problems solved on PutnamBench

2026

[24] [24]

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

Haiming Wang, Mert Unsal, Xiaohan Lin, Mantas Baksys, Junqi Liu, Marco Dos Santos, Flood Sung, Marina Vinyes, Zhenzhe Ying, Zekai Zhu, et al. Kimina-prover preview: Towards large formal reasoning models with reinforcement learning.arXiv preprint arXiv:2504.11354, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[25] [25]

Indi- mathbench: Autoformalizing mathematical reasoning problems with a human touch.arXiv preprint arXiv:2512.00997, 2025

Param Biyani, Shashank Kirtania, Yasharth Bajpai, Sumit Gulwani, and Ashish Tiwari. Indi- mathbench: Autoformalizing mathematical reasoning problems with a human touch.arXiv preprint arXiv:2512.00997, 2025

work page arXiv 2025

[26] [26]

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

ZZ Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, et al. Deepseek-prover-v2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition.arXiv preprint arXiv:2504.21801, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[27] [27]

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, et al. Goedel-prover-v2: Scaling formal theorem proving with scaffolded data synthesis and self-correction.arXiv preprint arXiv:2508.03613, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[28] [28]

Seed-prover: Deep and broad reasoning for automated theorem proving.arXiv preprint arXiv:2507.23726, 2025

Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, et al. Seed-prover: Deep and broad reasoning for automated theorem proving.arXiv preprint arXiv:2507.23726, 2025

work page arXiv 2025

[29] [29]

Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

Benjamin Breen, Marco Del Tredici, Jacob McCarran, Javier Aspuru Mijares, Weichen Winston Yin, Kfir Sulimany, Jacob M Taylor, Frank HL Koppens, and Dirk Englund. AX-Prover: A deep reasoning agentic framework for theorem proving in mathematics and quantum physics. arXiv preprint arXiv:2510.12787, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[30] [30]

On logarithmic concave measures and functions.Acta Scientiarum Mathe- maticarum (Szeged), 34:335–343, 1973

András Prékopa. On logarithmic concave measures and functions.Acta Scientiarum Mathe- maticarum (Szeged), 34:335–343, 1973

1973

[31] [31]

A remark on stirling’s formula.The American Mathematical Monthly, 62(1): 26–29, 1955

Herbert Robbins. A remark on stirling’s formula.The American Mathematical Monthly, 62(1): 26–29, 1955. 14

1955

[32] [32]

Hummel and Basilis C

Robert A. Hummel and Basilis C. Gidas. Zero crossings and the heat equation. Technical Report 111, Computer Science Division, Courant Institute of Mathematical Sciences, New York University, 1984

1984

[33] [33]

Krantz and Harold R

Steven G. Krantz and Harold R. Parks.The Implicit Function Theorem: History, Theory, and Applications. Birkhäuser, 2002

2002

[34] [34]

Krantz and Harold R

Steven G. Krantz and Harold R. Parks.A Primer of Real Analytic Functions. Birkhäuser, 2 edition, 2002

2002

[35] [35]

F. R. K. Chung, R. L. Graham, P. Frankl, and J. B. Shearer. Some intersection theorems for ordered sets and graphs.Journal of Combinatorial Theory, Series A, 43(1):23–37, 1986

1986

[36] [36]

Cambridge University Press, 1997

Eyal Kushilevitz and Noam Nisan.Communication Complexity. Cambridge University Press, 1997

1997

[37] [37]

Theoremllama: Transforming general-purpose LLMs into lean4 experts, 2024

Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, and Tong Zhang. Theoremllama: Transforming general-purpose LLMs into lean4 experts, 2024. URL https: //arxiv.org/abs/2407.03203

work page arXiv 2024

[38] [38]

Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data.arXiv preprint arXiv:2405.14333, 2024

Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, and Xiaodan Liang. Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data.arXiv preprint arXiv:2405.14333, 2024

work page arXiv 2024

[39] [39]

Goedel-prover: A frontier model for open-source automated theorem proving.arXiv preprint arXiv:2502.07640, 2025

Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, et al. Goedel-prover: A frontier model for open-source automated theorem proving.arXiv preprint arXiv:2502.07640, 2025

work page arXiv 2025

[40] [40]

Leanabell-prover-v2: Verifier-integrated reasoning for formal theorem proving via reinforcement learning.arXiv preprint arXiv:2507.08649, 2025

Xingguang Ji, Yahui Liu, Qi Wang, Jingyuan Zhang, Yang Yue, Rui Shi, Chenxi Sun, Fuzheng Zhang, Guorui Zhou, and Kun Gai. Leanabell-prover-v2: Verifier-integrated reasoning for formal theorem proving via reinforcement learning.arXiv preprint arXiv:2507.08649, 2025

work page arXiv 2025

[41] [41]

Deepseek-prover-v1

Huajian Xin, ZZ Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, et al. Deepseek-prover-v1. 5: Harnessing proof assistant feed- back for reinforcement learning and monte-carlo tree search.arXiv preprint arXiv:2408.08152, 2024

work page arXiv 2024

[42] [42]

Learning to repair Lean proofs from compiler feedback, 2026

Evan Wang, Simon Chess, Daniel Lee, Siyuan Ge, Ajit Mallavarapu, Jarod Alper, and Vasily Ilin. Learning to repair Lean proofs from compiler feedback, 2026. URL https://arxiv. org/abs/2602.02990

work page arXiv 2026

[43] [43]

Leanabell-prover: Posttraining scaling in formal reasoning.arXiv preprint arXiv:2504.06122, 2025

Jingyuan Zhang, Qi Wang, Xingguang Ji, Yahui Liu, Yang Yue, Fuzheng Zhang, Di Zhang, Guorui Zhou, and Kun Gai. Leanabell-prover: Posttraining scaling in formal reasoning.arXiv preprint arXiv:2504.06122, 2025

work page arXiv 2025

[44] [44]

LeanAgent: Lifelong learning for formal theorem proving, 2025

Adarsh Kumarappan, Mo Tiwari, Peiyang Song, Robert Joseph George, Chaowei Xiao, and Anima Anandkumar. LeanAgent: Lifelong learning for formal theorem proving, 2025. URL https://arxiv.org/abs/2410.06209

work page arXiv 2025

[45] [45]

Process-driven autoformalization in lean 4

Jianqiao Lu, Yingjia Wan, Zhengying Liu, Yinya Huang, Jing Xiong, Chengwu Liu, Jianhao Shen, Hui Jin, Jipeng Zhang, Haiming Wang, et al. Process-driven autoformalization in Lean 4. arXiv preprint arXiv:2406.01940, 2024

work page arXiv 2024

[46] [46]

arXiv preprint arXiv:2507.06181 , year =

Zhongyuan Peng, Yifan Yao, Kaijing Ma, Shuyue Guo, Yizhe Li, Yichi Zhang, Chenchen Zhang, Yifan Zhang, Zhouliang Yu, Luming Li, Minghao Liu, Yihang Xia, Jiawei Shen, Yuchen Wu, Yixin Cao, Zhaoxiang Zhang, Wenhao Huang, Jiaheng Liu, and Ge Zhang. CriticLean: Critic-guided reinforcement learning for mathematical formalization, 2025. URL https: //arxiv.org/a...

work page arXiv 2025

[47] [47]

APOLLO: Automated LLM and lean collaboration for advanced formal reasoning.arXiv preprint arXiv:2505.05758, 2025

Azim Ospanov, Farzan Farnia, and Roozbeh Yousefzadeh. APOLLO: Automated LLM and lean collaboration for advanced formal reasoning.arXiv preprint arXiv:2505.05758, 2025

work page arXiv 2025

[48] [48]

Automatic Textbook Formalization

Fabian Gloeckle, Ahmad Rammal, Charles Arnal, Remi Munos, Vivien Cabannes, Gabriel Synnaeve, and Amaury Hayat. Automatic textbook formalization, 2026. URL https://arxiv. org/abs/2604.03071. 15

work page internal anchor Pith review Pith/arXiv arXiv 2026

[49] [49]

ProofBridge: Auto-formalization of natural language proofs in Lean via joint embeddings.arXiv preprint arXiv:2510.15681, 2025

Prithwish Jana, Kaan Kale, Ahmet Ege Tanriverdi, Cruise Song, Sriram Vishwanath, and Vijay Ganesh. ProofBridge: Auto-formalization of natural language proofs in Lean via joint embeddings.arXiv preprint arXiv:2510.15681, 2025

work page arXiv 2025

[50] [50]

Lean for science formalization

Przemyslaw Chojecki. Lean for science formalization. 2026

2026

[51] [51]

Formalizing chemical physics using the lean theorem prover

Maxwell P Bobbin, Samiha Sharlin, Parivash Feyzishendi, An Hong Dang, Catherine M Wraback, and Tyler R Josephson. Formalizing chemical physics using the lean theorem prover. Digital Discovery, 3(2):264–280, 2024

2024

[52] [52]

Veribench: End-to-end formal verification benchmark for AI code generation in lean 4

Brando Miranda, Zhanke Zhou, Allen Nie, Elyas Obbad, Leni Aniva, Kai Fronsdal, Weston Kirk, Dilara Soylu, Andrea Yu, Ying Li, and Sanmi Koyejo. Veribench: End-to-end formal verification benchmark for AI code generation in lean 4. In2nd AI for Math Workshop @ ICML 2025, 2025. URLhttps://openreview.net/forum?id=rWkGFmnSNl. A Limitations • Algorithmic result...

2025

[53] [53]

1968_b2, 1971_a2, 1983_b2, 1992_a2, 2021_a2

[54] [54]

1975_a3, 1994_b3, 1997_a3, 2010_a3, 2022_b3

[55] [55]

1965_b4, 1970_b4, 1971_a4, 1975_a4, 1987_b4, 1992_b4, 1993_a4, 2000_a4, 2005_a4, 2011_b4, 2019_a4

[56] [56]

2002_a5, 2009_a5, 2012_a5

[57] [57]

key property of towers of minimum fragments

1975_b6, 1986_b6, 1989_b6, 2003_b6, 2008_b6, 2014_a6, 2016_b6 During the evaluation, we disabled all internet access to prevent the models from retrieving external resources, existing proofs, or data leaks. We initialized the agents in their workspace repository with the specific Putnam problem and invoked them using the following system command: PutnamBe...

2025