Recognition: 2 theorem links
· Lean TheoremAutomatic Textbook Formalization
Pith reviewed 2026-05-13 20:03 UTC · model grok-4.3
The pith
An AI system has automatically formalized a full 500-page graduate algebraic combinatorics textbook into Lean code in one week.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A multi-agent AI system successfully produced a complete, standalone formalization of a graduate-level algebraic combinatorics textbook exceeding 500 pages, resulting in 130K lines of Lean code and 5,900 declarations. The work was carried out in one week by 30K agents collaborating in parallel on a shared version-controlled codebase, establishing a new scale for automatic textbook formalization with only minimal ongoing human input.
What carries the argument
A parallel multi-agent system in which 30,000 instances of an AI model collaborate on a shared version-controlled codebase to generate, verify, and integrate Lean formalizations of textbook content.
If this is right
- Graduate-level mathematics textbooks can be turned into machine-checked libraries on a timescale of days.
- The monetary cost of such formalizations can match or fall below the salary cost of equivalent human expert teams.
- Further efficiencies in the agent process are expected even without improvements to the underlying model.
- The resulting open-source Lean library and comparison website enable community inspection and extension of the formalization.
Where Pith is reading between the lines
- The same parallel-agent method could be tested on textbooks from other areas of mathematics to check how domain-specific the success is.
- If the generated libraries prove reliable, they could serve as starting points for connecting previously separate formal developments in Lean.
- Rapid generation of large verified texts raises the question of how to maintain and update them when new results appear in the source material.
Load-bearing premise
The multi-agent AI process produces output that is correct, complete, and faithful to the textbook content with only minimal human oversight after initial setup.
What would settle it
A human mathematician auditing the Lean library and identifying a substantial number of theorems whose formal statements or proofs deviate from or omit key content in the original textbook.
read the original abstract
We present a case study where an automatic AI system formalizes a textbook with more than 500 pages of graduate-level algebraic combinatorics to Lean. The resulting formalization represents a new milestone in textbook formalization scale and proficiency, moving from early results in undergraduate topology and restructuring of existing library content to a full standalone formalization of a graduate textbook. The formalization comprises 130K lines of code and 5900 Lean declarations and was conducted within one week by a total of 30K Claude 4.5 Opus agents collaborating in parallel on a shared code base via version control, simultaneously setting a record in multi-agent software engineering with usable results. The inference cost matches or undercuts what we estimate as the salaries required for a team of human experts, and we expect there is still the potential for large efficiencies to be made without the need for better models. We make our code, the resulting Lean code base and a side-by-side blueprint website available open-source.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a case study of an automatic multi-agent AI system that uses 30,000 Claude 4.5 Opus agents collaborating via version control to formalize a graduate-level algebraic combinatorics textbook exceeding 500 pages into Lean. The resulting artifact comprises 130K lines of code and 5900 declarations, completed in one week, with open-sourcing of the code base, Lean formalization, and a side-by-side blueprint website; the authors position this as a new milestone in textbook formalization scale and multi-agent software engineering, with inference costs competitive with human expert salaries.
Significance. If the formalization is largely automatic, faithful, and complete, the work marks a substantial advance in automated formal mathematics by scaling from prior undergraduate or library-restructuring efforts to a full standalone graduate textbook. The open-sourcing of the 130K-line Lean base and blueprint website, together with the demonstrated parallel multi-agent workflow, supplies concrete artifacts and a replicable engineering pattern that could accelerate future large-scale formalization projects.
major comments (2)
- [Abstract and results section] Abstract and results section: the central claim that the formalization was conducted 'with only minimal human oversight after initial setup' lacks any quantitative audit of human intervention volume (e.g., number of human commits, lines edited post-generation, or frequency of 'sorry' resolutions). This metric is load-bearing for the 'automatic' descriptor and the cost-comparison argument.
- [Results and evaluation] Results and evaluation: no quantitative breakdown is supplied of coverage (fraction of textbook statements formalized), error rates in generated code, or independent verification that the 5900 Lean declarations faithfully reproduce the textbook content. Soundness therefore rests entirely on the authors' internal assessment rather than reported measurements.
minor comments (2)
- [Methods] The description of the shared codebase and version-control workflow would benefit from additional detail on how merge conflicts and parallel edits were managed across 30K agents.
- Consider adding a summary table of key metrics (agent count, LOC, declarations, wall-clock time, estimated cost) for quick reference.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of our work and for the constructive comments that help clarify the presentation of our results. We address each major comment below, indicating revisions where appropriate to strengthen the manuscript.
read point-by-point responses
-
Referee: [Abstract and results section] Abstract and results section: the central claim that the formalization was conducted 'with only minimal human oversight after initial setup' lacks any quantitative audit of human intervention volume (e.g., number of human commits, lines edited post-generation, or frequency of 'sorry' resolutions). This metric is load-bearing for the 'automatic' descriptor and the cost-comparison argument.
Authors: We agree that a quantitative audit of human intervention would provide stronger evidence for the automatic character of the process. The multi-agent system was explicitly designed to operate with only initial setup and minimal subsequent oversight, relying on version control for agent collaboration. While detailed per-intervention logs were not collected in advance, our records show that human commits formed a small minority of total activity and that agents resolved the large majority of 'sorry' statements autonomously. We will add a new paragraph to the Results section with these available metrics and a description of the oversight process to support the claim. revision: yes
-
Referee: [Results and evaluation] Results and evaluation: no quantitative breakdown is supplied of coverage (fraction of textbook statements formalized), error rates in generated code, or independent verification that the 5900 Lean declarations faithfully reproduce the textbook content. Soundness therefore rests entirely on the authors' internal assessment rather than reported measurements.
Authors: We appreciate the call for more explicit evaluation metrics. The 5900 declarations provide complete coverage of the textbook's definitions, theorems, and proofs, and the final Lean code base compiles successfully with no remaining 'sorry' placeholders, indicating an effective error rate of zero in the delivered artifact. We will revise the evaluation section to state these points explicitly and include a high-level mapping of chapters to formalized content. Independent verification by unaffiliated parties was not performed in this case study; we will note this limitation and point to the open-sourced repository as enabling future community review. revision: partial
- Independent verification of faithfulness to the textbook content by parties external to the project
Circularity Check
No circularity: empirical case study with no derivation chain
full rationale
The paper reports an engineering outcome: a multi-agent AI system produced a 130K-line Lean formalization of a 500-page graduate textbook in one week. No mathematical derivations, predictions of fitted quantities, or self-referential definitions are present. Claims rest on reported metrics, open-sourced code, and cost comparisons rather than any chain that reduces to its own inputs by construction. No load-bearing self-citations or ansatzes are invoked.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math The Lean theorem prover correctly validates the formalized statements.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
The formalization comprises 130K lines of code and 5900 Lean declarations and was conducted within one week by a total of 30K Claude 4.5 Opus agents collaborating in parallel on a shared code base via version control
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat recovery theorem unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We use git for version control and adopt a trunk-based development model where agents work on short-lived feature branches
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
https://arxiv.org/abs/2510.01346. Lars Becker, María Inés de Frutos-Fernández, Leo Diedering, Floris van Doorn, Sébastien Gouëzel, Asgar Jamneshan, Evgenia Karunus, Edward van de Meent, Pietro Monticone, Jasper Mulder-Sohn, Jim Portegies, Joris Roos, Michael Rothgang, Rajula Srivastava, James Sundstrom, Jeremy Tan, and Christoph Thiele. A blueprint for th...
work page internal anchor Pith review arXiv 2025
-
[2]
Matthew Bolan, Joachim Breitner, Jose Brox, Nicholas Carlini, Mario Carneiro, Floris van Doorn, Martin Dvorak, Andrés Goens, Aaron Hill, Harald Husum, et al. The equational theories project: Advancing collaborative mathematical research at scale.arXiv preprint arXiv:2512.07087,
-
[3]
Kevin Buzzard. The future of mathematics? Talk slides, Pittsburgh, January 2020.https://www.andrew.cmu.edu/ user/avigad/meetings/fomm2020/slides/fomm_buzzard.pdf. Slides from a talk given in Pittsburgh (Jan 2020). Nicholas Carlini. Building a c compiler with a team of parallel claudes, February 2026.https://www.anthropic.com/ engineering/building-c-compil...
work page 2020
-
[4]
Fel’s conjecture on syzygies of numerical semigroups, 2026.https://arxiv.org/abs/2602.03716
Evan Chen, Chris Cummins, GSM, Dejan Grubisic, Leopold Haller, Letong Hong, Andranik Kurghinyan, Kenny Lau, Hugh Leather, Seewoo Lee, Aram Markosyan, Ken Ono, Manooshree Patel, Gaurang Pendharkar, Vedant Rathi, Alex Schneidman, Volker Seeker, Shubho Sengupta, Ishan Sinha, Jimmy Xin, and Jujian Zhang. Fel’s conjecture on syzygies of numerical semigroups, 2...
-
[5]
Ben Cottier, Ben Snodin, David Owen, and Tom Adamczewski
https://arxiv.org/abs/2309.14870. Ben Cottier, Ben Snodin, David Owen, and Tom Adamczewski. Llm inference prices have fallen rapidly but unequally across tasks, 2025.https://epoch.ai/data-insights/llm-inference-price-trends. Accessed: 2026-03-02. Kefan Dong and Tengyu Ma. Stp: Self-play llm theorem provers with iterative conjecturing and proving,
-
[6]
https://arxiv.org/abs/2502.00212. Ege Erdil and Tamay Besiroglu. Algorithmic progress in computer vision, 2023.https://arxiv.org/abs/2212.05153. David Gerard. Cursor lies about vibe-coding a web browser with ai, January 2026.https://pivot-to-ai.com/2026/01/ 27/cursor-lies-about-vibe-coding-a-web-browser-with-ai/. Blog post. Georges Gonthier. A computer-ch...
-
[7]
doi: 10.1007/978-3-642-39634-2\_14.https://inria.hal.science/hal-00816699
Springer. doi: 10.1007/978-3-642-39634-2\_14.https://inria.hal.science/hal-00816699. Darij Grinberg. An introduction to algebraic combinatorics.arXiv preprint arXiv:2506.00738,
-
[8]
The price of progress: Algorithmic efficiency and the falling cost of AI inference,
Hans Gundlach, Jayson Lynch, Matthias Mertens, and Neil Thompson. The price of progress: Algorithmic efficiency and the falling cost of ai inference, 2025.https://arxiv.org/abs/2511.23455. Emmanuel Gunther, Miguel Pagano, Pedro Sánchez Terraf, and Matías Steinberg. The independence of the continuum hypothesis in isabelle/zf.Archive of Formal Proofs, March
-
[9]
Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z
doi: 10.1017/fmp.2017.1. Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z. Horváth, Goran Žužić, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Masoom, Ottavia Bertolli, Tom Zahavy, Amol Mandhane, Jessica Yung, Iuliya Beloshapka, Borja Ibarz, Vivek Veeriah, Lei Yu, Oliver Nash, Paul Lezeau, Salvatore Mercuri, Calle Sönne, B...
-
[10]
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
https://arxiv.org/abs/2210.12283. Keller Jordan, Jeremy Bernstein, Brendan Rappazzo, @fernbear.bsky.social, Boza Vlado, You Jiacheng, Franz Cesista, Braden Koszarsky, and @Grad62304977. modded-nanogpt: Speedrunning the nanogpt baseline,
-
[11]
https://github.com/KellerJordan/modded-nanogpt. Hongjian Li. Extension of the fundamental theorem of algebra to polynomial matrix equations overq-circulant matrices. arXiv preprint arXiv:2601.13775,
-
[12]
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
Wilson Lin. Scaling long-running autonomous coding, January 2026.https://cursor.com/blog/scaling-agents. Blog post. Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, et al. Goedel-prover-v2: Scaling formal theorem proving with scaffolded data synthesis and self-correction.arXiv preprint...
-
[13]
Junqi Liu, Zihao Zhou, Zekai Zhu, Marco Dos Santos, Weikun He, Jiawei Liu, Ran Wang, Yunzhou Xie, Junqiao Zhao, Qiufeng Wang, et al. Numina-lean-agent: An open and general agentic reasoning system for formal mathematics. arXiv preprint arXiv:2601.14027,
-
[14]
At which training stage does code data help llms reasoning?, 2023.https://arxiv.org/abs/2309.16298
Yingwei Ma, Yue Liu, Yue Yu, Yuanliang Zhang, Yu Jiang, Changjian Wang, and Shanshan Li. At which training stage does code data help llms reasoning?, 2023.https://arxiv.org/abs/2309.16298. Math, Inc. Introducing Gauss, an agent for autoformalization.https://www.math.inc/gauss, September
-
[15]
Accessed: 2026-03-12. Math, Inc. Completing the formal proof of higher-dimensional sphere packing, March 2026.https://www.math.inc/ sphere-packing. Announcing the formalization of Maryna Viazovska’s E8 and Leech lattice proofs using the Gauss autoformalization agent. 16 The mathlib Community. The lean mathematical library. InProceedings of the 9th ACM SIG...
work page 2026
-
[16]
doi: 10.1145/3372885. 3373824.http://dx.doi.org/10.1145/3372885.3373824. Kim Morrison and the leanprover community. repl: A read-eval-print-loop for Lean
-
[17]
ISBN 978-0-13- 468951-7. Z. Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, Z. F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao, Daya Guo, and Chong Ruan. Deepseek-prover-v2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposi...
work page internal anchor Pith review arXiv 2025
-
[18]
Benjamin Skuse. Watershed moment for ai–human collaboration in math.IEEE Spectrum, March 2026.https: //spectrum.ieee.org/ai-proof-verification. Josef Urban. 130k lines of formal topology in two weeks: Simple and cheap autoformalization for everyone?,
work page 2026
-
[19]
Floris van Doorn, Patrick Massot, and Oliver Nash
https://arxiv.org/abs/2601.03298. Floris van Doorn, Patrick Massot, and Oliver Nash. Formalising the h-principle and sphere eversion. InProceedings of the 12th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP 2023, page 121–134, New York, NY, USA,
-
[20]
Association for Computing Machinery. ISBN 9798400700262. doi: 10.1145/3573105.3575688. https://doi.org/10.1145/3573105.3575688. Steven J. Vaughan-Nichols. Ok, so anthropic’s AI built a C compiler. that don’t impress me much, February
-
[21]
https://www.theregister.com/2026/02/13/anthropic_c_compiler/. Opinion piece. VladimirVoevodsky. Theoriginsandmotivationsofunivalentfoundations: Apersonalmissiontodevelopcomputerproof verification to avoid mathematical mistakes, 2014.https://www.ias.edu/ideas/2014/voevodsky-origins. Published inThe Institute Letter(Summer 2014), Institute for Advanced Stud...
work page 2026
-
[22]
Video: https://www.ias.edu/video/voevodsky14/. Haiming Wang, Mert Unsal, Xiaohan Lin, Mantas Baksys, Junqi Liu, Marco Dos Santos, Flood Sung, Marina Vinyes, Zhenzhe Ying, Zekai Zhu, Jianqiao Lu, Hugues de Saxcé, Bolton Bailey, Chendong Song, Chenjun Xiao, Dehao Zhang, Ebony Zhang, Frederick Pu, Han Zhu, Jiawei Liu, Jonas Bayer, Julien Michel, Longhui Yu, ...
-
[23]
Huajian Xin, ZZ Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, et al. Deepseek-prover-v1. 5: Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search.arXiv preprint arXiv:2408.08152,
-
[24]
17 Dayu Yang, Tianyang Liu, Daoan Zhang, Antoine Simoulin, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Xin Qian, Grey Yang, Jiebo Luo, and Julian McAuley. Code to think, think to code: A survey on code-enhanced reasoning and reasoning-driven code intelligence in llms, 2025.https://arxiv.org/abs/2502.19411. John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Li...
-
[25]
Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye, Xianjun Yang, Lichang Chen, William Yang Wang, and Linda Ruth Petzold. Unveiling the impact of coding data instruction fine-tuning on large language models reasoning, 2024.https: //arxiv.org/abs/2405.20535. 18 Appendix A Estimating Token Caching Statistics Unfortunately, our logs only contain the total number of input ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.