Advancing Mathematics Research with AI-Driven Formal Proof Search
Pith reviewed 2026-05-22 05:05 UTC · model grok-4.3
The pith
An AI agent using large language models generates and verifies formal proofs in Lean to solve open Erdős problems and OEIS conjectures.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an AI agent can autonomously resolve open problems by producing Lean-verifiable formal proofs, as shown by the resolution of 9 Erdős problems and 44 OEIS conjectures, with the technique now applied across multiple branches of mathematics.
What carries the argument
An agent that alternates large-language-model generation of candidate Lean proofs with automated verification inside the Lean kernel.
If this is right
- Formal verification can be inserted into the workflow of active mathematical research at modest per-problem cost.
- The same agent design works across combinatorics, optimization, graph theory, algebraic geometry, and quantum optics.
- A basic alternating generation-and-verification loop already reproduces the Erdős successes, though harder problems raise its cost.
- Large-scale testing on hundreds of open statements is now feasible with current language-model capabilities.
Where Pith is reading between the lines
- If the cost stays low, smaller research groups could run their own formal-proof searches on local conjectures without large budgets.
- The approach could be tried in other formal systems besides Lean to broaden the set of checkable problems.
- Hybrid loops that let human mathematicians edit the agent's intermediate steps might solve still harder open questions.
- Success on Erdős and OEIS lists suggests the method could be aimed next at problems that have resisted both human and computer attack for decades.
Load-bearing premise
The targeted Erdős problems and OEIS conjectures were still open before this work, and the Lean outputs are complete formal proofs produced without essential human intervention in the derivation steps.
What would settle it
An independent Lean expert confirming that one of the reported proofs is incomplete or that any of the nine Erdős problems had already been solved before the agent ran would disprove the claim of autonomous resolution.
Figures
read the original abstract
Large language models (LLMs) increasingly excel at mathematical reasoning, but their unreliability limits their utility in mathematics research. A mitigation is using LLMs to generate formal proofs in languages like Lean. We perform the first large-scale evaluation of this method's ability to solve open problems. Our most capable agent autonomously resolved 9 of 353 open Erd\H{o}s problems at the per-problem cost of a few hundred dollars, proved 44/492 OEIS conjectures, and is being deployed in combinatorics, optimization, graph theory, algebraic geometry, and quantum optics research. A basic agent alternating LLM-based generation with Lean-based verification replicated the Erd\H{o}s successes but proved costlier on the hardest problems. These findings demonstrate the power of AI-aided formal proof search and shed light on the agent designs that enable it.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports the first large-scale evaluation of LLM-based agents for generating formal proofs in Lean to solve open mathematical problems. The central claims are that the most capable agent autonomously resolved 9 of 353 open Erdős problems at a per-problem cost of a few hundred dollars, proved 44 of 492 OEIS conjectures, and is now deployed across combinatorics, optimization, graph theory, algebraic geometry, and quantum optics; a simpler alternating generation-verification agent is shown to replicate some successes but at higher cost on hard instances.
Significance. If the reported successes are independently confirmed, the work would demonstrate that current LLM-driven formal proof search can address genuinely open problems at modest cost, providing a concrete benchmark for AI-assisted mathematics. The use of Lean verification supplies a high standard of correctness that distinguishes this from informal LLM reasoning claims, and the multi-domain deployment suggests immediate research utility beyond the Erdős and OEIS benchmarks.
major comments (2)
- [Abstract and results section] Abstract and results section: the headline claim that the agent 'autonomously resolved 9 of 353 open Erdős problems' is load-bearing for the autonomy and cost assertions, yet the manuscript supplies neither per-problem literature-search logs confirming the problems were previously unsolved nor proof-size statistics or artifact links demonstrating that the Lean outputs are complete, hole-free formal proofs rather than partial derivations requiring human post-editing. Without these, the success counts cannot be fully assessed.
- [Methods and evaluation sections] Methods and evaluation sections: the reported success counts (9/353 Erdős, 44/492 OEIS) are presented without error bars, failure-mode breakdowns, or independent audit of the Lean artifacts. This omission directly affects the reliability of the comparative claim that the basic agent 'replicated the Erdős successes but proved costlier on the hardest problems.'
minor comments (2)
- [§2] The description of agent architectures would benefit from a clearer tabular comparison of the 'most capable' versus 'basic' designs, including exact prompting strategies and Lean interaction loops.
- [Figures and tables] Figure captions and table legends should explicitly state whether the reported costs include only inference or also human oversight time.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which help clarify the presentation of our results on LLM-based formal proof search. We address each major comment below and describe the revisions we will incorporate to improve transparency and verifiability.
read point-by-point responses
-
Referee: [Abstract and results section] Abstract and results section: the headline claim that the agent 'autonomously resolved 9 of 353 open Erdős problems' is load-bearing for the autonomy and cost assertions, yet the manuscript supplies neither per-problem literature-search logs confirming the problems were previously unsolved nor proof-size statistics or artifact links demonstrating that the Lean outputs are complete, hole-free formal proofs rather than partial derivations requiring human post-editing. Without these, the success counts cannot be fully assessed.
Authors: We agree that explicit documentation strengthens the autonomy and correctness claims. In the revised manuscript we will add a dedicated subsection (and supplementary table) that lists, for each of the nine solved Erdős problems, the pre-experiment literature references establishing that the problem remained open, together with links to the corresponding Lean proof files in a public repository. We will also report proof-size statistics (number of tactics, lines of code, and kernel-verification time) for every success. All reported proofs were produced and checked end-to-end by the agent with no human post-editing; the added artifacts will make this verifiable. revision: yes
-
Referee: [Methods and evaluation sections] Methods and evaluation sections: the reported success counts (9/353 Erdős, 44/492 OEIS) are presented without error bars, failure-mode breakdowns, or independent audit of the Lean artifacts. This omission directly affects the reliability of the comparative claim that the basic agent 'replicated the Erdős successes but proved costlier on the hardest problems.'
Authors: We concur that quantitative uncertainty measures and failure analysis improve interpretability. The revision will include binomial 95 % confidence intervals for both success rates and a breakdown of the dominant failure modes (timeout, tactic failure, search exhaustion) across the full set of attempts. We will also expand the agent-comparison section with per-problem cost figures for the hardest instances to support the cost-effectiveness claim. All Lean artifacts and run logs will be released publicly to enable independent audit; performing a third-party audit ourselves lies outside the scope of the present study. revision: partial
Circularity Check
No significant circularity; results are empirical evaluations on external benchmarks
full rationale
The paper reports counts of resolved open Erdős problems and OEIS conjectures obtained by running LLM-based agents and verifying outputs in Lean. These success metrics are generated by direct application to independently defined external problems rather than by fitting parameters to subsets of the target data or by self-referential definitions. No equations, ansatzes, or uniqueness theorems are invoked that reduce the headline claims to the inputs by construction. The methodology relies on external verification and open problem lists, making the derivation chain self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The Lean theorem prover provides sound verification of formal proofs
Reference graph
Works this paper leans on
-
[1]
Aristotle: IMO-level Automated Theorem Proving
TudorAchim, AlexBest, AlbertoBietti, KevinDer, MathïsFédérico, SergeiGukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, et al. Aristotle: IMO-level automated theorem proving.arXiv preprint arXiv:2510.01346, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[2]
Primitive sets and von mangoldt chains: Erdős problem #1196 and beyond, 2026
BorisAlexeev,KevinBarreto,YanyangLi,JaredDukerLichtman,LiamPrice,JibranIqbal Shah, Quanyu Tang, and Terence Tao. Primitive sets and von mangoldt chains: Erdős problem #1196 and beyond, 2026
work page 2026
-
[3]
Short proofs in combinatorics and number theory.arXiv preprint arXiv:2603.29961, 2026
Boris Alexeev, Moe Putterman, Mehtaab Sawhney, Mark Sellke, and Gregory Valiant. Short proofs in combinatorics and number theory.arXiv preprint arXiv:2603.29961, 2026
-
[4]
Short proofs in combinatorics, probability and number theory II
Boris Alexeev, Moe Putterman, Mehtaab Sawhney, Mark Sellke, and Gregory Valiant. Short proofs in combinatorics, probability and number theory ii.arXiv preprint arXiv:2604.06609, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[5]
LeniAniva,ChuyueSun,BrandoMiranda,ClarkBarrett,andSanmiKoyejo. Pantograph: A machine-to-machine interaction interface for advanced theorem proving, high level reasoning, and data extraction in lean 4, 2025
work page 2025
- [6]
- [7]
-
[8]
Thomas Bloom. Erdosproblems.com. https://www.erdosproblems.com, 2026
work page 2026
-
[9]
Mats Boij, Juan C. Migliore, Rosa M. Miró-Roig, Uwe Nagel, and Fabrizio Zanello.On the Shape of a Pure O-Sequence, volume 218 ofMemoirs of the American Mathematical Society. American Mathematical Society, Providence, RI, 2012
work page 2012
-
[10]
J. A. Bondy and R. L. Hemminger. Graph reconstruction–a survey.Journal of Graph Theory, 1(3):227–268, 1977
work page 1977
-
[11]
Jim Bryan, Balázs Elek, Freddie Manners, George Salafatinos, and Ravi Vakil. The motivic class of the space of genus0maps to the flag variety.arXiv preprint arXiv:2601.07222, 2026
-
[12]
S. A. Burr, P. Erdős, R. L. Graham, and W. Wen-Ching Li. Complete sequences of sets of integer powers.Acta Arithmetica, 77(2):133–138, 1996
work page 1996
-
[13]
Last-Iterate Convergence of Anchored Gradient Descent
Yang Cai and Weiqiang Zheng. Last-iterate convergence of anchored gradient descent. arXiv preprint arXiv:2604.12235, 2026. 11 Advancing Mathematics Research with AI-Driven Formal Proof Search
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[14]
François Caron and Arnaud Doucet. Efficient Bayesian inference for generalized Bradley–Terry models.Journal of Computational and Graphical Statistics, 21(1):174– 196, 2012
work page 2012
-
[15]
Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Wenlei Shi, et al. Seed-prover 1.5: Mastering undergraduate-level theorem proving via learning from experience.arXiv preprint arXiv:2512.17260, 2025
-
[16]
Deepseek-v2: A strong, economical, and efficient mixture-of-experts language model, 2024
DeepSeek-AI. Deepseek-v2: A strong, economical, and efficient mixture-of-experts language model, 2024
work page 2024
-
[17]
On Erd\H{o}s and S\'ark\"ozy's sequences with Property P
Christian Elsholtz and Stefan Planitzer. On erdős and sárközy’s sequences with property p.arXiv preprint arXiv:1609.07935, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[18]
On the divisibility properties of sequences of integers
Paul Erdős and Alice Sárközi. On the divisibility properties of sequences of integers. Proceedings of The London Mathematical Society, pages 97–101, 1970
work page 1970
-
[19]
Siemion Fajtlowicz. On conjectures of graffiti. In J. Akiyama, Y. Egawa, and H. Enomoto, editors,Graph Theory and Applications, volume 38 ofAnnals of Discrete Mathematics, pages 113–118. Elsevier, 1988
work page 1988
-
[20]
Aletheia tackles firstproof autonomously.arXiv preprint arXiv:2602.21201, 2026
Tony Feng, Junehyuk Jung, Sang-hyun Kim, Carlo Pagano, Sergei Gukov, Chiang- Chiang Tsai, David Woodruff, Adel Javanmard, Aryan Mokhtari, Dawsen Hwang, et al. Aletheia tackles firstproof autonomously.arXiv preprint arXiv:2602.21201, 2026
-
[21]
Tony Feng, Trieu H. Trinh, Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung,JoonkyungLee,CarloPagano,SanghyunKim,FedericoPasqualotto,SergeiGukov, Jonathan N. Lee, Junsu Kim, Kaiying Hou, Golnaz Ghiasi, Yi Tay, YaGuang Li, Chenkai Kuang, Yuan Liu, Hanzhao Lin, Evan Zheran Liu, Nigamaa Nayakanti, Xiaomeng Yang, Heng-Tze Cheng, Demis Hassabis, Ko...
-
[22]
A strict separation between two notions of quadratically structured functions
Moritz Firsching and Bogdan Georgiev. A strict separation between two notions of quadratically structured functions. In preparation, 2026
work page 2026
-
[23]
Moritz Firsching, Paul Lezeau, Salvatore Mercuri, Miklós Z. Horváth, Yaël Dillies, Calle Sönne, Eric Wieser, Fred Zhang, Thomas Hubert, Blaise Agüera y Arcas, and Pushmeet Kohli. Formal conjectures: An open and evolving benchmark for verified discovery in mathematics, 2026
work page 2026
-
[24]
Safeverify.https://github.com/GasStationManager/Saf eVerify, 2025
GasStationManager. Safeverify.https://github.com/GasStationManager/Saf eVerify, 2025. [Accessed: 2026-05-12]
work page 2025
-
[25]
Mathematical exploration and discovery at scale
Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, and Adam Zsolt Wagner. Mathe- matical exploration and discovery at scale.arXiv preprint arXiv:2511.02864, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[26]
Google DeepMind. Gemini 3.1 deep think, 2026. Accessed: 2026-04-30
work page 2026
-
[27]
Ben Green. 100 open problems. https://people.maths.ox.ac.uk/greenbj/papers/open- problems.pdf, 2024. 12 Advancing Mathematics Research with AI-Driven Formal Proof Search
work page 2024
-
[28]
A Milestone in Formalization: The Sphere Packing Problem in Dimension 8
Sidharth Hariharan, Christopher Birkbeck, Seewoo Lee, Ho Kiu Gareth Ma, Bhavik Mehta, Auguste Poiroux, and Maryna Viazovska. A milestone in formalization: The sphere packing problem in dimension 8.arXiv preprint arXiv:2604.23468, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[29]
Olympiad-level formal mathematical reasoning with reinforcement learning.Nature, pages 1–3, 2025
Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z Horváth, Goran Žužić, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Masoom, Ottavia Bertolli, Tom Zahavy, Amol Mandhane, Jessica Yung, Iuliya Beloshapka, Borja Ibarz, Vivek Veeriah, Lei Yu, Oliver Nash, Paul Lezeau, Salvatore Mercuri, Calle Sönne, Bhavik Mehta, Alex Davies, ...
work page 2025
-
[30]
Learning and planning in complex action spaces
Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Mohammadamin Barekatain, Simon Schmitt, and David Silver. Learning and planning in complex action spaces. In International Conference on Machine Learning, 2021
work page 2021
-
[31]
Geoffrey Huntley. Ralph wiggum as a "software engineer".https://ghuntley.com /ralph, 2025. Blog post
work page 2025
-
[32]
Equality in Fill's spectral gap problem
Vishesh Jain and Clayton Mizgerd. Equality in fill’s spectral gap problem.arXiv preprint arXiv:2604.03937, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[33]
Uijeong Jang and Ernest K Ryu. Point convergence of nesterov’s accelerated gradient method: An ai-assisted proof.arXiv preprint arXiv:2510.23513, 2025
-
[34]
Albert Q Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, and Guillaume Lample. Draft, sketch, and prove: Guiding formal theorem provers with informal proofs.arXiv preprint arXiv:2210.12283, 2022
-
[35]
Albert Qiaochu Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, and Mateja Jamnik. Thor: Wielding hammers to integrate language models and automated theorem provers.Advances in Neural Information Processing Systems, 35:8360–8373, 2022
work page 2022
-
[36]
Paul J. Kelly. A congruence theorem for trees.Pacific Journal of Mathematics, 7(1):961– 968, 1957
work page 1957
-
[37]
John R. Koza. Genetic programming as a means for programming computers by natural selection.Statistics and Computing, 4(2):87–112, 1994
work page 1994
-
[38]
A Tensor-Algebraic No-Go Theorem for High-Dimensional Photonic GHZ States
Mario Krenn, Moritz Firsching, George Tsoukalas, Rishikesh Gajjala, Xuemei Gu, and Swarat Chaudhuri. A Tensor-Algebraic No-Go Theorem for High-Dimensional Photonic GHZ States. In preparation, 2026
work page 2026
-
[39]
Mario Krenn, Xuemei Gu, and Anton Zeilinger. Quantum experiments and graphs: Multiparty states as coherent superpositions of perfect matchings.Physical Review Letters, 119(24), December 2017. 13 Advancing Mathematics Research with AI-Driven Formal Proof Search
work page 2017
-
[40]
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, et al. Goedel-prover-v2: Scaling formal theorem proving with scaffolded data synthesis and self-correction.arXiv preprint arXiv:2508.03613, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[41]
Luce.Individual Choice Behavior: A Theoretical Analysis
R.D. Luce.Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959
work page 1959
-
[42]
Gauss: An agent for autoformalization, 2026
Math Inc. Gauss: An agent for autoformalization, 2026. Accessed: 2026
work page 2026
-
[43]
Thelean4theoremproverandprogramming language
LeonardodeMouraandSebastianUllrich. Thelean4theoremproverandprogramming language. InInternational Conference on Automated Deduction, pages 625–635. Springer, 2021
work page 2021
-
[44]
Ansh Nagda, Prabhakar Raghavan, and Abhradeep Thakurta. Reinforced generation of combinatorial structures: Hardness of approximation.arXiv preprint arXiv:2509.18057, 2025
-
[45]
Reinforced Generation of Combinatorial Structures: Ramsey Numbers
Ansh Nagda, Prabhakar Raghavan, and Abhradeep Thakurta. Reinforced generation of combinatorial structures: Ramsey numbers.arXiv preprint arXiv:2603.09172, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[46]
Alexander Novikov, Ngân V˜u, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, and Matej Balog. Alphaevolve: A coding agent for scientific and algor...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[47]
R. L. Plackett. The analysis of permutations.Journal of the Royal Statistical Society. Series C (Applied Statistics), 24(2):193–202, 1975
work page 1975
-
[48]
Generative language modeling for automated theorem proving
StanislasPoluandIlyaSutskever. Generativelanguagemodelingforautomatedtheorem proving.arXiv preprint arXiv:2009.03393, 2020
-
[49]
On infinite sets with no3on a line.arXiv preprint arXiv:2602.21275, 2026
Moe Putterman, Mehtaab Sawhney, and Gregory Valiant. On infinite sets with no3on a line.arXiv preprint arXiv:2602.21275, 2026
-
[50]
Christian Reiher, Vojtěch Rödl, and Marcelo Sales. Colouring versus density in integers and hales–jewett cubes.Journal of the London Mathematical Society, 110(5):e12987, 2024
work page 2024
-
[51]
Pawan Kumar, Emilien Dupont, Francisco J
Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli, and Alhussein Fawzi. Mathematical discoveries from program search with large language models.Nature, 625(7995):468– 475, 2024
work page 2024
-
[52]
Ernest K. Ryu, Kun Yuan, and Wotao Yin. Ode analysis of stochastic gradient methods with optimism and anchoring for minimax problems.arXiv preprint arXiv:1905.10899, 2019. 14 Advancing Mathematics Research with AI-Driven Formal Proof Search
-
[53]
Johannes Schmitt. Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with ai.arXiv preprint arXiv:2512.14575, 2025
-
[54]
On a problem of erdős and sárközy.Journal of Combinatorial Theory, Series A, 94(1):191–195, 2001
Tomasz Schoen. On a problem of erdős and sárközy.Journal of Combinatorial Theory, Series A, 94(1):191–195, 2001
work page 2001
-
[55]
A general reinforcement learning algorithm that masters chess, shogi, and go through self-play, 2018
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play, 2018
work page 2018
-
[56]
Neil J. Sloane. The on-line encyclopedia of integer sequences. InProceedings of the 14th Symposium on Towards Mechanized Mathematical Assistants: 6th International Conference, Calculemus ’07 / MKM ’07, page 130, Berlin, Heidelberg, 2007. Springer- Verlag
work page 2007
-
[57]
Richard P. Stanley. Hilbert functions of graded algebras.Advances in Mathematics, 28(1):57–83, 1978
work page 1978
-
[58]
An Improved Last-Iterate Convergence Rate for Anchored Gradient Descent Ascent
Anja Surina, Arun Suggala, George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Francisco JR Ruiz, Pushmeet Kohli, and Swarat Chaudhuri. An improved last-iterate convergencerateforanchoredgradientdescentascent.arXivpreprintarXiv:2604.03782, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[59]
Terence Tao and contributors. Ai contributions to Erdős problems.https://github .com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-p roblems, 2026. Accessed: 2026-04-23
work page 2026
-
[60]
The Mathlib Community. The Lean Mathematical Library. InProceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP 2020, New Orleans, LA, USA, 2020. ACM
work page 2020
-
[61]
Stanislaw M. Ulam. A collection of mathematical problems.New York and London: Interscience Publishers, 1960
work page 1960
-
[62]
Stanislaw M. Ulam.A Collection of Mathematical Problems, volume 8 ofInterscience Tracts in Pure and Applied Mathematics. Interscience Publishers, New York, 1960
work page 1960
-
[63]
Sumanth Varambally, Thomas Voice, Yanchao Sun, Zhifeng Chen, Rose Yu, and Ke Ye. Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025
-
[64]
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
Haiming Wang, Mert Unsal, Xiaohan Lin, Mantas Baksys, Junqi Liu, Marco Dos Santos, Flood Sung, Marina Vinyes, Zhenzhe Ying, Zekai Zhu, et al. Kimina-prover preview: Towards large formal reasoning models with reinforcement learning.arXiv preprint arXiv:2504.11354, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[65]
David P Woodruff, Vincent Cohen-Addad, Lalit Jain, Jieming Mao, Song Zuo, Moham- madHossein Bateni, Simina Branzei, Michael P Brenner, Lin Chen, Ying Feng, et al. 15 Advancing Mathematics Research with AI-Driven Formal Proof Search Accelerating scientific research with gemini: Case studies and common techniques. arXiv preprint arXiv:2602.03837, 2026
-
[66]
Formal mathematical reasoning: A new frontier in ai.arXiv preprint 2412.16075, 2024
Kaiyu Yang, Gabriel Poesia, Jingxuan He, Wenda Li, Kristin Lauter, Swarat Chaudhuri, and Dawn Song. Formal mathematical reasoning: A new frontier in ai.arXiv preprint 2412.16075, 2024
-
[67]
Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J Prenger, and Animashree Anandkumar. Leandojo: Theorem proving with retrieval-augmented language models.Advances in Neural Information Processing Systems, 36:21573–21612, 2023
work page 2023
-
[68]
Fabrizio Zanello. Log-concavity of level Hilbert functions and pure o-sequences.Journal of Commutative Algebra, 16(2):245–256, 2024
work page 2024
-
[69]
Daniel Zheng, Ingrid von Glehn, Yori Zwols, Iuliya Beloshapka, Lars Buesing, Daniel M. Roy, Martin Wattenberg, Bogdan Georgiev, Tatiana Schmidt, Andrew Cowie, Fernanda Viegas, Dimitri Kanevsky, Vineet Kahlon, Hartmut Maennel, Sophia Alj, George Holland, Alex Davies, and Pushmeet Kohli. AI Co-Mathematician: Accelerating mathematicians with agentic AI.arXiv...
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[70]
Population Database and Matchmaking
Database sampling:The controller selects a root proof sketch𝑆root by sampling from the database, along with𝑀= 2auxiliary inspiration sketches 𝑆insp. The selection strat- egy balances exploitation of high-rated sketches with exploration of diverse candidates (see “Population Database and Matchmaking”)
-
[71]
Prompt construction:A prompt X is assembled to guide the LLM. It integrates the formal problem specification, the Lean source code and natural language plan of𝑆root, and structured feedback derived from AlphaProof’s previous attempts on{𝑆insp}. As in AlphaEvolve, the controller encourages diversity by stochastically injecting instructions such as “decompo...
-
[72]
Prover subagent:The assembled prompt X is dispatched to the LLM (Gemini 3.1 Pro), initiating a multi-turn episode. To scale to large Lean files, the subagent outputs mutations via asearch_replace tool in a compact diff format rather than rewriting the entire file. The subagent can also query AlphaProof to test specific subgoals mid- episode; the feedback ...
-
[73]
Validation:Once the candidate sketch𝑆′ passes the sandbox check, it undergoes formal validation. The system extracts all remainingsorry subgoals and cross-references them against a global goal cache using a deep hash of their exact Lean state (see “Global GoalCaching”). Ifa subgoal waspreviously resolved, theproof isretrieved immediately; otherwise, it is...
-
[74]
Database registration:The candidate 𝑆′, along with per-subgoal feedback from Al- phaProof, is registered in the database. Its fitness is then determined asynchronously via Elo matchmaking. AlphaProof has a Test-Time Reinforcement Learning (TTRL) mode in which it learns to solve a problem by solving its AI-generated variants at inference time; however, we ...
work page 2026
-
[75]
4𝑁1(𝑋) +𝑁 3(𝑋) ≤ 3|𝑋| + 2𝑁2(𝑋), which follows from writing𝑁𝑘(𝑥)= Í 𝑥∈ℤ 1𝑋 (𝑥) 1𝑋 (𝑥+ 𝑘) and a pointwise check of the indicator function1𝑋 across all4-point windows (𝑥, 𝑥+ 1, 𝑥+ 2, 𝑥+ 3); that is, summing the local inequality1𝑋 (𝑥) + 1𝑋 (𝑥+ 1) + 1𝑋 (𝑥+
-
[76]
+ 1𝑋 (𝑥) 1𝑋 (𝑥+ 2) + 1𝑋 (𝑥+ 1)1𝑋 (𝑥+ 3) ≥ 1𝑋 (𝑥) 1𝑋 (𝑥+ 1) + 21𝑋 (𝑥+ 1)1𝑋 (𝑥+ 2) + 1𝑋 (𝑥+2)1 𝑋 (𝑥+3) +1 𝑋 (𝑥)1 𝑋 (𝑥+3)and then summing over all𝑥∈ℤ
-
[77]
Consider each pair(𝑥, 𝑥+ 2) ∈𝑋 2 and we proceed by cases on𝑥+ 1
2𝑁2(𝑋) ≤𝑁 3(𝑋) + 2𝑉2(𝑋) + 2𝐼(𝑋) . Consider each pair(𝑥, 𝑥+ 2) ∈𝑋 2 and we proceed by cases on𝑥+ 1. Let 𝐺={𝑥∈𝑋|𝑥+ 1 ∉𝑋∧𝑥+ 2 ∈𝑋} . If 𝑥+ 1 ∈𝑋 , then the pair is counted exactly by𝑉2, so𝑁 2(𝑋)=𝑉 2(𝑋) + |𝐺|. If𝑥+1∉𝑋, then •If𝑥−1∈𝑋, then(𝑥−1, 𝑥+2)is a pair counted in𝑁 3(𝑋). •If𝑥+3∈𝑋, then(𝑥, 𝑥+3)is a pair counted in𝑁 3(𝑋). 41 Advancing Mathematics Research wit...
-
[78]
The cases𝛿= 0and 𝛿=−𝑘 each contribute≤ |𝐴| (one coordinate determines the rest). For any 𝛿∉{ 0,−𝑘} , the Sidon property ensures that there is at most one pair(𝑎, 𝑐) with 𝑎−𝑐=𝛿 and at most one pair(𝑑, 𝑏) with 𝑑−𝑏=𝛿+𝑘 . Consequently, each gap of size𝑘in𝐷identifies exactly one quadruple. Thus|quad 𝑘| ≤𝑁 𝑘(𝐷) +2𝑛
-
[79]
Combining these bounds shows that𝑁𝑘(𝐷) and 𝑁𝑘(𝑆) are related up to𝑂(𝑛) error
Each "good" element 𝑠∈𝑆 with 𝑠+𝑘∈𝑆 (excluding ≤ 2𝑛 doubles of the form2𝑎) produces≥4quadruples, giving4𝑁 𝑘(𝑆) ≤ |quad 𝑘| +8𝑛. Combining these bounds shows that𝑁𝑘(𝐷) and 𝑁𝑘(𝑆) are related up to𝑂(𝑛) error. Apply- ing fact (2) to𝐷=𝐴−𝐴gives 4𝑁1(𝐷) +𝑁 3(𝐷) ≤3|𝐷| +2𝑁 2(𝐷) and then substitute the above two quadruple transfer bounds for𝑘= 1, 2, 3and collecting al...
work page 2002
-
[80]
Í 𝑅𝑛,𝑘 ≤2𝑛2 𝑛/2 =𝑜 2𝑛+1 𝑛6 . 44 Advancing Mathematics Research with AI-Driven Formal Proof Search Since all residual terms are bounded by𝑜 2𝑛+1 𝑛6 , we conclude that 𝑎𝑛 =𝑓 𝑛 +𝑜 2𝑛+1 𝑛6 = 2𝑛+1 𝑛 1+ 1 𝑛 + 3 𝑛2 + 13 𝑛3 + 75 𝑛4 + 541 𝑛5 +𝑜 1 𝑛5 This establishes the conjectured asymptotic expansion.□ Theorem.(A conjecture of OEIS A228143, 2018) Let𝑠𝑚 = Í𝑚 𝑘=0 ...
work page 2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.