Advancing Mathematics Research with AI-Driven Formal Proof Search

Adam Zsolt Wagner; Aja Huang; Andrew Ferrauiolo; Anja Surina; Anton Kovsharov; Arun Suggala; Codrut Grosu; Eric Wieser; Francisco J. R. Ruiz; George Tsoukalas

arxiv: 2605.22763 · v1 · pith:W6IWLF6Enew · submitted 2026-05-21 · 💻 cs.AI

Advancing Mathematics Research with AI-Driven Formal Proof Search

George Tsoukalas , Anton Kovsharov , Sergey Shirobokov , Anja Surina , Moritz Firsching , Gergely B\'erczi , Francisco J. R. Ruiz , Arun Suggala

show 12 more authors

Adam Zsolt Wagner Eric Wieser Lei Yu Aja Huang Mikl\'os Z. Horv\'ath Andrew Ferrauiolo Henryk Michalewski Codrut Grosu Thomas Hubert Matej Balog Pushmeet Kohli Swarat Chaudhuri

This is my paper

Pith reviewed 2026-05-22 05:05 UTC · model grok-4.3

classification 💻 cs.AI

keywords AI for mathematicsformal proofsLean theorem proverErdős problemsOEIS conjecturesautomated theorem provingproof searchmathematical reasoning

0 comments

The pith

An AI agent using large language models generates and verifies formal proofs in Lean to solve open Erdős problems and OEIS conjectures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large language models can help solve genuinely open mathematical questions by writing formal proofs that a computer can check in the Lean theorem prover. Their strongest agent found proofs for 9 out of 353 listed Erdős problems at a few hundred dollars each and settled 44 out of 492 OEIS conjectures. A simpler agent that keeps generating candidate proofs with the language model and checking them in Lean also succeeded on the Erdős set, though it cost more on the hardest cases. The same approach is already being tried inside active research programs in combinatorics, optimization, graph theory, algebraic geometry, and quantum optics. Readers should care because the method supplies an independent, machine-checkable record instead of relying only on human judgment for difficult claims.

Core claim

The central claim is that an AI agent can autonomously resolve open problems by producing Lean-verifiable formal proofs, as shown by the resolution of 9 Erdős problems and 44 OEIS conjectures, with the technique now applied across multiple branches of mathematics.

What carries the argument

An agent that alternates large-language-model generation of candidate Lean proofs with automated verification inside the Lean kernel.

If this is right

Formal verification can be inserted into the workflow of active mathematical research at modest per-problem cost.
The same agent design works across combinatorics, optimization, graph theory, algebraic geometry, and quantum optics.
A basic alternating generation-and-verification loop already reproduces the Erdős successes, though harder problems raise its cost.
Large-scale testing on hundreds of open statements is now feasible with current language-model capabilities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the cost stays low, smaller research groups could run their own formal-proof searches on local conjectures without large budgets.
The approach could be tried in other formal systems besides Lean to broaden the set of checkable problems.
Hybrid loops that let human mathematicians edit the agent's intermediate steps might solve still harder open questions.
Success on Erdős and OEIS lists suggests the method could be aimed next at problems that have resisted both human and computer attack for decades.

Load-bearing premise

The targeted Erdős problems and OEIS conjectures were still open before this work, and the Lean outputs are complete formal proofs produced without essential human intervention in the derivation steps.

What would settle it

An independent Lean expert confirming that one of the reported proofs is incomplete or that any of the nine Erdős problems had already been solved before the agent ran would disprove the claim of autonomous resolution.

Figures

Figures reproduced from arXiv: 2605.22763 by Adam Zsolt Wagner, Aja Huang, Andrew Ferrauiolo, Anja Surina, Anton Kovsharov, Arun Suggala, Codrut Grosu, Eric Wieser, Francisco J. R. Ruiz, George Tsoukalas, Gergely B\'erczi, Henryk Michalewski, Lei Yu, Matej Balog, Mikl\'os Z. Horv\'ath, Moritz Firsching, Pushmeet Kohli, Sergey Shirobokov, Swarat Chaudhuri, Thomas Hubert.

**Figure 1.** Figure 1: Pseudocode for ’s main components. The main loop creates a pool of asynchronous sketchers and raters and awaits the creation of a full (sorry-free) proof. Each prover subagent samples a parent sketch from the population using the P-UCB strategy and creates a stateful conversation session with an LLM (Gemini 3.1 Pro) instance. In this conversation, it receives instructions to perform tool calls from the LLM… view at source ↗

**Figure 4.** Figure 4: Sketcher agent prompt (condensed). Elided text is represented by [...]. Text in braces denotes template variables populated at runtime. For example, {code} is replaced by the current Lean file. 5 [PITH_FULL_IMAGE:figures/full_fig_p023_4.png] view at source ↗

**Figure 3.** Figure 3: Rater agent prompt (condensed). Elided text is represented by [...]. Text in braces denotes template variables populated at runtime. For example, {player blocks} is replaced by the sketches to be compared. 4 [PITH_FULL_IMAGE:figures/full_fig_p024_3.png] view at source ↗

**Figure 2.** Figure 2: Rater agent prompt (condensed). Elided text is represented by [...]. Text in braces denotes template variables populated at runtime. For example, {player blocks} is replaced by the sketches to be compared. 3 [PITH_FULL_IMAGE:figures/full_fig_p025_2.png] view at source ↗

read the original abstract

Large language models (LLMs) increasingly excel at mathematical reasoning, but their unreliability limits their utility in mathematics research. A mitigation is using LLMs to generate formal proofs in languages like Lean. We perform the first large-scale evaluation of this method's ability to solve open problems. Our most capable agent autonomously resolved 9 of 353 open Erd\H{o}s problems at the per-problem cost of a few hundred dollars, proved 44/492 OEIS conjectures, and is being deployed in combinatorics, optimization, graph theory, algebraic geometry, and quantum optics research. A basic agent alternating LLM-based generation with Lean-based verification replicated the Erd\H{o}s successes but proved costlier on the hardest problems. These findings demonstrate the power of AI-aided formal proof search and shed light on the agent designs that enable it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper scales LLM agents to hundreds of open Erdős problems and OEIS conjectures with Lean verification, claiming 9 solutions at low cost, but the autonomy claims need per-problem confirmation on openness and proof completeness.

read the letter

The paper's main contribution is running LLM agents at scale on open Erdős problems and OEIS conjectures, claiming to resolve 9 of 353 Erdős ones in Lean at low per-problem cost. This is new because earlier LLM formalization stayed smaller or used already-solved statements. They do a solid job with the experimental setup. Selecting hundreds of externally defined open problems and verifying in Lean keeps things non-circular and reproducible. The comparison of basic versus capable agents shows where design choices matter, and the note on deployments in combinatorics and other fields adds some real-world context. The math and data presentation looks straightforward without obvious fitting issues. The soft spots are around the headline results. We need independent checks that those 9 problems were truly open and that the Lean outputs are complete formal proofs rather than partial ones needing human fixes. The abstract does not provide those per-problem details or proof-size stats, so the autonomy and cost claims could be overstated if any cases fall short. No error bars or failure mode breakdown is another gap, though not central. The stress-test concern holds up based on what's in the abstract. This paper is for folks working on AI for theorem proving or using formal tools in math research. A reader focused on empirical evaluations of LLMs in hard domains would get value from the scale and the verification approach. It shows honest engagement with the literature on LLM limitations by emphasizing formal search. I would recommend sending it for peer review. The empirical scope is substantial enough to deserve referee time, even with the need for added evidence on the 9 cases.

Referee Report

2 major / 2 minor

Summary. The manuscript reports the first large-scale evaluation of LLM-based agents for generating formal proofs in Lean to solve open mathematical problems. The central claims are that the most capable agent autonomously resolved 9 of 353 open Erdős problems at a per-problem cost of a few hundred dollars, proved 44 of 492 OEIS conjectures, and is now deployed across combinatorics, optimization, graph theory, algebraic geometry, and quantum optics; a simpler alternating generation-verification agent is shown to replicate some successes but at higher cost on hard instances.

Significance. If the reported successes are independently confirmed, the work would demonstrate that current LLM-driven formal proof search can address genuinely open problems at modest cost, providing a concrete benchmark for AI-assisted mathematics. The use of Lean verification supplies a high standard of correctness that distinguishes this from informal LLM reasoning claims, and the multi-domain deployment suggests immediate research utility beyond the Erdős and OEIS benchmarks.

major comments (2)

[Abstract and results section] Abstract and results section: the headline claim that the agent 'autonomously resolved 9 of 353 open Erdős problems' is load-bearing for the autonomy and cost assertions, yet the manuscript supplies neither per-problem literature-search logs confirming the problems were previously unsolved nor proof-size statistics or artifact links demonstrating that the Lean outputs are complete, hole-free formal proofs rather than partial derivations requiring human post-editing. Without these, the success counts cannot be fully assessed.
[Methods and evaluation sections] Methods and evaluation sections: the reported success counts (9/353 Erdős, 44/492 OEIS) are presented without error bars, failure-mode breakdowns, or independent audit of the Lean artifacts. This omission directly affects the reliability of the comparative claim that the basic agent 'replicated the Erdős successes but proved costlier on the hardest problems.'

minor comments (2)

[§2] The description of agent architectures would benefit from a clearer tabular comparison of the 'most capable' versus 'basic' designs, including exact prompting strategies and Lean interaction loops.
[Figures and tables] Figure captions and table legends should explicitly state whether the reported costs include only inference or also human oversight time.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify the presentation of our results on LLM-based formal proof search. We address each major comment below and describe the revisions we will incorporate to improve transparency and verifiability.

read point-by-point responses

Referee: [Abstract and results section] Abstract and results section: the headline claim that the agent 'autonomously resolved 9 of 353 open Erdős problems' is load-bearing for the autonomy and cost assertions, yet the manuscript supplies neither per-problem literature-search logs confirming the problems were previously unsolved nor proof-size statistics or artifact links demonstrating that the Lean outputs are complete, hole-free formal proofs rather than partial derivations requiring human post-editing. Without these, the success counts cannot be fully assessed.

Authors: We agree that explicit documentation strengthens the autonomy and correctness claims. In the revised manuscript we will add a dedicated subsection (and supplementary table) that lists, for each of the nine solved Erdős problems, the pre-experiment literature references establishing that the problem remained open, together with links to the corresponding Lean proof files in a public repository. We will also report proof-size statistics (number of tactics, lines of code, and kernel-verification time) for every success. All reported proofs were produced and checked end-to-end by the agent with no human post-editing; the added artifacts will make this verifiable. revision: yes
Referee: [Methods and evaluation sections] Methods and evaluation sections: the reported success counts (9/353 Erdős, 44/492 OEIS) are presented without error bars, failure-mode breakdowns, or independent audit of the Lean artifacts. This omission directly affects the reliability of the comparative claim that the basic agent 'replicated the Erdős successes but proved costlier on the hardest problems.'

Authors: We concur that quantitative uncertainty measures and failure analysis improve interpretability. The revision will include binomial 95 % confidence intervals for both success rates and a breakdown of the dominant failure modes (timeout, tactic failure, search exhaustion) across the full set of attempts. We will also expand the agent-comparison section with per-problem cost figures for the hardest instances to support the cost-effectiveness claim. All Lean artifacts and run logs will be released publicly to enable independent audit; performing a third-party audit ourselves lies outside the scope of the present study. revision: partial

Circularity Check

0 steps flagged

No significant circularity; results are empirical evaluations on external benchmarks

full rationale

The paper reports counts of resolved open Erdős problems and OEIS conjectures obtained by running LLM-based agents and verifying outputs in Lean. These success metrics are generated by direct application to independently defined external problems rather than by fitting parameters to subsets of the target data or by self-referential definitions. No equations, ansatzes, or uniqueness theorems are invoked that reduce the headline claims to the inputs by construction. The methodology relies on external verification and open problem lists, making the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the assumption that Lean correctly certifies the generated proofs and that the chosen problem sets were open before the experiments; no new mathematical axioms or entities are introduced.

axioms (1)

domain assumption The Lean theorem prover provides sound verification of formal proofs
All reported successes depend on Lean accepting the generated statements as valid proofs.

pith-pipeline@v0.9.0 · 5761 in / 1216 out tokens · 42052 ms · 2026-05-22T05:05:27.994522+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

86 extracted references · 86 canonical work pages · 13 internal anchors

[1]

Aristotle: IMO-level Automated Theorem Proving

TudorAchim, AlexBest, AlbertoBietti, KevinDer, MathïsFédérico, SergeiGukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, et al. Aristotle: IMO-level automated theorem proving.arXiv preprint arXiv:2510.01346, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

Primitive sets and von mangoldt chains: Erdős problem #1196 and beyond, 2026

BorisAlexeev,KevinBarreto,YanyangLi,JaredDukerLichtman,LiamPrice,JibranIqbal Shah, Quanyu Tang, and Terence Tao. Primitive sets and von mangoldt chains: Erdős problem #1196 and beyond, 2026

work page 2026
[3]

Short proofs in combinatorics and number theory.arXiv preprint arXiv:2603.29961, 2026

Boris Alexeev, Moe Putterman, Mehtaab Sawhney, Mark Sellke, and Gregory Valiant. Short proofs in combinatorics and number theory.arXiv preprint arXiv:2603.29961, 2026

work page arXiv 2026
[4]

Short proofs in combinatorics, probability and number theory II

Boris Alexeev, Moe Putterman, Mehtaab Sawhney, Mark Sellke, and Gregory Valiant. Short proofs in combinatorics, probability and number theory ii.arXiv preprint arXiv:2604.06609, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[5]

Pantograph: A machine-to-machine interaction interface for advanced theorem proving, high level reasoning, and data extraction in lean 4, 2025

LeniAniva,ChuyueSun,BrandoMiranda,ClarkBarrett,andSanmiKoyejo. Pantograph: A machine-to-machine interaction interface for advanced theorem proving, high level reasoning, and data extraction in lean 4, 2025

work page 2025
[6]

AXLE: Axiom lean engine, 2025

Axiom. AXLE: Axiom lean engine, 2025. Accessed: 2025

work page 2025
[7]

A note on p-sets

Stephan Baier. A note on p-sets. 2004

work page 2004
[8]

Erdosproblems.com

Thomas Bloom. Erdosproblems.com. https://www.erdosproblems.com, 2026

work page 2026
[9]

Migliore, Rosa M

Mats Boij, Juan C. Migliore, Rosa M. Miró-Roig, Uwe Nagel, and Fabrizio Zanello.On the Shape of a Pure O-Sequence, volume 218 ofMemoirs of the American Mathematical Society. American Mathematical Society, Providence, RI, 2012

work page 2012
[10]

J. A. Bondy and R. L. Hemminger. Graph reconstruction–a survey.Journal of Graph Theory, 1(3):227–268, 1977

work page 1977
[11]

The motivic class of the space of genus0maps to the flag variety.arXiv preprint arXiv:2601.07222, 2026

Jim Bryan, Balázs Elek, Freddie Manners, George Salafatinos, and Ravi Vakil. The motivic class of the space of genus0maps to the flag variety.arXiv preprint arXiv:2601.07222, 2026

work page arXiv 2026
[12]

S. A. Burr, P. Erdős, R. L. Graham, and W. Wen-Ching Li. Complete sequences of sets of integer powers.Acta Arithmetica, 77(2):133–138, 1996

work page 1996
[13]

Last-Iterate Convergence of Anchored Gradient Descent

Yang Cai and Weiqiang Zheng. Last-iterate convergence of anchored gradient descent. arXiv preprint arXiv:2604.12235, 2026. 11 Advancing Mathematics Research with AI-Driven Formal Proof Search

work page internal anchor Pith review Pith/arXiv arXiv 2026
[14]

Efficient Bayesian inference for generalized Bradley–Terry models.Journal of Computational and Graphical Statistics, 21(1):174– 196, 2012

François Caron and Arnaud Doucet. Efficient Bayesian inference for generalized Bradley–Terry models.Journal of Computational and Graphical Statistics, 21(1):174– 196, 2012

work page 2012
[15]

Seed-prover 1.5: Mastering undergraduate-level theorem proving via learning from experience.arXiv preprint arXiv:2512.17260, 2025

Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Wenlei Shi, et al. Seed-prover 1.5: Mastering undergraduate-level theorem proving via learning from experience.arXiv preprint arXiv:2512.17260, 2025

work page arXiv 2025
[16]

Deepseek-v2: A strong, economical, and efficient mixture-of-experts language model, 2024

DeepSeek-AI. Deepseek-v2: A strong, economical, and efficient mixture-of-experts language model, 2024

work page 2024
[17]

On Erd\H{o}s and S\'ark\"ozy's sequences with Property P

Christian Elsholtz and Stefan Planitzer. On erdős and sárközy’s sequences with property p.arXiv preprint arXiv:1609.07935, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[18]

On the divisibility properties of sequences of integers

Paul Erdős and Alice Sárközi. On the divisibility properties of sequences of integers. Proceedings of The London Mathematical Society, pages 97–101, 1970

work page 1970
[19]

On conjectures of graffiti

Siemion Fajtlowicz. On conjectures of graffiti. In J. Akiyama, Y. Egawa, and H. Enomoto, editors,Graph Theory and Applications, volume 38 ofAnnals of Discrete Mathematics, pages 113–118. Elsevier, 1988

work page 1988
[20]

Aletheia tackles firstproof autonomously.arXiv preprint arXiv:2602.21201, 2026

Tony Feng, Junehyuk Jung, Sang-hyun Kim, Carlo Pagano, Sergei Gukov, Chiang- Chiang Tsai, David Woodruff, Adel Javanmard, Aryan Mokhtari, Dawsen Hwang, et al. Aletheia tackles firstproof autonomously.arXiv preprint arXiv:2602.21201, 2026

work page arXiv 2026
[21]

Trinh, Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung,JoonkyungLee,CarloPagano,SanghyunKim,FedericoPasqualotto,SergeiGukov, Jonathan N

Tony Feng, Trieu H. Trinh, Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung,JoonkyungLee,CarloPagano,SanghyunKim,FedericoPasqualotto,SergeiGukov, Jonathan N. Lee, Junsu Kim, Kaiying Hou, Golnaz Ghiasi, Yi Tay, YaGuang Li, Chenkai Kuang, Yuan Liu, Hanzhao Lin, Evan Zheran Liu, Nigamaa Nayakanti, Xiaomeng Yang, Heng-Tze Cheng, Demis Hassabis, Ko...

work page arXiv 2026
[22]

A strict separation between two notions of quadratically structured functions

Moritz Firsching and Bogdan Georgiev. A strict separation between two notions of quadratically structured functions. In preparation, 2026

work page 2026
[23]

Horváth, Yaël Dillies, Calle Sönne, Eric Wieser, Fred Zhang, Thomas Hubert, Blaise Agüera y Arcas, and Pushmeet Kohli

Moritz Firsching, Paul Lezeau, Salvatore Mercuri, Miklós Z. Horváth, Yaël Dillies, Calle Sönne, Eric Wieser, Fred Zhang, Thomas Hubert, Blaise Agüera y Arcas, and Pushmeet Kohli. Formal conjectures: An open and evolving benchmark for verified discovery in mathematics, 2026

work page 2026
[24]

Safeverify.https://github.com/GasStationManager/Saf eVerify, 2025

GasStationManager. Safeverify.https://github.com/GasStationManager/Saf eVerify, 2025. [Accessed: 2026-05-12]

work page 2025
[25]

Mathematical exploration and discovery at scale

Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, and Adam Zsolt Wagner. Mathe- matical exploration and discovery at scale.arXiv preprint arXiv:2511.02864, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[26]

Gemini 3.1 deep think, 2026

Google DeepMind. Gemini 3.1 deep think, 2026. Accessed: 2026-04-30

work page 2026
[27]

100 open problems

Ben Green. 100 open problems. https://people.maths.ox.ac.uk/greenbj/papers/open- problems.pdf, 2024. 12 Advancing Mathematics Research with AI-Driven Formal Proof Search

work page 2024
[28]

A Milestone in Formalization: The Sphere Packing Problem in Dimension 8

Sidharth Hariharan, Christopher Birkbeck, Seewoo Lee, Ho Kiu Gareth Ma, Bhavik Mehta, Auguste Poiroux, and Maryna Viazovska. A milestone in formalization: The sphere packing problem in dimension 8.arXiv preprint arXiv:2604.23468, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[29]

Olympiad-level formal mathematical reasoning with reinforcement learning.Nature, pages 1–3, 2025

Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z Horváth, Goran Žužić, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Masoom, Ottavia Bertolli, Tom Zahavy, Amol Mandhane, Jessica Yung, Iuliya Beloshapka, Borja Ibarz, Vivek Veeriah, Lei Yu, Oliver Nash, Paul Lezeau, Salvatore Mercuri, Calle Sönne, Bhavik Mehta, Alex Davies, ...

work page 2025
[30]

Learning and planning in complex action spaces

Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Mohammadamin Barekatain, Simon Schmitt, and David Silver. Learning and planning in complex action spaces. In International Conference on Machine Learning, 2021

work page 2021
[31]

software engineer

Geoffrey Huntley. Ralph wiggum as a "software engineer".https://ghuntley.com /ralph, 2025. Blog post

work page 2025
[32]

Equality in Fill's spectral gap problem

Vishesh Jain and Clayton Mizgerd. Equality in fill’s spectral gap problem.arXiv preprint arXiv:2604.03937, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[33]

Point convergence of nesterov’s accelerated gradient method: An ai-assisted proof.arXiv preprint arXiv:2510.23513, 2025

Uijeong Jang and Ernest K Ryu. Point convergence of nesterov’s accelerated gradient method: An ai-assisted proof.arXiv preprint arXiv:2510.23513, 2025

work page arXiv 2025
[34]

Draft, sketch, and prove: Guiding formal theorem provers with informal proofs.arXiv preprint arXiv:2210.12283, 2022

Albert Q Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, and Guillaume Lample. Draft, sketch, and prove: Guiding formal theorem provers with informal proofs.arXiv preprint arXiv:2210.12283, 2022

work page arXiv 2022
[35]

Thor: Wielding hammers to integrate language models and automated theorem provers.Advances in Neural Information Processing Systems, 35:8360–8373, 2022

Albert Qiaochu Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, and Mateja Jamnik. Thor: Wielding hammers to integrate language models and automated theorem provers.Advances in Neural Information Processing Systems, 35:8360–8373, 2022

work page 2022
[36]

Paul J. Kelly. A congruence theorem for trees.Pacific Journal of Mathematics, 7(1):961– 968, 1957

work page 1957
[37]

John R. Koza. Genetic programming as a means for programming computers by natural selection.Statistics and Computing, 4(2):87–112, 1994

work page 1994
[38]

A Tensor-Algebraic No-Go Theorem for High-Dimensional Photonic GHZ States

Mario Krenn, Moritz Firsching, George Tsoukalas, Rishikesh Gajjala, Xuemei Gu, and Swarat Chaudhuri. A Tensor-Algebraic No-Go Theorem for High-Dimensional Photonic GHZ States. In preparation, 2026

work page 2026
[39]

Quantum experiments and graphs: Multiparty states as coherent superpositions of perfect matchings.Physical Review Letters, 119(24), December 2017

Mario Krenn, Xuemei Gu, and Anton Zeilinger. Quantum experiments and graphs: Multiparty states as coherent superpositions of perfect matchings.Physical Review Letters, 119(24), December 2017. 13 Advancing Mathematics Research with AI-Driven Formal Proof Search

work page 2017
[40]

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, et al. Goedel-prover-v2: Scaling formal theorem proving with scaffolded data synthesis and self-correction.arXiv preprint arXiv:2508.03613, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[41]

Luce.Individual Choice Behavior: A Theoretical Analysis

R.D. Luce.Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959

work page 1959
[42]

Gauss: An agent for autoformalization, 2026

Math Inc. Gauss: An agent for autoformalization, 2026. Accessed: 2026

work page 2026
[43]

Thelean4theoremproverandprogramming language

LeonardodeMouraandSebastianUllrich. Thelean4theoremproverandprogramming language. InInternational Conference on Automated Deduction, pages 625–635. Springer, 2021

work page 2021
[44]

Reinforced generation of combinatorial structures: Hardness of approximation.arXiv preprint arXiv:2509.18057, 2025

Ansh Nagda, Prabhakar Raghavan, and Abhradeep Thakurta. Reinforced generation of combinatorial structures: Hardness of approximation.arXiv preprint arXiv:2509.18057, 2025

work page arXiv 2025
[45]

Reinforced Generation of Combinatorial Structures: Ramsey Numbers

Ansh Nagda, Prabhakar Raghavan, and Abhradeep Thakurta. Reinforced generation of combinatorial structures: Ramsey numbers.arXiv preprint arXiv:2603.09172, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[46]

Alexander Novikov, Ngân V˜u, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, and Matej Balog. Alphaevolve: A coding agent for scientific and algor...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[47]

R. L. Plackett. The analysis of permutations.Journal of the Royal Statistical Society. Series C (Applied Statistics), 24(2):193–202, 1975

work page 1975
[48]

Generative language modeling for automated theorem proving

StanislasPoluandIlyaSutskever. Generativelanguagemodelingforautomatedtheorem proving.arXiv preprint arXiv:2009.03393, 2020

work page arXiv 2009
[49]

On infinite sets with no3on a line.arXiv preprint arXiv:2602.21275, 2026

Moe Putterman, Mehtaab Sawhney, and Gregory Valiant. On infinite sets with no3on a line.arXiv preprint arXiv:2602.21275, 2026

work page arXiv 2026
[50]

Colouring versus density in integers and hales–jewett cubes.Journal of the London Mathematical Society, 110(5):e12987, 2024

Christian Reiher, Vojtěch Rödl, and Marcelo Sales. Colouring versus density in integers and hales–jewett cubes.Journal of the London Mathematical Society, 110(5):e12987, 2024

work page 2024
[51]

Pawan Kumar, Emilien Dupont, Francisco J

Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli, and Alhussein Fawzi. Mathematical discoveries from program search with large language models.Nature, 625(7995):468– 475, 2024

work page 2024
[52]

Ryu, Kun Yuan, and Wotao Yin

Ernest K. Ryu, Kun Yuan, and Wotao Yin. Ode analysis of stochastic gradient methods with optimism and anchoring for minimax problems.arXiv preprint arXiv:1905.10899, 2019. 14 Advancing Mathematics Research with AI-Driven Formal Proof Search

work page arXiv 1905
[53]

Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with ai.arXiv preprint arXiv:2512.14575, 2025

Johannes Schmitt. Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with ai.arXiv preprint arXiv:2512.14575, 2025

work page arXiv 2025
[54]

On a problem of erdős and sárközy.Journal of Combinatorial Theory, Series A, 94(1):191–195, 2001

Tomasz Schoen. On a problem of erdős and sárközy.Journal of Combinatorial Theory, Series A, 94(1):191–195, 2001

work page 2001
[55]

A general reinforcement learning algorithm that masters chess, shogi, and go through self-play, 2018

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play, 2018

work page 2018
[56]

Neil J. Sloane. The on-line encyclopedia of integer sequences. InProceedings of the 14th Symposium on Towards Mechanized Mathematical Assistants: 6th International Conference, Calculemus ’07 / MKM ’07, page 130, Berlin, Heidelberg, 2007. Springer- Verlag

work page 2007
[57]

Richard P. Stanley. Hilbert functions of graded algebras.Advances in Mathematics, 28(1):57–83, 1978

work page 1978
[58]

An Improved Last-Iterate Convergence Rate for Anchored Gradient Descent Ascent

Anja Surina, Arun Suggala, George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Francisco JR Ruiz, Pushmeet Kohli, and Swarat Chaudhuri. An improved last-iterate convergencerateforanchoredgradientdescentascent.arXivpreprintarXiv:2604.03782, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[59]

Ai contributions to Erdős problems.https://github .com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-p roblems, 2026

Terence Tao and contributors. Ai contributions to Erdős problems.https://github .com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-p roblems, 2026. Accessed: 2026-04-23

work page 2026
[60]

The Lean Mathematical Library

The Mathlib Community. The Lean Mathematical Library. InProceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP 2020, New Orleans, LA, USA, 2020. ACM

work page 2020
[61]

Stanislaw M. Ulam. A collection of mathematical problems.New York and London: Interscience Publishers, 1960

work page 1960
[62]

Ulam.A Collection of Mathematical Problems, volume 8 ofInterscience Tracts in Pure and Applied Mathematics

Stanislaw M. Ulam.A Collection of Mathematical Problems, volume 8 ofInterscience Tracts in Pure and Applied Mathematics. Interscience Publishers, New York, 1960

work page 1960
[63]

Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025

Sumanth Varambally, Thomas Voice, Yanchao Sun, Zhifeng Chen, Rose Yu, and Ke Ye. Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025

work page arXiv 2025
[64]

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

Haiming Wang, Mert Unsal, Xiaohan Lin, Mantas Baksys, Junqi Liu, Marco Dos Santos, Flood Sung, Marina Vinyes, Zhenzhe Ying, Zekai Zhu, et al. Kimina-prover preview: Towards large formal reasoning models with reinforcement learning.arXiv preprint arXiv:2504.11354, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[65]

15 Advancing Mathematics Research with AI-Driven Formal Proof Search Accelerating scientific research with gemini: Case studies and common techniques

David P Woodruff, Vincent Cohen-Addad, Lalit Jain, Jieming Mao, Song Zuo, Moham- madHossein Bateni, Simina Branzei, Michael P Brenner, Lin Chen, Ying Feng, et al. 15 Advancing Mathematics Research with AI-Driven Formal Proof Search Accelerating scientific research with gemini: Case studies and common techniques. arXiv preprint arXiv:2602.03837, 2026

work page arXiv 2026
[66]

Formal mathematical reasoning: A new frontier in ai.arXiv preprint 2412.16075, 2024

Kaiyu Yang, Gabriel Poesia, Jingxuan He, Wenda Li, Kristin Lauter, Swarat Chaudhuri, and Dawn Song. Formal mathematical reasoning: A new frontier in ai.arXiv preprint 2412.16075, 2024

work page arXiv 2024
[67]

Leandojo: Theorem proving with retrieval-augmented language models.Advances in Neural Information Processing Systems, 36:21573–21612, 2023

Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J Prenger, and Animashree Anandkumar. Leandojo: Theorem proving with retrieval-augmented language models.Advances in Neural Information Processing Systems, 36:21573–21612, 2023

work page 2023
[68]

Log-concavity of level Hilbert functions and pure o-sequences.Journal of Commutative Algebra, 16(2):245–256, 2024

Fabrizio Zanello. Log-concavity of level Hilbert functions and pure o-sequences.Journal of Commutative Algebra, 16(2):245–256, 2024

work page 2024
[69]

Daniel Zheng, Ingrid von Glehn, Yori Zwols, Iuliya Beloshapka, Lars Buesing, Daniel M. Roy, Martin Wattenberg, Bogdan Georgiev, Tatiana Schmidt, Andrew Cowie, Fernanda Viegas, Dimitri Kanevsky, Vineet Kahlon, Hartmut Maennel, Sophia Alj, George Holland, Alex Davies, and Pushmeet Kohli. AI Co-Mathematician: Accelerating mathematicians with agentic AI.arXiv...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[70]

Population Database and Matchmaking

Database sampling:The controller selects a root proof sketch𝑆root by sampling from the database, along with𝑀= 2auxiliary inspiration sketches 𝑆insp. The selection strat- egy balances exploitation of high-rated sketches with exploration of diverse candidates (see “Population Database and Matchmaking”)

work page
[71]

decompose unsolved goals,

Prompt construction:A prompt X is assembled to guide the LLM. It integrates the formal problem specification, the Lean source code and natural language plan of𝑆root, and structured feedback derived from AlphaProof’s previous attempts on{𝑆insp}. As in AlphaEvolve, the controller encourages diversity by stochastically injecting instructions such as “decompo...

work page
[72]

To scale to large Lean files, the subagent outputs mutations via asearch_replace tool in a compact diff format rather than rewriting the entire file

Prover subagent:The assembled prompt X is dispatched to the LLM (Gemini 3.1 Pro), initiating a multi-turn episode. To scale to large Lean files, the subagent outputs mutations via asearch_replace tool in a compact diff format rather than rewriting the entire file. The subagent can also query AlphaProof to test specific subgoals mid- episode; the feedback ...

work page
[73]

Global GoalCaching

Validation:Once the candidate sketch𝑆′ passes the sandbox check, it undergoes formal validation. The system extracts all remainingsorry subgoals and cross-references them against a global goal cache using a deep hash of their exact Lean state (see “Global GoalCaching”). Ifa subgoal waspreviously resolved, theproof isretrieved immediately; otherwise, it is...

work page
[74]

matches

Database registration:The candidate 𝑆′, along with per-subgoal feedback from Al- phaProof, is registered in the database. Its fitness is then determined asynchronously via Elo matchmaking. AlphaProof has a Test-Time Reinforcement Learning (TTRL) mode in which it learns to solve a problem by solving its AI-generated variants at inference time; however, we ...

work page 2026
[75]

4𝑁1(𝑋) +𝑁 3(𝑋) ≤ 3|𝑋| + 2𝑁2(𝑋), which follows from writing𝑁𝑘(𝑥)= Í 𝑥∈ℤ 1𝑋 (𝑥) 1𝑋 (𝑥+ 𝑘) and a pointwise check of the indicator function1𝑋 across all4-point windows (𝑥, 𝑥+ 1, 𝑥+ 2, 𝑥+ 3); that is, summing the local inequality1𝑋 (𝑥) + 1𝑋 (𝑥+ 1) + 1𝑋 (𝑥+

work page
[76]

+ 1𝑋 (𝑥) 1𝑋 (𝑥+ 2) + 1𝑋 (𝑥+ 1)1𝑋 (𝑥+ 3) ≥ 1𝑋 (𝑥) 1𝑋 (𝑥+ 1) + 21𝑋 (𝑥+ 1)1𝑋 (𝑥+ 2) + 1𝑋 (𝑥+2)1 𝑋 (𝑥+3) +1 𝑋 (𝑥)1 𝑋 (𝑥+3)and then summing over all𝑥∈ℤ

work page
[77]

Consider each pair(𝑥, 𝑥+ 2) ∈𝑋 2 and we proceed by cases on𝑥+ 1

2𝑁2(𝑋) ≤𝑁 3(𝑋) + 2𝑉2(𝑋) + 2𝐼(𝑋) . Consider each pair(𝑥, 𝑥+ 2) ∈𝑋 2 and we proceed by cases on𝑥+ 1. Let 𝐺={𝑥∈𝑋|𝑥+ 1 ∉𝑋∧𝑥+ 2 ∈𝑋} . If 𝑥+ 1 ∈𝑋 , then the pair is counted exactly by𝑉2, so𝑁 2(𝑋)=𝑉 2(𝑋) + |𝐺|. If𝑥+1∉𝑋, then •If𝑥−1∈𝑋, then(𝑥−1, 𝑥+2)is a pair counted in𝑁 3(𝑋). •If𝑥+3∈𝑋, then(𝑥, 𝑥+3)is a pair counted in𝑁 3(𝑋). 41 Advancing Mathematics Research wit...

work page
[78]

For any 𝛿∉{ 0,−𝑘} , the Sidon property ensures that there is at most one pair(𝑎, 𝑐) with 𝑎−𝑐=𝛿 and at most one pair(𝑑, 𝑏) with 𝑑−𝑏=𝛿+𝑘

The cases𝛿= 0and 𝛿=−𝑘 each contribute≤ |𝐴| (one coordinate determines the rest). For any 𝛿∉{ 0,−𝑘} , the Sidon property ensures that there is at most one pair(𝑎, 𝑐) with 𝑎−𝑐=𝛿 and at most one pair(𝑑, 𝑏) with 𝑑−𝑏=𝛿+𝑘 . Consequently, each gap of size𝑘in𝐷identifies exactly one quadruple. Thus|quad 𝑘| ≤𝑁 𝑘(𝐷) +2𝑛

work page
[79]

Combining these bounds shows that𝑁𝑘(𝐷) and 𝑁𝑘(𝑆) are related up to𝑂(𝑛) error

Each "good" element 𝑠∈𝑆 with 𝑠+𝑘∈𝑆 (excluding ≤ 2𝑛 doubles of the form2𝑎) produces≥4quadruples, giving4𝑁 𝑘(𝑆) ≤ |quad 𝑘| +8𝑛. Combining these bounds shows that𝑁𝑘(𝐷) and 𝑁𝑘(𝑆) are related up to𝑂(𝑛) error. Apply- ing fact (2) to𝐷=𝐴−𝐴gives 4𝑁1(𝐷) +𝑁 3(𝐷) ≤3|𝐷| +2𝑁 2(𝐷) and then substitute the above two quadruple transfer bounds for𝑘= 1, 2, 3and collecting al...

work page 2002
[80]

Í 𝑅𝑛,𝑘 ≤2𝑛2 𝑛/2 =𝑜 2𝑛+1 𝑛6 . 44 Advancing Mathematics Research with AI-Driven Formal Proof Search Since all residual terms are bounded by𝑜 2𝑛+1 𝑛6 , we conclude that 𝑎𝑛 =𝑓 𝑛 +𝑜 2𝑛+1 𝑛6 = 2𝑛+1 𝑛 1+ 1 𝑛 + 3 𝑛2 + 13 𝑛3 + 75 𝑛4 + 541 𝑛5 +𝑜 1 𝑛5 This establishes the conjectured asymptotic expansion.□ Theorem.(A conjecture of OEIS A228143, 2018) Let𝑠𝑚 = Í𝑚 𝑘=0 ...

work page 2018

Showing first 80 references.

[1] [1]

Aristotle: IMO-level Automated Theorem Proving

TudorAchim, AlexBest, AlbertoBietti, KevinDer, MathïsFédérico, SergeiGukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, et al. Aristotle: IMO-level automated theorem proving.arXiv preprint arXiv:2510.01346, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[2] [2]

Primitive sets and von mangoldt chains: Erdős problem #1196 and beyond, 2026

BorisAlexeev,KevinBarreto,YanyangLi,JaredDukerLichtman,LiamPrice,JibranIqbal Shah, Quanyu Tang, and Terence Tao. Primitive sets and von mangoldt chains: Erdős problem #1196 and beyond, 2026

work page 2026

[3] [3]

Short proofs in combinatorics and number theory.arXiv preprint arXiv:2603.29961, 2026

Boris Alexeev, Moe Putterman, Mehtaab Sawhney, Mark Sellke, and Gregory Valiant. Short proofs in combinatorics and number theory.arXiv preprint arXiv:2603.29961, 2026

work page arXiv 2026

[4] [4]

Short proofs in combinatorics, probability and number theory II

Boris Alexeev, Moe Putterman, Mehtaab Sawhney, Mark Sellke, and Gregory Valiant. Short proofs in combinatorics, probability and number theory ii.arXiv preprint arXiv:2604.06609, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[5] [5]

Pantograph: A machine-to-machine interaction interface for advanced theorem proving, high level reasoning, and data extraction in lean 4, 2025

LeniAniva,ChuyueSun,BrandoMiranda,ClarkBarrett,andSanmiKoyejo. Pantograph: A machine-to-machine interaction interface for advanced theorem proving, high level reasoning, and data extraction in lean 4, 2025

work page 2025

[6] [6]

AXLE: Axiom lean engine, 2025

Axiom. AXLE: Axiom lean engine, 2025. Accessed: 2025

work page 2025

[7] [7]

A note on p-sets

Stephan Baier. A note on p-sets. 2004

work page 2004

[8] [8]

Erdosproblems.com

Thomas Bloom. Erdosproblems.com. https://www.erdosproblems.com, 2026

work page 2026

[9] [9]

Migliore, Rosa M

Mats Boij, Juan C. Migliore, Rosa M. Miró-Roig, Uwe Nagel, and Fabrizio Zanello.On the Shape of a Pure O-Sequence, volume 218 ofMemoirs of the American Mathematical Society. American Mathematical Society, Providence, RI, 2012

work page 2012

[10] [10]

J. A. Bondy and R. L. Hemminger. Graph reconstruction–a survey.Journal of Graph Theory, 1(3):227–268, 1977

work page 1977

[11] [11]

The motivic class of the space of genus0maps to the flag variety.arXiv preprint arXiv:2601.07222, 2026

Jim Bryan, Balázs Elek, Freddie Manners, George Salafatinos, and Ravi Vakil. The motivic class of the space of genus0maps to the flag variety.arXiv preprint arXiv:2601.07222, 2026

work page arXiv 2026

[12] [12]

S. A. Burr, P. Erdős, R. L. Graham, and W. Wen-Ching Li. Complete sequences of sets of integer powers.Acta Arithmetica, 77(2):133–138, 1996

work page 1996

[13] [13]

Last-Iterate Convergence of Anchored Gradient Descent

Yang Cai and Weiqiang Zheng. Last-iterate convergence of anchored gradient descent. arXiv preprint arXiv:2604.12235, 2026. 11 Advancing Mathematics Research with AI-Driven Formal Proof Search

work page internal anchor Pith review Pith/arXiv arXiv 2026

[14] [14]

Efficient Bayesian inference for generalized Bradley–Terry models.Journal of Computational and Graphical Statistics, 21(1):174– 196, 2012

François Caron and Arnaud Doucet. Efficient Bayesian inference for generalized Bradley–Terry models.Journal of Computational and Graphical Statistics, 21(1):174– 196, 2012

work page 2012

[15] [15]

Seed-prover 1.5: Mastering undergraduate-level theorem proving via learning from experience.arXiv preprint arXiv:2512.17260, 2025

Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Wenlei Shi, et al. Seed-prover 1.5: Mastering undergraduate-level theorem proving via learning from experience.arXiv preprint arXiv:2512.17260, 2025

work page arXiv 2025

[16] [16]

Deepseek-v2: A strong, economical, and efficient mixture-of-experts language model, 2024

DeepSeek-AI. Deepseek-v2: A strong, economical, and efficient mixture-of-experts language model, 2024

work page 2024

[17] [17]

On Erd\H{o}s and S\'ark\"ozy's sequences with Property P

Christian Elsholtz and Stefan Planitzer. On erdős and sárközy’s sequences with property p.arXiv preprint arXiv:1609.07935, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[18] [18]

On the divisibility properties of sequences of integers

Paul Erdős and Alice Sárközi. On the divisibility properties of sequences of integers. Proceedings of The London Mathematical Society, pages 97–101, 1970

work page 1970

[19] [19]

On conjectures of graffiti

Siemion Fajtlowicz. On conjectures of graffiti. In J. Akiyama, Y. Egawa, and H. Enomoto, editors,Graph Theory and Applications, volume 38 ofAnnals of Discrete Mathematics, pages 113–118. Elsevier, 1988

work page 1988

[20] [20]

Aletheia tackles firstproof autonomously.arXiv preprint arXiv:2602.21201, 2026

Tony Feng, Junehyuk Jung, Sang-hyun Kim, Carlo Pagano, Sergei Gukov, Chiang- Chiang Tsai, David Woodruff, Adel Javanmard, Aryan Mokhtari, Dawsen Hwang, et al. Aletheia tackles firstproof autonomously.arXiv preprint arXiv:2602.21201, 2026

work page arXiv 2026

[21] [21]

Trinh, Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung,JoonkyungLee,CarloPagano,SanghyunKim,FedericoPasqualotto,SergeiGukov, Jonathan N

Tony Feng, Trieu H. Trinh, Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung,JoonkyungLee,CarloPagano,SanghyunKim,FedericoPasqualotto,SergeiGukov, Jonathan N. Lee, Junsu Kim, Kaiying Hou, Golnaz Ghiasi, Yi Tay, YaGuang Li, Chenkai Kuang, Yuan Liu, Hanzhao Lin, Evan Zheran Liu, Nigamaa Nayakanti, Xiaomeng Yang, Heng-Tze Cheng, Demis Hassabis, Ko...

work page arXiv 2026

[22] [22]

A strict separation between two notions of quadratically structured functions

Moritz Firsching and Bogdan Georgiev. A strict separation between two notions of quadratically structured functions. In preparation, 2026

work page 2026

[23] [23]

Horváth, Yaël Dillies, Calle Sönne, Eric Wieser, Fred Zhang, Thomas Hubert, Blaise Agüera y Arcas, and Pushmeet Kohli

Moritz Firsching, Paul Lezeau, Salvatore Mercuri, Miklós Z. Horváth, Yaël Dillies, Calle Sönne, Eric Wieser, Fred Zhang, Thomas Hubert, Blaise Agüera y Arcas, and Pushmeet Kohli. Formal conjectures: An open and evolving benchmark for verified discovery in mathematics, 2026

work page 2026

[24] [24]

Safeverify.https://github.com/GasStationManager/Saf eVerify, 2025

GasStationManager. Safeverify.https://github.com/GasStationManager/Saf eVerify, 2025. [Accessed: 2026-05-12]

work page 2025

[25] [25]

Mathematical exploration and discovery at scale

Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, and Adam Zsolt Wagner. Mathe- matical exploration and discovery at scale.arXiv preprint arXiv:2511.02864, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[26] [26]

Gemini 3.1 deep think, 2026

Google DeepMind. Gemini 3.1 deep think, 2026. Accessed: 2026-04-30

work page 2026

[27] [27]

100 open problems

Ben Green. 100 open problems. https://people.maths.ox.ac.uk/greenbj/papers/open- problems.pdf, 2024. 12 Advancing Mathematics Research with AI-Driven Formal Proof Search

work page 2024

[28] [28]

A Milestone in Formalization: The Sphere Packing Problem in Dimension 8

Sidharth Hariharan, Christopher Birkbeck, Seewoo Lee, Ho Kiu Gareth Ma, Bhavik Mehta, Auguste Poiroux, and Maryna Viazovska. A milestone in formalization: The sphere packing problem in dimension 8.arXiv preprint arXiv:2604.23468, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[29] [29]

Olympiad-level formal mathematical reasoning with reinforcement learning.Nature, pages 1–3, 2025

Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z Horváth, Goran Žužić, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Masoom, Ottavia Bertolli, Tom Zahavy, Amol Mandhane, Jessica Yung, Iuliya Beloshapka, Borja Ibarz, Vivek Veeriah, Lei Yu, Oliver Nash, Paul Lezeau, Salvatore Mercuri, Calle Sönne, Bhavik Mehta, Alex Davies, ...

work page 2025

[30] [30]

Learning and planning in complex action spaces

Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Mohammadamin Barekatain, Simon Schmitt, and David Silver. Learning and planning in complex action spaces. In International Conference on Machine Learning, 2021

work page 2021

[31] [31]

software engineer

Geoffrey Huntley. Ralph wiggum as a "software engineer".https://ghuntley.com /ralph, 2025. Blog post

work page 2025

[32] [32]

Equality in Fill's spectral gap problem

Vishesh Jain and Clayton Mizgerd. Equality in fill’s spectral gap problem.arXiv preprint arXiv:2604.03937, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[33] [33]

Point convergence of nesterov’s accelerated gradient method: An ai-assisted proof.arXiv preprint arXiv:2510.23513, 2025

Uijeong Jang and Ernest K Ryu. Point convergence of nesterov’s accelerated gradient method: An ai-assisted proof.arXiv preprint arXiv:2510.23513, 2025

work page arXiv 2025

[34] [34]

Draft, sketch, and prove: Guiding formal theorem provers with informal proofs.arXiv preprint arXiv:2210.12283, 2022

Albert Q Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, and Guillaume Lample. Draft, sketch, and prove: Guiding formal theorem provers with informal proofs.arXiv preprint arXiv:2210.12283, 2022

work page arXiv 2022

[35] [35]

Thor: Wielding hammers to integrate language models and automated theorem provers.Advances in Neural Information Processing Systems, 35:8360–8373, 2022

Albert Qiaochu Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, and Mateja Jamnik. Thor: Wielding hammers to integrate language models and automated theorem provers.Advances in Neural Information Processing Systems, 35:8360–8373, 2022

work page 2022

[36] [36]

Paul J. Kelly. A congruence theorem for trees.Pacific Journal of Mathematics, 7(1):961– 968, 1957

work page 1957

[37] [37]

John R. Koza. Genetic programming as a means for programming computers by natural selection.Statistics and Computing, 4(2):87–112, 1994

work page 1994

[38] [38]

A Tensor-Algebraic No-Go Theorem for High-Dimensional Photonic GHZ States

Mario Krenn, Moritz Firsching, George Tsoukalas, Rishikesh Gajjala, Xuemei Gu, and Swarat Chaudhuri. A Tensor-Algebraic No-Go Theorem for High-Dimensional Photonic GHZ States. In preparation, 2026

work page 2026

[39] [39]

Quantum experiments and graphs: Multiparty states as coherent superpositions of perfect matchings.Physical Review Letters, 119(24), December 2017

Mario Krenn, Xuemei Gu, and Anton Zeilinger. Quantum experiments and graphs: Multiparty states as coherent superpositions of perfect matchings.Physical Review Letters, 119(24), December 2017. 13 Advancing Mathematics Research with AI-Driven Formal Proof Search

work page 2017

[40] [40]

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, et al. Goedel-prover-v2: Scaling formal theorem proving with scaffolded data synthesis and self-correction.arXiv preprint arXiv:2508.03613, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[41] [41]

Luce.Individual Choice Behavior: A Theoretical Analysis

R.D. Luce.Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959

work page 1959

[42] [42]

Gauss: An agent for autoformalization, 2026

Math Inc. Gauss: An agent for autoformalization, 2026. Accessed: 2026

work page 2026

[43] [43]

Thelean4theoremproverandprogramming language

LeonardodeMouraandSebastianUllrich. Thelean4theoremproverandprogramming language. InInternational Conference on Automated Deduction, pages 625–635. Springer, 2021

work page 2021

[44] [44]

Reinforced generation of combinatorial structures: Hardness of approximation.arXiv preprint arXiv:2509.18057, 2025

Ansh Nagda, Prabhakar Raghavan, and Abhradeep Thakurta. Reinforced generation of combinatorial structures: Hardness of approximation.arXiv preprint arXiv:2509.18057, 2025

work page arXiv 2025

[45] [45]

Reinforced Generation of Combinatorial Structures: Ramsey Numbers

Ansh Nagda, Prabhakar Raghavan, and Abhradeep Thakurta. Reinforced generation of combinatorial structures: Ramsey numbers.arXiv preprint arXiv:2603.09172, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[46] [46]

Alexander Novikov, Ngân V˜u, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, and Matej Balog. Alphaevolve: A coding agent for scientific and algor...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[47] [47]

R. L. Plackett. The analysis of permutations.Journal of the Royal Statistical Society. Series C (Applied Statistics), 24(2):193–202, 1975

work page 1975

[48] [48]

Generative language modeling for automated theorem proving

StanislasPoluandIlyaSutskever. Generativelanguagemodelingforautomatedtheorem proving.arXiv preprint arXiv:2009.03393, 2020

work page arXiv 2009

[49] [49]

On infinite sets with no3on a line.arXiv preprint arXiv:2602.21275, 2026

Moe Putterman, Mehtaab Sawhney, and Gregory Valiant. On infinite sets with no3on a line.arXiv preprint arXiv:2602.21275, 2026

work page arXiv 2026

[50] [50]

Colouring versus density in integers and hales–jewett cubes.Journal of the London Mathematical Society, 110(5):e12987, 2024

Christian Reiher, Vojtěch Rödl, and Marcelo Sales. Colouring versus density in integers and hales–jewett cubes.Journal of the London Mathematical Society, 110(5):e12987, 2024

work page 2024

[51] [51]

Pawan Kumar, Emilien Dupont, Francisco J

Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli, and Alhussein Fawzi. Mathematical discoveries from program search with large language models.Nature, 625(7995):468– 475, 2024

work page 2024

[52] [52]

Ryu, Kun Yuan, and Wotao Yin

Ernest K. Ryu, Kun Yuan, and Wotao Yin. Ode analysis of stochastic gradient methods with optimism and anchoring for minimax problems.arXiv preprint arXiv:1905.10899, 2019. 14 Advancing Mathematics Research with AI-Driven Formal Proof Search

work page arXiv 1905

[53] [53]

Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with ai.arXiv preprint arXiv:2512.14575, 2025

Johannes Schmitt. Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with ai.arXiv preprint arXiv:2512.14575, 2025

work page arXiv 2025

[54] [54]

On a problem of erdős and sárközy.Journal of Combinatorial Theory, Series A, 94(1):191–195, 2001

Tomasz Schoen. On a problem of erdős and sárközy.Journal of Combinatorial Theory, Series A, 94(1):191–195, 2001

work page 2001

[55] [55]

A general reinforcement learning algorithm that masters chess, shogi, and go through self-play, 2018

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play, 2018

work page 2018

[56] [56]

Neil J. Sloane. The on-line encyclopedia of integer sequences. InProceedings of the 14th Symposium on Towards Mechanized Mathematical Assistants: 6th International Conference, Calculemus ’07 / MKM ’07, page 130, Berlin, Heidelberg, 2007. Springer- Verlag

work page 2007

[57] [57]

Richard P. Stanley. Hilbert functions of graded algebras.Advances in Mathematics, 28(1):57–83, 1978

work page 1978

[58] [58]

An Improved Last-Iterate Convergence Rate for Anchored Gradient Descent Ascent

Anja Surina, Arun Suggala, George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Francisco JR Ruiz, Pushmeet Kohli, and Swarat Chaudhuri. An improved last-iterate convergencerateforanchoredgradientdescentascent.arXivpreprintarXiv:2604.03782, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[59] [59]

Ai contributions to Erdős problems.https://github .com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-p roblems, 2026

Terence Tao and contributors. Ai contributions to Erdős problems.https://github .com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-p roblems, 2026. Accessed: 2026-04-23

work page 2026

[60] [60]

The Lean Mathematical Library

The Mathlib Community. The Lean Mathematical Library. InProceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP 2020, New Orleans, LA, USA, 2020. ACM

work page 2020

[61] [61]

Stanislaw M. Ulam. A collection of mathematical problems.New York and London: Interscience Publishers, 1960

work page 1960

[62] [62]

Ulam.A Collection of Mathematical Problems, volume 8 ofInterscience Tracts in Pure and Applied Mathematics

Stanislaw M. Ulam.A Collection of Mathematical Problems, volume 8 ofInterscience Tracts in Pure and Applied Mathematics. Interscience Publishers, New York, 1960

work page 1960

[63] [63]

Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025

Sumanth Varambally, Thomas Voice, Yanchao Sun, Zhifeng Chen, Rose Yu, and Ke Ye. Hilbert: Recursively building formal proofs with informal reasoning.arXiv preprint arXiv:2509.22819, 2025

work page arXiv 2025

[64] [64]

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

Haiming Wang, Mert Unsal, Xiaohan Lin, Mantas Baksys, Junqi Liu, Marco Dos Santos, Flood Sung, Marina Vinyes, Zhenzhe Ying, Zekai Zhu, et al. Kimina-prover preview: Towards large formal reasoning models with reinforcement learning.arXiv preprint arXiv:2504.11354, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[65] [65]

15 Advancing Mathematics Research with AI-Driven Formal Proof Search Accelerating scientific research with gemini: Case studies and common techniques

David P Woodruff, Vincent Cohen-Addad, Lalit Jain, Jieming Mao, Song Zuo, Moham- madHossein Bateni, Simina Branzei, Michael P Brenner, Lin Chen, Ying Feng, et al. 15 Advancing Mathematics Research with AI-Driven Formal Proof Search Accelerating scientific research with gemini: Case studies and common techniques. arXiv preprint arXiv:2602.03837, 2026

work page arXiv 2026

[66] [66]

Formal mathematical reasoning: A new frontier in ai.arXiv preprint 2412.16075, 2024

Kaiyu Yang, Gabriel Poesia, Jingxuan He, Wenda Li, Kristin Lauter, Swarat Chaudhuri, and Dawn Song. Formal mathematical reasoning: A new frontier in ai.arXiv preprint 2412.16075, 2024

work page arXiv 2024

[67] [67]

Leandojo: Theorem proving with retrieval-augmented language models.Advances in Neural Information Processing Systems, 36:21573–21612, 2023

Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J Prenger, and Animashree Anandkumar. Leandojo: Theorem proving with retrieval-augmented language models.Advances in Neural Information Processing Systems, 36:21573–21612, 2023

work page 2023

[68] [68]

Log-concavity of level Hilbert functions and pure o-sequences.Journal of Commutative Algebra, 16(2):245–256, 2024

Fabrizio Zanello. Log-concavity of level Hilbert functions and pure o-sequences.Journal of Commutative Algebra, 16(2):245–256, 2024

work page 2024

[69] [69]

Daniel Zheng, Ingrid von Glehn, Yori Zwols, Iuliya Beloshapka, Lars Buesing, Daniel M. Roy, Martin Wattenberg, Bogdan Georgiev, Tatiana Schmidt, Andrew Cowie, Fernanda Viegas, Dimitri Kanevsky, Vineet Kahlon, Hartmut Maennel, Sophia Alj, George Holland, Alex Davies, and Pushmeet Kohli. AI Co-Mathematician: Accelerating mathematicians with agentic AI.arXiv...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[70] [70]

Population Database and Matchmaking

Database sampling:The controller selects a root proof sketch𝑆root by sampling from the database, along with𝑀= 2auxiliary inspiration sketches 𝑆insp. The selection strat- egy balances exploitation of high-rated sketches with exploration of diverse candidates (see “Population Database and Matchmaking”)

work page

[71] [71]

decompose unsolved goals,

Prompt construction:A prompt X is assembled to guide the LLM. It integrates the formal problem specification, the Lean source code and natural language plan of𝑆root, and structured feedback derived from AlphaProof’s previous attempts on{𝑆insp}. As in AlphaEvolve, the controller encourages diversity by stochastically injecting instructions such as “decompo...

work page

[72] [72]

To scale to large Lean files, the subagent outputs mutations via asearch_replace tool in a compact diff format rather than rewriting the entire file

Prover subagent:The assembled prompt X is dispatched to the LLM (Gemini 3.1 Pro), initiating a multi-turn episode. To scale to large Lean files, the subagent outputs mutations via asearch_replace tool in a compact diff format rather than rewriting the entire file. The subagent can also query AlphaProof to test specific subgoals mid- episode; the feedback ...

work page

[73] [73]

Global GoalCaching

Validation:Once the candidate sketch𝑆′ passes the sandbox check, it undergoes formal validation. The system extracts all remainingsorry subgoals and cross-references them against a global goal cache using a deep hash of their exact Lean state (see “Global GoalCaching”). Ifa subgoal waspreviously resolved, theproof isretrieved immediately; otherwise, it is...

work page

[74] [74]

matches

Database registration:The candidate 𝑆′, along with per-subgoal feedback from Al- phaProof, is registered in the database. Its fitness is then determined asynchronously via Elo matchmaking. AlphaProof has a Test-Time Reinforcement Learning (TTRL) mode in which it learns to solve a problem by solving its AI-generated variants at inference time; however, we ...

work page 2026

[75] [75]

4𝑁1(𝑋) +𝑁 3(𝑋) ≤ 3|𝑋| + 2𝑁2(𝑋), which follows from writing𝑁𝑘(𝑥)= Í 𝑥∈ℤ 1𝑋 (𝑥) 1𝑋 (𝑥+ 𝑘) and a pointwise check of the indicator function1𝑋 across all4-point windows (𝑥, 𝑥+ 1, 𝑥+ 2, 𝑥+ 3); that is, summing the local inequality1𝑋 (𝑥) + 1𝑋 (𝑥+ 1) + 1𝑋 (𝑥+

work page

[76] [76]

+ 1𝑋 (𝑥) 1𝑋 (𝑥+ 2) + 1𝑋 (𝑥+ 1)1𝑋 (𝑥+ 3) ≥ 1𝑋 (𝑥) 1𝑋 (𝑥+ 1) + 21𝑋 (𝑥+ 1)1𝑋 (𝑥+ 2) + 1𝑋 (𝑥+2)1 𝑋 (𝑥+3) +1 𝑋 (𝑥)1 𝑋 (𝑥+3)and then summing over all𝑥∈ℤ

work page

[77] [77]

Consider each pair(𝑥, 𝑥+ 2) ∈𝑋 2 and we proceed by cases on𝑥+ 1

2𝑁2(𝑋) ≤𝑁 3(𝑋) + 2𝑉2(𝑋) + 2𝐼(𝑋) . Consider each pair(𝑥, 𝑥+ 2) ∈𝑋 2 and we proceed by cases on𝑥+ 1. Let 𝐺={𝑥∈𝑋|𝑥+ 1 ∉𝑋∧𝑥+ 2 ∈𝑋} . If 𝑥+ 1 ∈𝑋 , then the pair is counted exactly by𝑉2, so𝑁 2(𝑋)=𝑉 2(𝑋) + |𝐺|. If𝑥+1∉𝑋, then •If𝑥−1∈𝑋, then(𝑥−1, 𝑥+2)is a pair counted in𝑁 3(𝑋). •If𝑥+3∈𝑋, then(𝑥, 𝑥+3)is a pair counted in𝑁 3(𝑋). 41 Advancing Mathematics Research wit...

work page

[78] [78]

For any 𝛿∉{ 0,−𝑘} , the Sidon property ensures that there is at most one pair(𝑎, 𝑐) with 𝑎−𝑐=𝛿 and at most one pair(𝑑, 𝑏) with 𝑑−𝑏=𝛿+𝑘

The cases𝛿= 0and 𝛿=−𝑘 each contribute≤ |𝐴| (one coordinate determines the rest). For any 𝛿∉{ 0,−𝑘} , the Sidon property ensures that there is at most one pair(𝑎, 𝑐) with 𝑎−𝑐=𝛿 and at most one pair(𝑑, 𝑏) with 𝑑−𝑏=𝛿+𝑘 . Consequently, each gap of size𝑘in𝐷identifies exactly one quadruple. Thus|quad 𝑘| ≤𝑁 𝑘(𝐷) +2𝑛

work page

[79] [79]

Combining these bounds shows that𝑁𝑘(𝐷) and 𝑁𝑘(𝑆) are related up to𝑂(𝑛) error

Each "good" element 𝑠∈𝑆 with 𝑠+𝑘∈𝑆 (excluding ≤ 2𝑛 doubles of the form2𝑎) produces≥4quadruples, giving4𝑁 𝑘(𝑆) ≤ |quad 𝑘| +8𝑛. Combining these bounds shows that𝑁𝑘(𝐷) and 𝑁𝑘(𝑆) are related up to𝑂(𝑛) error. Apply- ing fact (2) to𝐷=𝐴−𝐴gives 4𝑁1(𝐷) +𝑁 3(𝐷) ≤3|𝐷| +2𝑁 2(𝐷) and then substitute the above two quadruple transfer bounds for𝑘= 1, 2, 3and collecting al...

work page 2002

[80] [80]

Í 𝑅𝑛,𝑘 ≤2𝑛2 𝑛/2 =𝑜 2𝑛+1 𝑛6 . 44 Advancing Mathematics Research with AI-Driven Formal Proof Search Since all residual terms are bounded by𝑜 2𝑛+1 𝑛6 , we conclude that 𝑎𝑛 =𝑓 𝑛 +𝑜 2𝑛+1 𝑛6 = 2𝑛+1 𝑛 1+ 1 𝑛 + 3 𝑛2 + 13 𝑛3 + 75 𝑛4 + 541 𝑛5 +𝑜 1 𝑛5 This establishes the conjectured asymptotic expansion.□ Theorem.(A conjecture of OEIS A228143, 2018) Let𝑠𝑚 = Í𝑚 𝑘=0 ...

work page 2018