Using Aristotle API for AI-Assisted Theorem Proving in Lean 4: A Formalisation Case Study of the Grasshopper Problem
Pith reviewed 2026-05-20 05:03 UTC · model grok-4.3
The pith
An AI-generated Lean formalization verifies four local lemmas for the Grasshopper problem but leaves the main theorem open with a single sorry.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The generated artifact states a generalized Lean version of the theorem, contains four verified helper lemmas for local components of a maximality and adjacent-swap exchange strategy, and leaves the main theorem grasshopper closed directly by one unresolved sorry. The verified components establish that the final partial sum equals the total sum, that an adjacent transposition can affect only the relevant intermediate partial sum, that the changed partial sum has the expected form, and that maximality at a position admitting an adjacent successor swap forces a corresponding forbidden-set membership fact. The Aristotle output summary identifies the intended remaining mathematical step as the全球
What carries the argument
A maximality and adjacent-swap exchange strategy whose local steps are captured by four helper lemmas on partial sums and forbidden-set memberships.
If this is right
- Local aspects of the maximality strategy admit successful formal verification through AI assistance.
- The global counting argument required to obtain a contradiction from the forbidden-set facts remains unencoded in the artifact.
- Local proof search can succeed on individual components while the overall combinatorial structure stays incomplete.
- The case study supplies a reproducible Lean source that separates verified local content from the unresolved global step.
Where Pith is reading between the lines
- Similar AI-assisted formalizations of other olympiad problems may show the same split between easily automated local lemmas and harder global bookkeeping.
- Tools could be tested to see whether early injection of cardinality or counting lemmas reduces the number of unresolved sorries.
- The pattern suggests examining whether the same limitation appears when the target theorem relies on double-counting or pigeonhole arguments.
Load-bearing premise
The four helper lemmas are correctly verified in Lean without hidden errors or incorrect statements, and the output summary accurately identifies the missing global counting step as the precise piece needed to reach a contradiction.
What would settle it
Compile the Lean source to verify that the four helper lemmas type-check without errors, then test whether inserting an explicit counting lemma that derives at least n distinct forbidden values from the membership facts closes the main theorem's sorry.
Figures
read the original abstract
AI-assisted theorem proving can now generate substantial Lean developments for olympiad-level mathematics, but the evidential status of such developments depends on which declarations are actually verified. This paper reports a Lean 4 formalization case study of an Aristotle API proof attempt for the Grasshopper problem, originally posed as IMO 2009 Problem 6. The generated artifact states a generalized Lean version of the theorem, contains four verified helper lemmas for local components of a maximality and adjacent-swap exchange strategy, and leaves the main theorem grasshopper closed directly by one unresolved sorry. The verified components establish that the final partial sum equals the total sum, that an adjacent transposition can affect only the relevant intermediate partial sum, that the changed partial sum has the expected form, and that maximality at a position admitting an adjacent successor swap forces a corresponding forbidden-set membership fact. The Aristotle output summary identifies the intended remaining mathematical step as the global counting step needed to show that these membership facts produce at least n distinct forbidden values, contradicting the cardinality assumption |M| < n; the Lean source itself does not reduce the main theorem to a separately encoded counting lemma. This case study gives an inspectable example of a central limitation in AI-assisted formalization, namely that local proof search can succeed while the global combinatorial bookkeeping required for a theorem remains unresolved. The paper contributes a reproducible Lean artifact and a precise analysis of its verified and unverified proof content.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports a Lean 4 formalization case study of an Aristotle API proof attempt for the Grasshopper problem (IMO 2009 Problem 6). The generated artifact states a generalized version of the theorem, contains four verified helper lemmas addressing local aspects of a maximality and adjacent-swap exchange strategy (final partial sum equals total sum, effect of adjacent transposition on intermediate partial sums, form of the changed partial sum, and maximality implying forbidden-set membership), and closes the main theorem grasshopper directly by a single unresolved sorry. The analysis notes that the Aristotle summary identifies the missing global counting step needed to produce at least n distinct forbidden values, but the Lean source does not reduce the main theorem to a separately encoded counting lemma.
Significance. If the reported structure of the artifact holds, the work is significant for supplying a transparent, inspectable, and reproducible Lean artifact that concretely illustrates a recurring limitation in current AI-assisted formalization: local proof search can succeed on helper lemmas while the global combinatorial bookkeeping required for the theorem remains unresolved. The paper explicitly credits the machine-checked status of the four helper lemmas and the direct availability of the source for independent verification, which strengthens its contribution to the literature on the practical boundaries of AI theorem-proving tools.
minor comments (2)
- [Description of the Lean artifact] The description of the four helper lemmas would be clearer if each were tied to its exact Lean declaration name and file location rather than summarized by content alone.
- A short appendix or repository link containing the full Lean source (beyond the summary) would further aid reproducibility, although the paper already states that the artifact is supplied.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript, for the accurate summary of the Lean artifact, and for the recommendation to accept. The report correctly identifies the verified helper lemmas and the single unresolved sorry closing the main theorem.
Circularity Check
No significant circularity; transparent case study of AI-generated Lean artifact
full rationale
The paper reports a case study of an Aristotle API attempt to formalize the Grasshopper problem in Lean 4. It describes the generated artifact's structure (generalized theorem statement, four verified helper lemmas on partial sums and adjacent transpositions, and a single 'sorry' on the main theorem) and identifies the missing global counting step. This analysis rests on direct inspection of the external Lean source and Aristotle summary rather than any internal derivation chain. No step reduces by construction to fitted inputs, self-definitions, or self-citation load-bearing premises; the observations are independently verifiable against the provided artifact and do not invoke uniqueness theorems or ansatzes from prior author work.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Aristotle: IMO-level Automated Theorem Proving
Tudor Achim, Alex Best, Alberto Bietti, Kevin Der, Math¨ ıs F´ ed´ erico, Sergei Gukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, Martin Michelsen, Riley Patterson, Eric Ro- driguez, Laura Scharff, Vikram Shanker, Vladmir Sicca, Hari Sowrirajan, Aidan Swope, Matyas Tamas, Vlad Tenev, Jonathan Thomm, Harold Williams...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2510.01346 2025
-
[2]
Aristotle API.https://aristotle.harmonic.fun/. Accessed 2026-05-19
work page 2026
-
[3]
Python library and command-line tool for the Aristotle API.https://pypi.org/project/ aristotlelib/
aristotlelib. Python library and command-line tool for the Aristotle API.https://pypi.org/project/ aristotlelib/. Accessed 2026-05-19
work page 2026
-
[4]
TheLean4theoremproverandprogramming language
Leonardo de Moura and Sebastian Ullrich. The Lean 4 Theorem Prover and Programming Language. In Andr´ e Platzer and Geoff Sutcliffe, editors,Automated Deduction – CADE 28, volume 12699 ofLecture Notes in Computer Science, pages 625–635, Cham, 2021. Springer.doi:10.1007/978-3-030-79876-5_37
-
[5]
Individual results.https://www.imo-official.org/resu lts/individual/year/2009/
50th International Mathematical Olympiad 2009. Individual results.https://www.imo-official.org/resu lts/individual/year/2009/. Accessed 2026-05-19
work page 2009
-
[6]
G´ eza K´ os. On the grasshopper problem with signed jumps.The American Mathematical Monthly, 118(10):877–886, 2011. doi:10.4169/amer.math.monthly.118.10.877
-
[7]
Terence Tao. IMO 2009 Q6 as a mini-polymath project.https://terrytao.wordpress.com/2009/07/20/ imo-2009-q6-as-a-mini-polymath-project/, 2009. Accessed 2026-05-19
work page 2009
-
[8]
Tactic Reference, section on the sorry tactic
The Lean Language Reference. Tactic Reference, section on the sorry tactic. https://lean-lang.org/doc/reference/latest/Tactic-Proofs/Tactic-Reference/. Accessed 2026- 05-19
work page 2026
-
[9]
Mathlib4: The Lean Mathematical Library.https://github.com/ leanprover-community/mathlib4
The mathlib Community. Mathlib4: The Lean Mathematical Library.https://github.com/ leanprover-community/mathlib4. Software project. Accessed 2026-05-19
work page 2026
-
[10]
Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations.Nature, 625:476–482, 2024. doi:10.1038/s41586-023-06747-5
-
[11]
DeepSeek-Prover: Advancing theorem proving in LLMs through large-scale synthetic data
Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, and Xiaodan Liang. DeepSeek-Prover: Advancing theorem proving in LLMs through large-scale synthetic data. arXiv preprint arXiv:2405.14333, 2024. doi:10.48550/arXiv.2405.14333
-
[12]
MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics
Kunhao Zheng, Jesse Michael Han, and Stanislas Polu. miniF2F: A cross-system benchmark for formal olympiad-level mathematics. InInternational Conference on Learning Representations, 2022. doi:10.48550/arXiv.2109.00110
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2109.00110 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.