Using Aristotle API for AI-Assisted Theorem Proving in Lean 4: A Formalisation Case Study of the Grasshopper Problem

Gabriel Rongyang Lau

arxiv: 2605.20120 · v1 · pith:6N7CZ24Ynew · submitted 2026-05-19 · 💻 cs.AI · cs.LO

Using Aristotle API for AI-Assisted Theorem Proving in Lean 4: A Formalisation Case Study of the Grasshopper Problem

Gabriel Rongyang Lau This is my paper

Pith reviewed 2026-05-20 05:03 UTC · model grok-4.3

classification 💻 cs.AI cs.LO

keywords AI-assisted theorem provingLean 4 formalizationGrasshopper problemcase studypartial proofcombinatorial argumenthelper lemmasunresolved sorry

0 comments

The pith

An AI-generated Lean formalization verifies four local lemmas for the Grasshopper problem but leaves the main theorem open with a single sorry.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports a case study in which the Aristotle API produces a Lean 4 artifact for a generalized version of the Grasshopper problem. The development includes four verified helper lemmas that address local features of a maximality strategy built around adjacent swaps and partial sums. The main theorem, however, is closed directly by an unresolved sorry, with the generated summary pointing to an uncompleted global counting step that would produce the needed contradiction. A sympathetic reader cares because the example isolates a concrete boundary: local components can be automated and checked while the overall combinatorial bookkeeping stays unresolved.

Core claim

The generated artifact states a generalized Lean version of the theorem, contains four verified helper lemmas for local components of a maximality and adjacent-swap exchange strategy, and leaves the main theorem grasshopper closed directly by one unresolved sorry. The verified components establish that the final partial sum equals the total sum, that an adjacent transposition can affect only the relevant intermediate partial sum, that the changed partial sum has the expected form, and that maximality at a position admitting an adjacent successor swap forces a corresponding forbidden-set membership fact. The Aristotle output summary identifies the intended remaining mathematical step as the全球

What carries the argument

A maximality and adjacent-swap exchange strategy whose local steps are captured by four helper lemmas on partial sums and forbidden-set memberships.

If this is right

Local aspects of the maximality strategy admit successful formal verification through AI assistance.
The global counting argument required to obtain a contradiction from the forbidden-set facts remains unencoded in the artifact.
Local proof search can succeed on individual components while the overall combinatorial structure stays incomplete.
The case study supplies a reproducible Lean source that separates verified local content from the unresolved global step.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar AI-assisted formalizations of other olympiad problems may show the same split between easily automated local lemmas and harder global bookkeeping.
Tools could be tested to see whether early injection of cardinality or counting lemmas reduces the number of unresolved sorries.
The pattern suggests examining whether the same limitation appears when the target theorem relies on double-counting or pigeonhole arguments.

Load-bearing premise

The four helper lemmas are correctly verified in Lean without hidden errors or incorrect statements, and the output summary accurately identifies the missing global counting step as the precise piece needed to reach a contradiction.

What would settle it

Compile the Lean source to verify that the four helper lemmas type-check without errors, then test whether inserting an explicit counting lemma that derives at least n distinct forbidden values from the membership facts closes the main theorem's sorry.

Figures

Figures reproduced from arXiv: 2605.20120 by Gabriel Rongyang Lau.

**Figure 1.** Figure 1: Aristotle run log for the reported partial formalisation attempt. 8. Implications for AI-Assisted Theorem Proving The case study separates two kinds of formalisation work. Aristotle successfully handled local reasoning involving finite sums, permutations, adjacent swaps, and a maximality lemma. These tasks are amenable to proof-assistant verification once the relevant definitions are in place. The generate… view at source ↗

read the original abstract

AI-assisted theorem proving can now generate substantial Lean developments for olympiad-level mathematics, but the evidential status of such developments depends on which declarations are actually verified. This paper reports a Lean 4 formalization case study of an Aristotle API proof attempt for the Grasshopper problem, originally posed as IMO 2009 Problem 6. The generated artifact states a generalized Lean version of the theorem, contains four verified helper lemmas for local components of a maximality and adjacent-swap exchange strategy, and leaves the main theorem grasshopper closed directly by one unresolved sorry. The verified components establish that the final partial sum equals the total sum, that an adjacent transposition can affect only the relevant intermediate partial sum, that the changed partial sum has the expected form, and that maximality at a position admitting an adjacent successor swap forces a corresponding forbidden-set membership fact. The Aristotle output summary identifies the intended remaining mathematical step as the global counting step needed to show that these membership facts produce at least n distinct forbidden values, contradicting the cardinality assumption |M| < n; the Lean source itself does not reduce the main theorem to a separately encoded counting lemma. This case study gives an inspectable example of a central limitation in AI-assisted formalization, namely that local proof search can succeed while the global combinatorial bookkeeping required for a theorem remains unresolved. The paper contributes a reproducible Lean artifact and a precise analysis of its verified and unverified proof content.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

read the letter

This paper supplies a reproducible Lean artifact that verifies four local lemmas on partial sums and swaps for the Grasshopper problem but leaves the main theorem with a single sorry on the global counting step. The artifact states a generalized version of the theorem and breaks the attempted proof into helper results that check out plus one unresolved closing step. The verified lemmas cover the final partial sum matching the total, the limited effect of an adjacent transposition, the form of the changed sum, and the link from maximality to a forbidden-set membership. The Aristotle summary flags the missing piece as the global counting argument that would produce the needed contradiction with |M| < n. Having the actual source makes the split between local success and global gap easy to inspect and check independently. That concrete documentation is the main contribution. The work does a straightforward job of reporting exactly which declarations are verified and which are not, without inflating the results. The analysis stays grounded in the artifact rather than any fitted parameters or circular claims. The soft spots are limited. The approach builds on existing AI-assisted formalization efforts, so the novelty sits mainly in the specific case and the precise accounting of what succeeded versus what did not. The paper does not claim a new method or deeper mathematical insight into the Grasshopper problem itself. No load-bearing mismatch appears between the claimed lemma content and the reported verification status. This is the kind of paper that matters for researchers working on AI tools for theorem proving in Lean, especially those focused on olympiad-level problems. Readers who want concrete examples of where current systems hit combinatorial walls will get direct value from the artifact and the breakdown. It deserves a serious referee because the source is available for checking and the observations are tied to verifiable details rather than broad assertions. I would send it out for peer review as a useful, honest case study worth keeping in the record.

Referee Report

0 major / 2 minor

Summary. The paper reports a Lean 4 formalization case study of an Aristotle API proof attempt for the Grasshopper problem (IMO 2009 Problem 6). The generated artifact states a generalized version of the theorem, contains four verified helper lemmas addressing local aspects of a maximality and adjacent-swap exchange strategy (final partial sum equals total sum, effect of adjacent transposition on intermediate partial sums, form of the changed partial sum, and maximality implying forbidden-set membership), and closes the main theorem grasshopper directly by a single unresolved sorry. The analysis notes that the Aristotle summary identifies the missing global counting step needed to produce at least n distinct forbidden values, but the Lean source does not reduce the main theorem to a separately encoded counting lemma.

Significance. If the reported structure of the artifact holds, the work is significant for supplying a transparent, inspectable, and reproducible Lean artifact that concretely illustrates a recurring limitation in current AI-assisted formalization: local proof search can succeed on helper lemmas while the global combinatorial bookkeeping required for the theorem remains unresolved. The paper explicitly credits the machine-checked status of the four helper lemmas and the direct availability of the source for independent verification, which strengthens its contribution to the literature on the practical boundaries of AI theorem-proving tools.

minor comments (2)

[Description of the Lean artifact] The description of the four helper lemmas would be clearer if each were tied to its exact Lean declaration name and file location rather than summarized by content alone.
A short appendix or repository link containing the full Lean source (beyond the summary) would further aid reproducibility, although the paper already states that the artifact is supplied.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript, for the accurate summary of the Lean artifact, and for the recommendation to accept. The report correctly identifies the verified helper lemmas and the single unresolved sorry closing the main theorem.

Circularity Check

0 steps flagged

No significant circularity; transparent case study of AI-generated Lean artifact

full rationale

The paper reports a case study of an Aristotle API attempt to formalize the Grasshopper problem in Lean 4. It describes the generated artifact's structure (generalized theorem statement, four verified helper lemmas on partial sums and adjacent transpositions, and a single 'sorry' on the main theorem) and identifies the missing global counting step. This analysis rests on direct inspection of the external Lean source and Aristotle summary rather than any internal derivation chain. No step reduces by construction to fitted inputs, self-definitions, or self-citation load-bearing premises; the observations are independently verifiable against the provided artifact and do not invoke uniqueness theorems or ansatzes from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no free parameters, axioms, or invented entities; it relies on the existing Lean 4 system and the Aristotle API as standard external tools for formalization.

pith-pipeline@v0.9.0 · 5791 in / 1251 out tokens · 65925 ms · 2026-05-20T05:03:56.836915+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 2 internal anchors

[1]

Aristotle: IMO-level Automated Theorem Proving

Tudor Achim, Alex Best, Alberto Bietti, Kevin Der, Math¨ ıs F´ ed´ erico, Sergei Gukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, Martin Michelsen, Riley Patterson, Eric Ro- driguez, Laura Scharff, Vikram Shanker, Vladmir Sicca, Hari Sowrirajan, Aidan Swope, Matyas Tamas, Vlad Tenev, Jonathan Thomm, Harold Williams...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2510.01346 2025
[2]

Accessed 2026-05-19

Aristotle API.https://aristotle.harmonic.fun/. Accessed 2026-05-19

work page 2026
[3]

Python library and command-line tool for the Aristotle API.https://pypi.org/project/ aristotlelib/

aristotlelib. Python library and command-line tool for the Aristotle API.https://pypi.org/project/ aristotlelib/. Accessed 2026-05-19

work page 2026
[4]

TheLean4theoremproverandprogramming language

Leonardo de Moura and Sebastian Ullrich. The Lean 4 Theorem Prover and Programming Language. In Andr´ e Platzer and Geoff Sutcliffe, editors,Automated Deduction – CADE 28, volume 12699 ofLecture Notes in Computer Science, pages 625–635, Cham, 2021. Springer.doi:10.1007/978-3-030-79876-5_37

work page doi:10.1007/978-3-030-79876-5_37 2021
[5]

Individual results.https://www.imo-official.org/resu lts/individual/year/2009/

50th International Mathematical Olympiad 2009. Individual results.https://www.imo-official.org/resu lts/individual/year/2009/. Accessed 2026-05-19

work page 2009
[6]

On the grasshopper problem with signed jumps.The American Mathematical Monthly, 118(10):877–886, 2011

G´ eza K´ os. On the grasshopper problem with signed jumps.The American Mathematical Monthly, 118(10):877–886, 2011. doi:10.4169/amer.math.monthly.118.10.877

work page doi:10.4169/amer.math.monthly.118.10.877 2011
[7]

IMO 2009 Q6 as a mini-polymath project.https://terrytao.wordpress.com/2009/07/20/ imo-2009-q6-as-a-mini-polymath-project/, 2009

Terence Tao. IMO 2009 Q6 as a mini-polymath project.https://terrytao.wordpress.com/2009/07/20/ imo-2009-q6-as-a-mini-polymath-project/, 2009. Accessed 2026-05-19

work page 2009
[8]

Tactic Reference, section on the sorry tactic

The Lean Language Reference. Tactic Reference, section on the sorry tactic. https://lean-lang.org/doc/reference/latest/Tactic-Proofs/Tactic-Reference/. Accessed 2026- 05-19

work page 2026
[9]

Mathlib4: The Lean Mathematical Library.https://github.com/ leanprover-community/mathlib4

The mathlib Community. Mathlib4: The Lean Mathematical Library.https://github.com/ leanprover-community/mathlib4. Software project. Accessed 2026-05-19

work page 2026
[10]

Trinh, Yuhuai Wu, Quoc V

Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations.Nature, 625:476–482, 2024. doi:10.1038/s41586-023-06747-5

work page doi:10.1038/s41586-023-06747-5 2024
[11]

DeepSeek-Prover: Advancing theorem proving in LLMs through large-scale synthetic data

Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, and Xiaodan Liang. DeepSeek-Prover: Advancing theorem proving in LLMs through large-scale synthetic data. arXiv preprint arXiv:2405.14333, 2024. doi:10.48550/arXiv.2405.14333

work page doi:10.48550/arxiv.2405.14333 2024
[12]

MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

Kunhao Zheng, Jesse Michael Han, and Stanislas Polu. miniF2F: A cross-system benchmark for formal olympiad-level mathematics. InInternational Conference on Learning Representations, 2022. doi:10.48550/arXiv.2109.00110

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2109.00110 2022

[1] [1]

Aristotle: IMO-level Automated Theorem Proving

Tudor Achim, Alex Best, Alberto Bietti, Kevin Der, Math¨ ıs F´ ed´ erico, Sergei Gukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, Martin Michelsen, Riley Patterson, Eric Ro- driguez, Laura Scharff, Vikram Shanker, Vladmir Sicca, Hari Sowrirajan, Aidan Swope, Matyas Tamas, Vlad Tenev, Jonathan Thomm, Harold Williams...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2510.01346 2025

[2] [2]

Accessed 2026-05-19

Aristotle API.https://aristotle.harmonic.fun/. Accessed 2026-05-19

work page 2026

[3] [3]

Python library and command-line tool for the Aristotle API.https://pypi.org/project/ aristotlelib/

aristotlelib. Python library and command-line tool for the Aristotle API.https://pypi.org/project/ aristotlelib/. Accessed 2026-05-19

work page 2026

[4] [4]

TheLean4theoremproverandprogramming language

Leonardo de Moura and Sebastian Ullrich. The Lean 4 Theorem Prover and Programming Language. In Andr´ e Platzer and Geoff Sutcliffe, editors,Automated Deduction – CADE 28, volume 12699 ofLecture Notes in Computer Science, pages 625–635, Cham, 2021. Springer.doi:10.1007/978-3-030-79876-5_37

work page doi:10.1007/978-3-030-79876-5_37 2021

[5] [5]

Individual results.https://www.imo-official.org/resu lts/individual/year/2009/

50th International Mathematical Olympiad 2009. Individual results.https://www.imo-official.org/resu lts/individual/year/2009/. Accessed 2026-05-19

work page 2009

[6] [6]

On the grasshopper problem with signed jumps.The American Mathematical Monthly, 118(10):877–886, 2011

G´ eza K´ os. On the grasshopper problem with signed jumps.The American Mathematical Monthly, 118(10):877–886, 2011. doi:10.4169/amer.math.monthly.118.10.877

work page doi:10.4169/amer.math.monthly.118.10.877 2011

[7] [7]

IMO 2009 Q6 as a mini-polymath project.https://terrytao.wordpress.com/2009/07/20/ imo-2009-q6-as-a-mini-polymath-project/, 2009

Terence Tao. IMO 2009 Q6 as a mini-polymath project.https://terrytao.wordpress.com/2009/07/20/ imo-2009-q6-as-a-mini-polymath-project/, 2009. Accessed 2026-05-19

work page 2009

[8] [8]

Tactic Reference, section on the sorry tactic

The Lean Language Reference. Tactic Reference, section on the sorry tactic. https://lean-lang.org/doc/reference/latest/Tactic-Proofs/Tactic-Reference/. Accessed 2026- 05-19

work page 2026

[9] [9]

Mathlib4: The Lean Mathematical Library.https://github.com/ leanprover-community/mathlib4

The mathlib Community. Mathlib4: The Lean Mathematical Library.https://github.com/ leanprover-community/mathlib4. Software project. Accessed 2026-05-19

work page 2026

[10] [10]

Trinh, Yuhuai Wu, Quoc V

Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations.Nature, 625:476–482, 2024. doi:10.1038/s41586-023-06747-5

work page doi:10.1038/s41586-023-06747-5 2024

[11] [11]

DeepSeek-Prover: Advancing theorem proving in LLMs through large-scale synthetic data

Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, and Xiaodan Liang. DeepSeek-Prover: Advancing theorem proving in LLMs through large-scale synthetic data. arXiv preprint arXiv:2405.14333, 2024. doi:10.48550/arXiv.2405.14333

work page doi:10.48550/arxiv.2405.14333 2024

[12] [12]

MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

Kunhao Zheng, Jesse Michael Han, and Stanislas Polu. miniF2F: A cross-system benchmark for formal olympiad-level mathematics. InInternational Conference on Learning Representations, 2022. doi:10.48550/arXiv.2109.00110

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2109.00110 2022