pith. sign in

arxiv: 2605.20120 · v1 · pith:6N7CZ24Ynew · submitted 2026-05-19 · 💻 cs.AI · cs.LO

Using Aristotle API for AI-Assisted Theorem Proving in Lean 4: A Formalisation Case Study of the Grasshopper Problem

Pith reviewed 2026-05-20 05:03 UTC · model grok-4.3

classification 💻 cs.AI cs.LO
keywords AI-assisted theorem provingLean 4 formalizationGrasshopper problemcase studypartial proofcombinatorial argumenthelper lemmasunresolved sorry
0
0 comments X

The pith

An AI-generated Lean formalization verifies four local lemmas for the Grasshopper problem but leaves the main theorem open with a single sorry.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports a case study in which the Aristotle API produces a Lean 4 artifact for a generalized version of the Grasshopper problem. The development includes four verified helper lemmas that address local features of a maximality strategy built around adjacent swaps and partial sums. The main theorem, however, is closed directly by an unresolved sorry, with the generated summary pointing to an uncompleted global counting step that would produce the needed contradiction. A sympathetic reader cares because the example isolates a concrete boundary: local components can be automated and checked while the overall combinatorial bookkeeping stays unresolved.

Core claim

The generated artifact states a generalized Lean version of the theorem, contains four verified helper lemmas for local components of a maximality and adjacent-swap exchange strategy, and leaves the main theorem grasshopper closed directly by one unresolved sorry. The verified components establish that the final partial sum equals the total sum, that an adjacent transposition can affect only the relevant intermediate partial sum, that the changed partial sum has the expected form, and that maximality at a position admitting an adjacent successor swap forces a corresponding forbidden-set membership fact. The Aristotle output summary identifies the intended remaining mathematical step as the全球

What carries the argument

A maximality and adjacent-swap exchange strategy whose local steps are captured by four helper lemmas on partial sums and forbidden-set memberships.

If this is right

  • Local aspects of the maximality strategy admit successful formal verification through AI assistance.
  • The global counting argument required to obtain a contradiction from the forbidden-set facts remains unencoded in the artifact.
  • Local proof search can succeed on individual components while the overall combinatorial structure stays incomplete.
  • The case study supplies a reproducible Lean source that separates verified local content from the unresolved global step.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar AI-assisted formalizations of other olympiad problems may show the same split between easily automated local lemmas and harder global bookkeeping.
  • Tools could be tested to see whether early injection of cardinality or counting lemmas reduces the number of unresolved sorries.
  • The pattern suggests examining whether the same limitation appears when the target theorem relies on double-counting or pigeonhole arguments.

Load-bearing premise

The four helper lemmas are correctly verified in Lean without hidden errors or incorrect statements, and the output summary accurately identifies the missing global counting step as the precise piece needed to reach a contradiction.

What would settle it

Compile the Lean source to verify that the four helper lemmas type-check without errors, then test whether inserting an explicit counting lemma that derives at least n distinct forbidden values from the membership facts closes the main theorem's sorry.

Figures

Figures reproduced from arXiv: 2605.20120 by Gabriel Rongyang Lau.

Figure 1
Figure 1. Figure 1: Aristotle run log for the reported partial formalisation attempt. 8. Implications for AI-Assisted Theorem Proving The case study separates two kinds of formalisation work. Aristotle successfully handled local reasoning involving finite sums, permutations, adjacent swaps, and a maximality lemma. These tasks are amenable to proof-assistant verification once the relevant definitions are in place. The generate… view at source ↗
read the original abstract

AI-assisted theorem proving can now generate substantial Lean developments for olympiad-level mathematics, but the evidential status of such developments depends on which declarations are actually verified. This paper reports a Lean 4 formalization case study of an Aristotle API proof attempt for the Grasshopper problem, originally posed as IMO 2009 Problem 6. The generated artifact states a generalized Lean version of the theorem, contains four verified helper lemmas for local components of a maximality and adjacent-swap exchange strategy, and leaves the main theorem grasshopper closed directly by one unresolved sorry. The verified components establish that the final partial sum equals the total sum, that an adjacent transposition can affect only the relevant intermediate partial sum, that the changed partial sum has the expected form, and that maximality at a position admitting an adjacent successor swap forces a corresponding forbidden-set membership fact. The Aristotle output summary identifies the intended remaining mathematical step as the global counting step needed to show that these membership facts produce at least n distinct forbidden values, contradicting the cardinality assumption |M| < n; the Lean source itself does not reduce the main theorem to a separately encoded counting lemma. This case study gives an inspectable example of a central limitation in AI-assisted formalization, namely that local proof search can succeed while the global combinatorial bookkeeping required for a theorem remains unresolved. The paper contributes a reproducible Lean artifact and a precise analysis of its verified and unverified proof content.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper reports a Lean 4 formalization case study of an Aristotle API proof attempt for the Grasshopper problem (IMO 2009 Problem 6). The generated artifact states a generalized version of the theorem, contains four verified helper lemmas addressing local aspects of a maximality and adjacent-swap exchange strategy (final partial sum equals total sum, effect of adjacent transposition on intermediate partial sums, form of the changed partial sum, and maximality implying forbidden-set membership), and closes the main theorem grasshopper directly by a single unresolved sorry. The analysis notes that the Aristotle summary identifies the missing global counting step needed to produce at least n distinct forbidden values, but the Lean source does not reduce the main theorem to a separately encoded counting lemma.

Significance. If the reported structure of the artifact holds, the work is significant for supplying a transparent, inspectable, and reproducible Lean artifact that concretely illustrates a recurring limitation in current AI-assisted formalization: local proof search can succeed on helper lemmas while the global combinatorial bookkeeping required for the theorem remains unresolved. The paper explicitly credits the machine-checked status of the four helper lemmas and the direct availability of the source for independent verification, which strengthens its contribution to the literature on the practical boundaries of AI theorem-proving tools.

minor comments (2)
  1. [Description of the Lean artifact] The description of the four helper lemmas would be clearer if each were tied to its exact Lean declaration name and file location rather than summarized by content alone.
  2. A short appendix or repository link containing the full Lean source (beyond the summary) would further aid reproducibility, although the paper already states that the artifact is supplied.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript, for the accurate summary of the Lean artifact, and for the recommendation to accept. The report correctly identifies the verified helper lemmas and the single unresolved sorry closing the main theorem.

Circularity Check

0 steps flagged

No significant circularity; transparent case study of AI-generated Lean artifact

full rationale

The paper reports a case study of an Aristotle API attempt to formalize the Grasshopper problem in Lean 4. It describes the generated artifact's structure (generalized theorem statement, four verified helper lemmas on partial sums and adjacent transpositions, and a single 'sorry' on the main theorem) and identifies the missing global counting step. This analysis rests on direct inspection of the external Lean source and Aristotle summary rather than any internal derivation chain. No step reduces by construction to fitted inputs, self-definitions, or self-citation load-bearing premises; the observations are independently verifiable against the provided artifact and do not invoke uniqueness theorems or ansatzes from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no free parameters, axioms, or invented entities; it relies on the existing Lean 4 system and the Aristotle API as standard external tools for formalization.

pith-pipeline@v0.9.0 · 5791 in / 1251 out tokens · 65925 ms · 2026-05-20T05:03:56.836915+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 2 internal anchors

  1. [1]

    Aristotle: IMO-level Automated Theorem Proving

    Tudor Achim, Alex Best, Alberto Bietti, Kevin Der, Math¨ ıs F´ ed´ erico, Sergei Gukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, Martin Michelsen, Riley Patterson, Eric Ro- driguez, Laura Scharff, Vikram Shanker, Vladmir Sicca, Hari Sowrirajan, Aidan Swope, Matyas Tamas, Vlad Tenev, Jonathan Thomm, Harold Williams...

  2. [2]

    Accessed 2026-05-19

    Aristotle API.https://aristotle.harmonic.fun/. Accessed 2026-05-19

  3. [3]

    Python library and command-line tool for the Aristotle API.https://pypi.org/project/ aristotlelib/

    aristotlelib. Python library and command-line tool for the Aristotle API.https://pypi.org/project/ aristotlelib/. Accessed 2026-05-19

  4. [4]

    TheLean4theoremproverandprogramming language

    Leonardo de Moura and Sebastian Ullrich. The Lean 4 Theorem Prover and Programming Language. In Andr´ e Platzer and Geoff Sutcliffe, editors,Automated Deduction – CADE 28, volume 12699 ofLecture Notes in Computer Science, pages 625–635, Cham, 2021. Springer.doi:10.1007/978-3-030-79876-5_37

  5. [5]

    Individual results.https://www.imo-official.org/resu lts/individual/year/2009/

    50th International Mathematical Olympiad 2009. Individual results.https://www.imo-official.org/resu lts/individual/year/2009/. Accessed 2026-05-19

  6. [6]

    On the grasshopper problem with signed jumps.The American Mathematical Monthly, 118(10):877–886, 2011

    G´ eza K´ os. On the grasshopper problem with signed jumps.The American Mathematical Monthly, 118(10):877–886, 2011. doi:10.4169/amer.math.monthly.118.10.877

  7. [7]

    IMO 2009 Q6 as a mini-polymath project.https://terrytao.wordpress.com/2009/07/20/ imo-2009-q6-as-a-mini-polymath-project/, 2009

    Terence Tao. IMO 2009 Q6 as a mini-polymath project.https://terrytao.wordpress.com/2009/07/20/ imo-2009-q6-as-a-mini-polymath-project/, 2009. Accessed 2026-05-19

  8. [8]

    Tactic Reference, section on the sorry tactic

    The Lean Language Reference. Tactic Reference, section on the sorry tactic. https://lean-lang.org/doc/reference/latest/Tactic-Proofs/Tactic-Reference/. Accessed 2026- 05-19

  9. [9]

    Mathlib4: The Lean Mathematical Library.https://github.com/ leanprover-community/mathlib4

    The mathlib Community. Mathlib4: The Lean Mathematical Library.https://github.com/ leanprover-community/mathlib4. Software project. Accessed 2026-05-19

  10. [10]

    Trinh, Yuhuai Wu, Quoc V

    Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations.Nature, 625:476–482, 2024. doi:10.1038/s41586-023-06747-5

  11. [11]

    DeepSeek-Prover: Advancing theorem proving in LLMs through large-scale synthetic data

    Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, and Xiaodan Liang. DeepSeek-Prover: Advancing theorem proving in LLMs through large-scale synthetic data. arXiv preprint arXiv:2405.14333, 2024. doi:10.48550/arXiv.2405.14333

  12. [12]

    MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

    Kunhao Zheng, Jesse Michael Han, and Stanislas Polu. miniF2F: A cross-system benchmark for formal olympiad-level mathematics. InInternational Conference on Learning Representations, 2022. doi:10.48550/arXiv.2109.00110