arxiv: 2604.07455 · v1 · submitted 2026-04-08 · 💻 cs.AI · cs.LG· cs.LO

Recognition: no theorem link

Munkres' General Topology Autoformalized in Isabelle/HOL

Dustin Bryant , Jonathan Juli\'an Huerta y Munive , Cezary Kaliszyk , Josef Urban

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:24 UTC · model grok-4.3

classification 💻 cs.AI cs.LGcs.LO

keywords autoformalizationIsabelle/HOLMunkres topologyLLM-assisted formalizationformal proofgeneral topologytheorem provingsledgehammer

0 comments

The pith

LLM-assisted formalization has produced a complete verified Isabelle/HOL version of Munkres' General Topology with all 806 results proved.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports an experiment in which LLM coding agents generated more than 85,000 lines of Isabelle/HOL code that covers every section of Munkres' General Topology. The resulting library contains 806 theorems, all fully proved with no unproven steps remaining. The process combined human supervision with a workflow that lets the model sketch proofs first and then relies on automated tools to finish them. If this approach works reliably, standard mathematical textbooks could be converted into machine-checkable form on a timescale of weeks instead of years of manual labor.

Core claim

The experiment produced a full formalization of Munkres' Topology in Isabelle/HOL covering topological spaces through dimension theory across all 39 sections. LLM agents completed the work in 24 active days, yielding over 85,000 lines of code in which every one of the 806 formal results is completely proved. The formalization includes the Tychonoff theorem, the Baire category theorem, the Nagata-Smirnov and Smirnov metrization theorems, the Stone-Cech compactification, Ascoli's theorem, and the space-filling curve. The methodology centered on a sorry-first declarative workflow paired with repeated use of sledgehammer to let the prover supply missing details after the model provides the high-

What carries the argument

The sorry-first declarative proof workflow combined with bulk use of sledgehammer, which allows the LLM to generate high-level proof outlines while the prover automatically completes the details under human guidance.

If this is right

Major results such as the Tychonoff theorem and Baire category theorem can be fully verified inside the same library.
The entire sequence from basic topological spaces to dimension theory is now available in machine-checked form.
Textbook-scale formalization can be completed in roughly three weeks of active effort with current LLM assistance.
The sorry-first plus sledgehammer pattern proved effective enough to eliminate all remaining unproved steps.
Session logs from the human-LLM exchanges provide data on effective interaction patterns for future efforts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same workflow could be applied to other standard textbooks in algebra or analysis to build larger verified libraries.
Reduced human oversight might become possible if the model learns to request clarification only on ambiguous points.
A complete formal version of a core text could serve as a stable base for checking new research results in topology.
The speed achieved suggests that formalization effort for undergraduate-level mathematics may drop by an order of magnitude.

Load-bearing premise

That the LLM-generated statements and proofs, after human review, faithfully capture the intended meaning and logical structure of the original theorems in Munkres' textbook.

What would settle it

A concrete mismatch between any formal statement or proof in the Isabelle library and the corresponding theorem as written in Munkres' book, or a failure of one of the 806 proved results to hold when checked against the informal source.

Figures

Figures reproduced from arXiv: 2604.07455 by Cezary Kaliszyk, Dustin Bryant, Jonathan Juli\'an Huerta y Munive, Josef Urban.

read the original abstract

We describe an experiment in LLM-assisted autoformalization that produced over 85,000 lines of Isabelle/HOL code covering all 39 sections of Munkres' Topology (general topology, Chapters 2--8), from topological spaces through dimension theory. The LLM-based coding agents (initially ChatGPT 5.2 and then Claude Opus 4.6) used 24 active days for that. The formalization is complete: all 806 formal results are fully proved with zero sorry's. Proved results include the Tychonoff theorem, the Baire category theorem, the Nagata--Smirnov and Smirnov metrization theorems, the Stone--\v{C}ech compactification, Ascoli's theorem, the space-filling curve, and others. The methodology is based on a "sorry-first" declarative proof workflow combined with bulk use of sledgehammer - two of Isabelle major strengths. This leads to relatively fast autoformalization progress. We analyze the resulting formalization in detail, analyze the human--LLM interaction patterns from the session log, and briefly compare with related autoformalization efforts in Megalodon, HOL Light, and Naproche. The results indicate that LLM-assisted formalization of standard mathematical textbooks in Isabelle/HOL is quite feasible, cheap and fast, even if some human supervision is useful.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

They produced a complete LLM-assisted formalization of Munkres' topology textbook in Isabelle/HOL, with all 806 results machine-proved and zero sorrys.

read the letter

The main point is that an LLM pipeline turned the full Munkres general topology text into 85,000 lines of verified Isabelle code covering every section from basic spaces through dimension theory. They proved every one of the 806 results, including Tychonoff, Baire category, the metrization theorems, Stone-Čech, and Ascoli, all without leaving any sorrys. The sorry-first workflow plus heavy sledgehammer use kept the process moving in roughly 24 active days, and they switched from ChatGPT to Claude when needed. The session-log analysis and comparison to Megalodon, HOL Light, and Naproche add practical detail on how the human-LLM loop actually worked.

Referee Report

1 major / 2 minor

Summary. The paper reports an LLM-assisted autoformalization of all 39 sections of Munkres' Topology (Chapters 2-8) in Isabelle/HOL. Using ChatGPT 5.2 and Claude Opus 4.6 over 24 active days, the authors produced >85,000 lines of code containing 806 fully machine-checked results (zero sorry statements) that include the Tychonoff theorem, Baire category theorem, Nagata-Smirnov and Smirnov metrization theorems, Stone-Čech compactification, Ascoli's theorem, and the space-filling curve. The workflow relies on a sorry-first declarative style combined with heavy use of sledgehammer; the paper also analyzes human-LLM interaction patterns and briefly compares the effort to prior autoformalization projects.

Significance. If the formal statements faithfully reproduce Munkres' definitions and theorems, the work supplies the first complete machine-checked library for a standard general-topology textbook. The zero-sorry, fully proved status of all 806 results and the modest human time investment demonstrate that LLM-assisted formalization of textbook mathematics in Isabelle/HOL is now practical. Such a library could serve as a reproducible benchmark for future autoformalization systems and as a verified foundation for further developments in topology.

major comments (1)

[Abstract, §1] Abstract and §1 (methodology): the central claim that the 806 results constitute a formalization of Munkres' textbook rests on the unquantified assertion of 'human supervision.' No protocol, sample side-by-side comparisons, or count of statements edited for fidelity versus provability is supplied. Because small differences in the choice of basis, filter, or compactification construction can alter the scope of theorems such as Nagata-Smirnov metrization or Stone-Čech compactification while still admitting a formal proof, the absence of documented fidelity checks is load-bearing for the completeness claim.

minor comments (2)

[final section] The comparison with Megalodon, HOL Light, and Naproche in the final section is only one paragraph long; expanding it with concrete metrics (lines of code, number of theorems, human time) would strengthen the positioning of the contribution.
[§3] The paper states that the formalization covers 'all 39 sections' but does not list which specific theorems from each section were included or omitted; an appendix table would improve transparency.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thoughtful review and for identifying an important area where the manuscript can be strengthened. We address the major comment below and will revise the paper accordingly to improve transparency.

read point-by-point responses

Referee: [Abstract, §1] Abstract and §1 (methodology): the central claim that the 806 results constitute a formalization of Munkres' textbook rests on the unquantified assertion of 'human supervision.' No protocol, sample side-by-side comparisons, or count of statements edited for fidelity versus provability is supplied. Because small differences in the choice of basis, filter, or compactification construction can alter the scope of theorems such as Nagata-Smirnov metrization or Stone-Čech compactification while still admitting a formal proof, the absence of documented fidelity checks is load-bearing for the completeness claim.

Authors: We agree that a more detailed account of the human supervision process is necessary to substantiate the fidelity claim. The current manuscript notes that 'some human supervision is useful' but does not describe the protocol, provide examples, or quantify edits. In the revised version we will expand §1 with a new subsection on the supervision workflow. This will include: (i) the iterative prompting strategy used to align LLM-generated statements with Munkres' definitions and theorems; (ii) two or three concrete side-by-side comparisons (e.g., the basis definition in §13 and the statement of Tychonoff's theorem in §37); and (iii) an approximate breakdown of the 806 results according to whether the primary human intervention concerned fidelity to the textbook or merely provability. We will also note that the formalizations of the Nagata-Smirnov metrization theorem and Stone-Čech compactification follow the constructions given in Munkres (using the specific basis and filter choices stated in the text) rather than alternative formulations that might change the theorem's scope. These additions will make the completeness claim verifiable without requiring readers to inspect the full Isabelle sources. revision: yes

Circularity Check

0 steps flagged

No circularity: external textbook translated into independent formal system

full rationale

The paper describes an LLM-assisted translation of the external Munkres Topology textbook (Chapters 2-8) into Isabelle/HOL, yielding 806 fully proved results with zero sorry's. The central claim is the existence and completeness of this formalization, verified inside Isabelle rather than derived from any of the paper's own statements, fitted parameters, or self-referential definitions. The workflow (sorry-first + sledgehammer) and human supervision are methodological details that support the translation process but do not create a loop where a result is presupposed by its own inputs. Comparisons to prior autoformalization projects are external references and do not bear the load of the main claim. The fidelity of generated statements to the textbook is an external correctness question, not a circularity issue within the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The effort rests on the existing Isabelle/HOL logic and libraries plus human oversight to ensure fidelity to the source text; no new mathematical axioms or entities are introduced.

axioms (1)

standard math Soundness of Isabelle/HOL and its standard libraries
All proofs are checked inside this system.

pith-pipeline@v0.9.0 · 5559 in / 1070 out tokens · 50945 ms · 2026-05-10T18:24:36.594728+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

25 extracted references · 1 canonical work pages

[1]

P. W. Abrahams: Machine verification of mathematical proof. Sc.D. thesis, Mas- sachusetts Institute of Technology (1966)

1966
[2]

Bancerek et al

G. Bancerek et al. The role of the Mizar Mathematical Library for interactive proof development in Mizar. In: Journal of Automated Reasoning. Springer (2017)

2017
[3]

Blanchette, C

J.C. Blanchette, C. Kaliszyk, L.C. Paulson, J. Urban: Hammering towards QED. J. Formaliz. Reason.9(1), 101–148 (2016)

2016
[4]

D. G. Bobrow: Natural language input for a computer problem solving system. PhD thesis, Massachusetts Institute of Technology (1964)

1964
[5]

Paulson, W.Li: Simple Type Theory is not too Simple: Grothendieck’s Schemes Without Dependent Types In:Exp

A.Bordg, L.C. Paulson, W.Li: Simple Type Theory is not too Simple: Grothendieck’s Schemes Without Dependent Types In:Exp. Math.31(2), 364–382 (2022)

2022
[6]

Brown, C

C.E. Brown, C. Kaliszyk, M. Suda, J. Urban: Hammering higher order set theory. In: CICM 2025, LNCS 16136, pp. 3–20. Springer (2025)

2025
[7]

Ganesalingam: The language of mathematics: A linguistic and philosophical investigation

M. Ganesalingam: The language of mathematics: A linguistic and philosophical investigation. LNCS 7805. Springer, Berlin Heidelberg (2013)

2013
[8]

A. Hahn, A. De Lon: Understandable Autoformalization with Felix. Submitted to ITP 2026

2026
[9]

Hölzl, F

J. Hölzl, F. Immler, B. Huffman: Type Classes and Filters for Mathematical Anal- ysis in Isabelle/HOL. In: ITP 2013, LNCS, pp. 279–294. Springer (2013)

2013
[10]

A. Q. Jiang, S. Welleck, J. P. Zhou, T. Lacroix, J. Liu, W. Li, M. Jamnik, G. Lam- ple, Y. Wu: Draft, Sketch, and Prove: Guiding Formal Theorem Provers with In- formal Proofs. In: ICLR 2023, Kigali, Rwanda, May 1–5. OpenReview.net (2023)

2023
[11]

Kaliszyk, J

C. Kaliszyk, J. Urban, J. Vyskocil, H. Geuvers: Developing corpus-based transla- tion methods between informal and formal mathematics. In: CICM 2014, LNCS 8543, pp. 435–439. Springer (2014)

2014
[12]

Kaliszyk, J

C. Kaliszyk, J. Urban, J. Vyskocil: Learning to parse on aligned corpora (rough diamond). In: ITP 2015, LNCS 9236, pp. 227–233. Springer (2015)

2015
[13]

Paulson: Isabelle/Copilot: An AI-driven assistant for Is- abelle/HOL (2026).https://lists.cam.ac.uk/sympa/arc/cl-isabelle-users/ 2026-02/msg00039.html

D.Mulligan, L.C. Paulson: Isabelle/Copilot: An AI-driven assistant for Is- abelle/HOL (2026).https://lists.cam.ac.uk/sympa/arc/cl-isabelle-users/ 2026-02/msg00039.html

2026
[14]

Becker, D.Mulligan, et

H. Becker, D.Mulligan, et. al: Isabelle/Assistant (2026).https://github.com/ awslabs/AutoCorrode/tree/main/isabelle-assistant

2026
[15]

Munkres:Topology, 2nd edn

J.R. Munkres:Topology, 2nd edn. Prentice Hall (2000)

2000
[16]

LNCS 2283

T.Nipkow,L.C.Paulson,M.Wenzel:Isabelle/HOL—A Proof Assistant for Higher- Order Logic. LNCS 2283. Springer (2002)

2002
[17]

L.C.Paulson:Isabelle:TheNextSevenHundredTheoremProvers.In:CADE1988, LNCS, pp. 772–773. Springer (1988)

1988
[18]

Paulson: Porting the HOL Light analysis library: some lessons

L. Paulson: Porting the HOL Light analysis library: some lessons. In: CPP 2017, pp. 1. ACM (2017)

2017
[19]

Simon: Checking Number Theory Proofs in Natural Language

D. Simon: Checking Number Theory Proofs in Natural Language. Ph. D. Disser- tation. University of Texas at Austin. (1990) 12 https://ai4reason.eu/sponsors.html 22 D. Bryant, J.J. Huerta y Munive, C. Kaliszyk, J. Urban

1990
[20]

Sutcliffe: The TPTP Problem Library and Associated Infrastructure - From CNF to TH0

G. Sutcliffe: The TPTP Problem Library and Associated Infrastructure - From CNF to TH0. In: J. Autom. Reason., pp. 483–502. Springer (2017)

2017
[21]

Floris van Doorn, Patrick Massot, and Oliver Nash

J. Urban: 130k Lines of Formal Topology in Two Weeks: Simple and Cheap Auto- formalization for Everyone? arXiv:2601.03298 (2026)

work page arXiv 2026
[22]

Q. Wang, C. Kaliszyk, J. Urban: First experiments with neural translation of in- formal to formal mathematics. In: CICM 2018, LNCS 11006, pp. 255–270. Springer (2018)

2018
[23]

D.Williams:Probability with Martingales.Cambridge:CambridgeUniversityPress. (1991)

1991
[24]

Y. Wu, A. Q. Jiang, W. Li, M. N. Rabe, C. Staats, M. Jamnik, C. Szegedy: Aut- oformalization with Large Language Models. In: Advances in Neural Information Processing Systems 35 (NeurIPS 2022), New Orleans, LA, USA, November 28– December 9. (2022)

2022
[25]

Zinn: Understanding informal mathematical discourse

C. Zinn: Understanding informal mathematical discourse. PhD thesis, Institut für Informatik, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Ger- many (2004)

2004