Recognition: unknown
Just Type It in Isabelle! AI Agents Drafting, Mechanizing, and Generalizing from Human Hints
Pith reviewed 2026-05-10 08:23 UTC · model grok-4.3
The pith
A metatheoretical specification and Isabelle/HOL formalization of minimal type annotations for rank-one polymorphic terms is given, shown via human-AI collaborative workflows for drafting and mechanizing proofs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We give a metatheoretical account of the problem, with a full formal specification and proofs, and formalize it in Isabelle/HOL. Our development is a series of experiments featuring human-driven and AI-driven formalization workflows: a human and an LLM-powered AI agent independently produce pen-and-paper proofs, and the AI agent autoformalizes both in Isabelle, with further human-hinted AI interventions refining and generalizing the development.
Load-bearing premise
That an LLM-powered AI agent can independently generate correct pen-and-paper proofs for the type-annotation problem and then successfully autoformalize and generalize them in Isabelle/HOL when given only limited human hints.
Figures
read the original abstract
Type annotations are essential when printing terms in a way that preserves their meaning under reparsing and type inference. We study the problem of complete and minimal type annotations for rank-one polymorphic $\lambda$-calculus terms, as used in Isabelle. Building on prior work by Smolka, Blanchette et al., we give a metatheoretical account of the problem, with a full formal specification and proofs, and formalize it in Isabelle/HOL. Our development is a series of experiments featuring human-driven and AI-driven formalization workflows: a human and an LLM-powered AI agent independently produce pen-and-paper proofs, and the AI agent autoformalizes both in Isabelle, with further human-hinted AI interventions refining and generalizing the development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to provide a metatheoretical account of complete and minimal type annotations for rank-1 polymorphic λ-calculus terms (as used in Isabelle), including a full formal specification and proofs, all mechanized in Isabelle/HOL. It reports a series of experiments comparing human-driven and LLM-powered AI-agent workflows: independent production of pen-and-paper proofs by a human and by an AI agent, followed by AI autoformalization of both, with further human-hinted AI interventions for refinement and generalization.
Significance. If the formalization holds, the work supplies a machine-checked foundation for an important practical problem in proof-assistant term printing and type inference, extending prior results by Smolka, Blanchette et al. The explicit machine-checked proofs constitute a verifiable artifact that strengthens the metatheoretic claims. The documented hybrid human-AI formalization workflow is a timely contribution that may inform future interactive theorem-proving practice.
minor comments (3)
- The abstract states that 'a full formal specification and proofs' are given, yet the manuscript would benefit from an explicit summary (e.g., in §3 or §4) listing the principal theorems, their Isabelle names, and the scope of the mechanized development so that readers can assess completeness without inspecting the source files.
- The description of the AI agent's 'independent' pen-and-paper proof generation should be clarified with respect to the subsequent human-hinted interventions; a short table or enumerated list contrasting the human-only, AI-only, and hybrid workflows (including success metrics or number of interventions required) would improve readability.
- All citations to prior work (Smolka, Blanchette et al.) should include precise bibliographic details and, where relevant, pointers to the specific theorems being extended.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our manuscript on the metatheoretical account of complete and minimal type annotations for rank-1 polymorphic terms, the Isabelle/HOL formalization, and the hybrid human-AI workflows. We are pleased with the recommendation for minor revision.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Standard properties of rank-one polymorphic lambda-calculus and type inference hold as described in prior literature.
Reference graph
Works this paper leans on
-
[1]
Archive of Formal Proofs (AFP),https://www.isa-afp.org, proof library collec- tion for Isabelle, accessed Apr 03, 2026
2026
-
[2]
Abdulaziz, M., Ammer, T., Meenakshisundaram, S., Rimpapa, A.: A Formal Anal- ysis of Algorithms for Matroids and Greedoids. In: The 16th International Confer- ence on Interactive Theorem Proving (ITP) (2025).https://doi.org/10.48550 /arXiv.2505.19816
-
[3]
GitHub repository; version 1.2.24
Anomaly: Opencode (2026),https://github.com/anomalyco/opencode, open- source AI coding environment. GitHub repository; version 1.2.24
2026
-
[4]
Anthropic: Introducing claude opus 4.6 (Feb 2026),https://www.anthropic.co m/news/claude-opus-4-6, accessed Apr 03, 2026
2026
-
[5]
GitHub repository; commit 72160e2
AWS Labs: Isabelle/Q (2026),https://github.com/awslabs/AutoCorrode/tree/ 72160e202330fc050732f40648ec16b7628bee15/iq, MCP server for Isabelle/jEdit. GitHub repository; commit 72160e2
2026
-
[6]
In: Berardi, S., Coppo, M., Damiani, F
Ballarin, C.: Locales and locale expressions in isabelle/isar. In: Berardi, S., Coppo, M., Damiani, F. (eds.) Types for Proofs and Programs. pp. 34–50. Springer Berlin Heidelberg, Berlin, Heidelberg (2004)
2004
- [7]
-
[8]
Journal of Automated Rea- soning56(2), 155–200 (2016).https://doi.org/10.1007/S10817-015-9335-3
Blanchette, J.C., Böhme, S., Fleury, M., Smolka, S.J., Steckermeier, A.: Semi- intelligible Isar proofs from machine-generated proofs. Journal of Automated Rea- soning56(2), 155–200 (2016).https://doi.org/10.1007/S10817-015-9335-3
-
[9]
Brecknell, M., Greenaway, D., Hölzl, J., Immler, F., Klein, G., Kolanski, R., Lim, J., Norrish, M., Schirmer, N., Sickert, S., Sewell, T., Tuch, H., Wimmer, S.: Auto- corres2. Arch. Formal Proofs2024(2024),https://www.isa-afp.org/entries/A utoCorres2.html
2024
-
[10]
In: The Fourteenth International Conference on Learning Representations (2026),https://openreview.net/forum?id=KfxRzCmRSX
Chen, G., Jing, W., Chen, X., Zhao, X., Song, R., Li, C., Fan, K., Liu, D., Liao, M.: Reform: Reflective autoformalization with prospective bounded sequence optimiza- tion. In: The Fourteenth International Conference on Learning Representations (2026),https://openreview.net/forum?id=KfxRzCmRSX
2026
-
[11]
Chen, L.T., Ko, H.S.: A Formal Treatment of Bidirectional Typing, pp. 115–142. Springer Nature Switzerland (2024).https://doi.org/10.1007/978-3-031-572 62-3_5
-
[12]
Damas, L.: Type Assignment in Programming Languages. Ph.D. thesis, University of Edinburgh (1985), technical Report CST-33-85
1985
-
[13]
In: Pro- ceedings of the 9th ACM SIGPLAN-SIGACT Symposium on Principles of Pro- gramming Languages (POPL)
Damas, L., Milner, R.: Principal type-schemes for functional programs. In: Pro- ceedings of the 9th ACM SIGPLAN-SIGACT Symposium on Principles of Pro- gramming Languages (POPL). pp. 207–212. ACM (1982)
1982
-
[14]
In: Proceedings of the 2013 ACM SIGPLAN workshop on Dependently-typed programming
Danielsson, N.A.: Correct-by-construction pretty-printing. In: Proceedings of the 2013 ACM SIGPLAN workshop on Dependently-typed programming. pp. 1–12. ICFP’13, ACM (Sep 2013).https://doi.org/10.1145/2502409.2502410 22 Kappelmann et al
-
[15]
Nature600(7887), 70–74 (2021) https://doi.org/10.1038/s41586-021-04086-x
Davies, A., Veličković, P., Buesing, L., Blackwell, S., Zheng, D., Tomašev, N., Tanburn, R., Battaglia, P., Blundell, C., Juhász, A., Lackenby, M., Williamson, G., Hassabis, D., Kohli, P.: Advancing mathematics by guiding human intuition with AI. Nature (2021).https://doi.org/10.1038/s41586-021-04086-x
-
[16]
In: The Fourteenth International Conference on Learning Representations (2026), https://openreview.net/forum?id=a2XmC7rHIU
Dekoninck, J., Petrov, I., Minchev, K., Marinov, M., Drencheva, M., Konova, L., Shumanov, M.M., Tsvetkov, K., Drenchev, N., Todorov, L.D., Nikolova, K., Georgiev, N., Kalinkova, V., Ismoldayev, M., Balunovic, M., Vechev, M.: The open proof corpus: A large-scale study of LLM-generated mathematical proofs. In: The Fourteenth International Conference on Lear...
2026
-
[17]
In: Andronick, J., de Moura, L
Desharnais, M., Vukmirovic, P., Blanchette, J., Wenzel, M.: Seventeen provers under the hammer. In: Andronick, J., de Moura, L. (eds.) ITP 2022. pp. 8:1– 8:18. LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022).https: //doi.org/10.4230/LIPICS.ITP.2022.8
-
[18]
ACM Computing Surveys 54(5), 1–38 (May 2021).https://doi.org/10.1145/3450952
Dunfield, J., Krishnaswami, N.: Bidirectional typing. ACM Computing Surveys 54(5), 1–38 (May 2021).https://doi.org/10.1145/3450952
-
[19]
Dunfield,J.,Krishnaswami,N.R.:Completeandeasybidirectionaltypecheckingfor higher-rank polymorphism. In: Morrisett, G., Uustalu, T. (eds.) ACM SIGPLAN International Conference on Functional Programming, ICFP’13, Boston, MA, USA - September 25 - 27, 2013. pp. 429–442. ACM (2013).https://doi.org/10.1145/ 2500365.2500582,https://doi.org/10.1145/2500365.2500582
-
[20]
accessed Mar 30, 2026 (2019),https://isab elle.in.tum.de/dist-Isabelle2025-2/library/HOL/HOL-ex/Sketch_and_Exp lore.html
Haftmann, F.: Sketch and Explore. accessed Mar 30, 2026 (2019),https://isab elle.in.tum.de/dist-Isabelle2025-2/library/HOL/HOL-ex/Sketch_and_Exp lore.html
2026
-
[21]
Transactions of the American Mathematical Society146, 29–60 (1969).https: //doi.org/10.2307/1995158
Hindley, J.R.: The principal type-scheme of an object in combinatory logic. Transactions of the American Mathematical Society146, 29–60 (1969).https: //doi.org/10.2307/1995158
-
[22]
In: Proceedings of the 30th SymposiumonImplementationandApplicationofFunctionalLanguages.p.37–48
Jenkins, C., Stump, A.: Spine-local type inference. In: Proceedings of the 30th SymposiumonImplementationandApplicationofFunctionalLanguages.p.37–48. IFL ’18, Association for Computing Machinery, New York, NY, USA (2018).http s://doi.org/10.1145/3310232.3310233
-
[23]
In: The Eleventh International Conference on Learning Represen- tations (2023),https://openreview.net/forum?id=SMa9EAovKMC
Jiang, A.Q., Welleck, S., Zhou, J.P., Lacroix, T., Liu, J., Li, W., Jamnik, M., Lam- ple, G., Wu, Y.: Draft, sketch, and prove: Guiding formal theorem provers with informal proofs. In: The Eleventh International Conference on Learning Represen- tations (2023),https://openreview.net/forum?id=SMa9EAovKMC
2023
-
[24]
AI for Mathematics: Progress, Challenges, and Prospects
Ju, H., Dong, B.: Ai for mathematics: Progress, challenges, and prospects (2026). https://doi.org/10.48550/ARXIV.2601.13209
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2601.13209 2026
-
[25]
(Apr 2026).https: //doi.org/10.5281/zenodo.19406624
Kappelmann, K., Schäffeler, M., Stevens, L., Abdulaziz, M., Popescu, A., Traytel, D.: Supplementary material associated to this submission. (Apr 2026).https: //doi.org/10.5281/zenodo.19406624
-
[26]
Knuth, D.: Claude’s cycles (2026),https://www-cs-faculty.stanford.edu/~k nuth/papers/claude-cycles.pdf
2026
-
[27]
Korte, B., Vygen, J.: Combinatorial Optimization. Springer (2012).https://doi. org/10.1007/978-3-642-24488-9
-
[28]
In: Proceedings of the Eighth ACM SIGPLAN International Conference on Functional Programming
Le Botlan, D., Rémy, D.: Mlf: raising ml to the power of system f. In: Proceedings of the Eighth ACM SIGPLAN International Conference on Functional Programming. p. 27–38. ICFP ’03, Association for Computing Machinery, New York, NY, USA (2003).https://doi.org/10.1145/944705.944709
-
[29]
In: The Fourteenth International Conference on Learning Representations (2026),https://openreview.net/forum ?id=j4C0nALrgK
Lin, Y., Tang, S., Lyu, B., Yang, Z., Chung, J.H., Zhao, H., Jiang, L., Geng, Y., Ge, J., Sun, J., Wu, J., Gesi, J., Lu, X., Acuna, D., Yang, K., Lin, H., Choi, Just Type It in Isabelle! 23 Y., Chen, D., Arora, S., Jin, C.: Goedel-prover-v2: Scaling formal theorem proving with scaffolded data synthesis and self-correction. In: The Fourteenth International...
2026
-
[30]
Megdiche, Y., Huch, F., Stevens, L.: A linter for isabelle: Implementation and evaluation (2022).https://doi.org/10.48550/ARXIV.2207.10424
-
[31]
Milehins, M.: An extension of the framework types-to-sets for Isabelle/HOL. In: Popescu, A., Zdancewic, S. (eds.) CPP 2022. pp. 180–196. ACM (2022).https: //doi.org/10.1145/3497775.3503674
-
[32]
Milner, R.: A theory of type polymorphism in programming. Journal of Computer and System Sciences17(3), 348–375 (1978).https://doi.org/10.1016/0022-0 000(78)90014-4
-
[33]
Paulson, L.C.: ML for the working programmer (2. ed.). Cambridge University Press (1996)
1996
-
[34]
In: IWIL@LPAR
Paulson, L.C., Blanchette, J.C.: Three years of experience with sledgeham- mer, a practical link between automatic and interactive theorem provers. In: IWIL@LPAR. pp. 1–11. EPiC Series in Computing, EasyChair (2010)
2010
-
[35]
Pierce, B.C., Turner, D.N.: Local type inference. ACM Transactions on Program- ming Languages and Systems22(1), 1–44 (Jan 2000).https://doi.org/10.114 5/345099.345100
-
[36]
In: Proceedings of the third ACM Haskell symposium on Haskell
Rendel, T., Ostermann, K.: Invertible syntax descriptions: unifying parsing and pretty printing. In: Proceedings of the third ACM Haskell symposium on Haskell. pp. 1–12. ICFP ’10, ACM (Sep 2010).https://doi.org/10.1145/1863523.1863 525
-
[37]
Sergey, I.: Verifying Move borrow checker in Lean: an experiment in AI-assisted PL metatheory (Mar 2026),https://proofsandintuitions.net/2026/03/18/m ove-borrow-checker-lean/, archive snapshot:https://web.archive.org/web/ 20260330164922/https://proofsandintuitions.net/2026/03/18/move-borro w-checker-lean/
2026
-
[38]
Bachelor’s thesis, Technical University of Munich (2013),https://smolka.st/pa pers/bsc-thesis.pdf
Smolka, S.J.: Synthesis of Robust, Semi-Intelligble Isar Proofs from ATP Proofs. Bachelor’s thesis, Technical University of Munich (2013),https://smolka.st/pa pers/bsc-thesis.pdf
2013
-
[39]
Smolka, S.J., Blanchette, J.C.: Robust, semi-intelligible Isabelle proofs from ATP proofs.In:Blanchette,J.C.,Urban,J.(eds.)PxTP2013.ThirdInternationalWork- shop on Proof Exchange for Theorem Proving. EPiC Series in Computing, vol. 14, pp. 117–132 (2013).https://doi.org/10.29007/zbdb
-
[40]
In: FormaliSE
Tan, C., Donaldson, A.F., y Munive, J.J.H., Wickerson, J.: The burden of proof: Automated tooling for rapid iteration on large mechanised proofs. In: FormaliSE
-
[41]
pp. 34–45. IEEE (2025).https://doi.org/10.1109/FORMALISE66629.202 5.00010
-
[42]
Nature (2024).https://doi.org/10.1038/s415 86-023-06747-5
Trinh, T.H., Wu, Y., Le, Q.V., He, H., Luong, T.: Solving olympiad geometry without human demonstrations. Nature (2024).https://doi.org/10.1038/s415 86-023-06747-5
- [43]
-
[44]
GitHub repository; commit a9fe8b5 24 Kappelmann et al
Urban, J.: Rules for working on the Isabelle/HOL formalization (2026),https: //github.com/JUrban/isa_top_autoform1/blob/a9fe8b592cf2918e236edb0 2dce05d6fb5bd7a37/CLAUDE.md, CLAUDE.md file. GitHub repository; commit a9fe8b5 24 Kappelmann et al
2026
-
[45]
In: The Twelfth International Conference on Learning Representations (2024),https://openreview.net/forum?id=3f5PALef5B
Wang, H., Xin, H., Zheng, C., Liu, Z., Cao, Q., Huang, Y., Xiong, J., Shi, H., Xie, E., Yin, J., Li, Z., Liang, X.: LEGO-prover: Neural theorem proving with growing libraries. In: The Twelfth International Conference on Learning Representations (2024),https://openreview.net/forum?id=3f5PALef5B
2024
-
[46]
Wells, J.: Typability and type checking in system f are equivalent and undecidable. Annals of Pure and Applied Logic98(1), 111–156 (1999).https://doi.org/http s://doi.org/10.1016/S0168-0072(98)00047-5
-
[47]
In: TPHOLs
Wenzel, M.: Isar - A generic interpretative approach to readable formal proof doc- uments. In: TPHOLs. pp. 167–184. Lecture Notes in Computer Science, Springer (1999)
1999
-
[48]
Wu,Y.,Huang,D.,Wan,R.,Peng,Y.,Shang,S.,Cao,C.,Qi,L.,Zhang,R.,Zhang, X., Du, Z., Yan, J., Hu, X.: Stepfun-formalizer: Unlocking the autoformalization potential of llms through knowledge-reasoning fusion. Proceedings of the AAAI Conference on Artificial Intelligence40(40), 33980–33988 (Mar 2026).https://do i.org/10.1609/aaai.v40i40.40691
-
[49]
In: The 4th Workshop on Mathematical Reasoning and AI at NeurIPS’24 (2024), https://openreview.net/forum?id=TPtXLihkny
Xin, H., Guo, D., Shao, Z., Ren, Z., Zhu, Q., Liu, B., Ruan, C., Li, W., Liang, X.: Advancing theorem proving in LLMs through large-scale synthetic data. In: The 4th Workshop on Mathematical Reasoning and AI at NeurIPS’24 (2024), https://openreview.net/forum?id=TPtXLihkny
2024
-
[50]
Xu, Y., Odersky, M.: Agentic proof automation: A case study (2026).https: //doi.org/10.48550/ARXIV.2601.03768
-
[51]
Proceedings of the ACM on Programming Languages10(POPL) (Jan 2026).http s://doi.org/10.1145/3776653
Xue, X., Cui, C., Jiang, S., Oliveira, B.C.d.S.: Local contextual type inference. Proceedings of the ACM on Programming Languages10(POPL) (Jan 2026).http s://doi.org/10.1145/3776653
-
[52]
Xue, X., Oliveira, B.C.d.S.: Contextual typing. Proceedings of the ACM on Pro- gramming Languages8(ICFP) (Aug 2024).https://doi.org/10.1145/3674655 Just Type It in Isabelle! 25 APPENDIX This appendix contains further details on the concepts and statements from the main paper. In case of acceptance, the appendix will be replaced by a tech- nical report cit...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.