Hypothesis-Disciplined Multi-Agent Automated Formalization of Asymptotic Statistical Theory

Ethan X. Fang; Junwei Lu; Tingzhou Wei; Zeyu Zheng

arxiv: 2606.20642 · v1 · pith:MO5UBLYOnew · submitted 2026-06-03 · 💻 cs.AI · cs.LG· cs.LO· math.ST· stat.TH

Hypothesis-Disciplined Multi-Agent Automated Formalization of Asymptotic Statistical Theory

Tingzhou Wei , Zeyu Zheng , Ethan X. Fang , Junwei Lu This is my paper

Pith reviewed 2026-06-28 05:49 UTC · model grok-4.3

classification 💻 cs.AI cs.LGcs.LOmath.STstat.TH

keywords asymptotic statistical theoryLean formalizationmulti-agent workflowparametric modelssemi-parametric modelshypothesis auditformal verificationsource faithfulness

0 comments

The pith

A multi-agent pipeline with hypothesis audits produces an axiom-clean formalization of asymptotic statistical theory in Lean.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a workflow that coordinates seven specialized agents to translate asymptotic statistical results into Lean 4. The central discipline requires the Auditor agent to verify that every hypothesis and concept either matches the original mathematical text exactly, is explicitly justified as a Lean encoding choice, or is rejected. This produces a library containing checked proofs of asymptotic distribution and efficiency results for parametric and semi-parametric models. The organization keeps general infrastructure separate from theorem-specific parts. A reader would care because the method addresses the gap between informal convergence statements and formal verification while preserving source faithfulness.

Core claim

A hypothesis-disciplined Lean 4 formalization pipeline built from multiple agents produces an axiom-clean and source-faithful development containing Lean-checked and human-audited proofs of core parametric and semi-parametric theorems on asymptotic distributions and efficiency, with theorem-agnostic infrastructure and statistical concept definitions separated from theorem-specific assembly.

What carries the argument

The hypothesis-disciplined audit implemented by the Auditor agent, which requires every main-theorem hypothesis and concept-layer field to be anchored in the source mathematical prose, justified as a Lean encoding adapter, marked as source-implied, or rejected as an unsupported strengthening.

If this is right

Core results on asymptotic normality and efficiency bounds for parametric and semi-parametric models become machine-verified.
The formalization remains free of extra axioms beyond those in Lean.
Infrastructure for statistical concepts can be reused across multiple theorems without duplication.
Human audit combined with agent roles produces proofs that are both checked and traceable to source text.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same audit structure could be tested on other areas that mix analysis and regularity conditions, such as stochastic calculus.
The separation of general infrastructure from specific theorems suggests a template for scaling formal libraries in statistics.
If the workflow succeeds here, it indicates a route for formalizing results that currently sit outside existing Lean libraries due to their mix of limits and functional-analytic assumptions.

Load-bearing premise

The audit correctly forces every hypothesis to trace back to the source prose or be rejected.

What would settle it

Discovery of a theorem in the released Lean code whose stated hypotheses include a condition absent from the cited source paper and not marked as an explicit encoding adapter.

Figures

Figures reproduced from arXiv: 2606.20642 by Ethan X. Fang, Junwei Lu, Tingzhou Wei, Zeyu Zheng.

**Figure 2.** Figure 2: Triage of a blocked Executor return: a missing or suspect assumption routes to the Auditor’s drift control, while a doubted sub-lemma is truth-verified by a Scout. per-row proof outline tying each formal sub-lemma to the corresponding source-proof step. The Scaffolder then commits a single build-clean skeleton in which every sub-lemma body is sorry, locking signatures for Executor body-filling. The Auditor… view at source ↗

**Figure 3.** Figure 3: The Hypothesis-Disciplined Auditor’s source-anchor classification: each audited maintheorem hypothesis, instance constraint, and definition field is tied to a book-reference row, classified by its source anchor, and either accepted (directly or by user consent) or removed as drift. # Check Kind Trigger 1 no stray sorry mechanical per build 2 axioms usage clean mechanical per build 3 each main-sig hypothe… view at source ↗

**Figure 4.** Figure 4: Dependency graph of the five cornerstone theorems with approximate per-cornerstone [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

read the original abstract

Asymptotic statistical theory is a challenging domain for AI-assisted formalization: its central results mix convergence statements, asymptotic expansions, functional analysis, and regularity conditions that have a large gap from existing infrastructure in Lean 4 formalization. To address these challenges, we propose a hypothesis-disciplined Lean 4 formalization pipeline built from multiple agents: a manager that coordinates seven specialist roles for proof planning, skeleton scaffolding, Mathlib reconnaissance, proof construction, integration, independent review, and audit. The main methodological discipline is the hypothesis-disciplined audit, implemented by the Auditor agent: every main-theorem hypothesis and concept-layer field must be anchored in the source mathematical prose, justified as a Lean encoding adapter, marked as source-implied, or rejected as an unsupported strengthening. Using this workflow, we build a systematic formalization of asymptotic statistical theory, especially the parametric and semi-parametric models' asymptotic distribution and efficiency results. The resulting Lean development is axiom-clean and source-faithful, with Lean-checked and human-audited proofs of core parametric and semi-parametric theorems organized so that theorem-agnostic infrastructure and statistical concept definitions are separated from theorem-specific assembly. The formalization results are available at https://github.com/junwei-lu/Lean-Asymptotic-Statistical-Theory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper ships a multi-agent Lean workflow with a hypothesis audit to formalize asymptotic stats results and backs it with a public repo.

read the letter

The core thing to know is that this work describes a seven-agent pipeline for Lean 4 formalization of asymptotic statistical theory, centered on an Auditor role that forces every hypothesis and concept back to the source prose or flags it as an unsupported addition. They apply it to parametric and semi-parametric distribution and efficiency results and release the development on GitHub.

What is new is the explicit combination of those agent roles with the hypothesis-disciplined audit step; prior formalization efforts in stats have not emphasized this anchoring mechanism in the same way. The paper does well by separating theorem-agnostic infrastructure and definitions from the theorem-specific parts, which makes the code easier to maintain and extend. Delivering actual Lean-checked proofs rather than just a method description is also concrete progress.

The soft spots are minor but worth noting. The abstract and stress-test point to the repo as evidence for axiom-clean and source-faithful claims, yet without a close read of the specific proofs one cannot yet judge how fully the regularity conditions or functional-analytic parts were handled. If the formalizations required any implicit strengthenings that the audit was meant to catch, that would need clearer documentation in the paper itself.

This is aimed at people working on formal methods for statistics or AI-assisted theorem proving. A reader who wants to see how multi-agent setups can bridge the gap between statistical prose and Lean code will find usable material here. The presence of the repo makes the contribution verifiable enough that it deserves a serious referee rather than a desk reject.

Referee Report

0 major / 1 minor

Summary. The manuscript introduces a hypothesis-disciplined multi-agent pipeline for automated formalization of asymptotic statistical theory in Lean 4. The pipeline consists of a manager coordinating seven specialist agents, with the Auditor agent enforcing that all main-theorem hypotheses and concept definitions are anchored in the source mathematical prose, justified as encoding adapters, marked as source-implied, or rejected. The authors apply this to produce an axiom-clean, source-faithful Lean development with Lean-checked and human-audited proofs of core results on asymptotic distributions and efficiency in parametric and semi-parametric models. The development separates theorem-agnostic infrastructure from theorem-specific content and is available at the provided GitHub repository.

Significance. If the claims hold, the work is significant as a methodological contribution to AI-assisted formalization of domains with intricate convergence statements, expansions, and regularity conditions that exceed current Lean Mathlib coverage. The hypothesis-disciplined audit provides a concrete mechanism for source-faithfulness, and the separation of infrastructure from theorem-specific assembly improves reusability. The open repository directly enables verification of the axiom-clean and source-faithful properties, strengthening the paper's value as both a case study and a reusable artifact.

minor comments (1)

[Abstract] The abstract and introduction would benefit from naming one or two specific theorems (e.g., the asymptotic normality result for MLE or the semi-parametric efficiency bound) that were formalized, to make the scope of the Lean development immediately clear.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review, detailed summary of the contribution, and recommendation to accept the manuscript.

Circularity Check

0 steps flagged

No significant circularity in methodological workflow

full rationale

The paper describes a multi-agent workflow and hypothesis-disciplined audit for producing a Lean 4 formalization of asymptotic statistical theory, with the resulting development hosted on an external GitHub repository that can be independently verified for axiom-cleanliness and source-faithfulness. No mathematical derivations, predictions, or first-principles results are claimed that reduce to fitted parameters, self-definitions, or self-citation chains by construction. The central claim rests on the process description plus the repository evidence rather than any internal equation or theorem that loops back to its own inputs. This is a standard non-circular methodological contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract states the formalization is axiom-clean and relies on existing Lean 4 and Mathlib infrastructure; no new free parameters, domain axioms, or invented entities are introduced.

pith-pipeline@v0.9.1-grok · 5774 in / 1160 out tokens · 32245 ms · 2026-06-28T05:49:46.573374+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

39 extracted references · 1 canonical work pages

[1]

Formal- izing concentration inequalities in Rocq: infrastructure and automation

Reynald Affeldt, Alessandro Bruni, Cyril Cohen, Pierre Roux, and Takafumi Saikawa. Formal- izing concentration inequalities in Rocq: infrastructure and automation. In 16th International Conference on Interactive Theorem Proving (ITP 2025) , volume 352, pages 21:1–21:20. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2025

2025
[2]

Infotheo: A Rocq formalization of information theory and linear error-correcting codes

Reynald Affeldt, Manabu Hagiwara, Jonas Sénizergues, Jacques Garrigue, Kazuhiko Sakaguchi, Taku Asai, Takafumi Saikawa, Naruomi Obata, and Alessandro Bruni. Infotheo: A Rocq formalization of information theory and linear error-correcting codes. https://github.com/ affeldt-aist/infotheo, 2026. Latest stable release 0.9.7

2026
[3]

A formally verified proof of the central limit theorem

Jeremy A vigad, Johannes Hölzl, and Luke Serafin. A formally verified proof of the central limit theorem. Journal of Automated Reasoning , 59(4):389–423, 2017

2017
[4]

Prover agent: An agent-based framework for formal mathematical proofs

Kaito Baba, Chaoran Liu, Shuhei Kurita, and Akiyoshi Sannai. Prover agent: An agent-based framework for formal mathematical proofs. arXiv preprint arXiv:2506.19923 , 2025

arXiv 2025
[5]

AX-Prover: A deep reasoning agentic framework for theorem proving in mathematics and quantum physics

Benjamin Breen, Marco Del Tredici, Jacob McCarran, Javier Aspuru Mijares, Weichen Win- ston Yin, Kfir Sulimany, Jacob M Taylor, Frank HL Koppens, and Dirk Englund. AX-Prover: A deep reasoning agentic framework for theorem proving in mathematics and quantum physics. arXiv preprint arXiv:2510.12787 , 2025

Pith/arXiv arXiv 2025
[6]

Archon: Toward fully autonomous formalization of FirstProof’s research-level problems

Guoxiong Gao, Bin Wu, Zeming Sun, Jiedong Jiang, Wanyi He, Zichen Wang, Yutong Wang, Peihao Wu, and Bin Dong. Archon: Toward fully autonomous formalization of FirstProof’s research-level problems. https://github.com/frenzymath/Archon; announcement at https: //frenzymath.com/news/archon-firstproof/, March 2026

2026
[7]

Automatic textbook formalization

Fabian Gloeckle, Ahmad Rammal, Charles Arnal, Remi Munos, Vivien Cabannes, Gabriel Synnaeve, and Amaury Hayat. Automatic textbook formalization. arXiv preprint arXiv:2604.03071, 2026

Pith/arXiv arXiv 2026
[8]

A milestone in formalization: The sphere packing problem in dimension 8

Sidharth Hariharan, Christopher Birkbeck, Seewoo Lee, Ho Kiu Gareth Ma, Bhavik Mehta, Auguste Poiroux, and Maryna Viazovska. A milestone in formalization: The sphere packing problem in dimension 8. arXiv preprint arXiv:2604.23468 , 2026

Pith/arXiv arXiv 2026
[9]

Formalization of continuous probability distributions

Osman Hasan and Sofiene Tahar. Formalization of continuous probability distributions. In International Conference on Automated Deduction , pages 3–18. 2007

2007
[10]

Formalization of the standard uniform random variable

Osman Hasan and Sofiene Tahar. Formalization of the standard uniform random variable. Theoretical Computer Science , 382(1):71–83, 2007

2007
[11]

A formalization of the Lévy-Prokhorov metric in Isabelle/HOL

Michikazu Hirata. A formalization of the Lévy-Prokhorov metric in Isabelle/HOL. In 15th International Conference on Interactive Theorem Proving (ITP 2024) , volume 309, pages 21:1– 21:18. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2024. 22

2024
[12]

I. A. Ibragimov and R. Z. Has’minskii. Statistical Estimation: Asymptotic Theory , volume 16 of Stochastic Modelling and Applied Probability . Springer, New York, 1981

1981
[13]

Semi-autonomous formalization of the Vlasov-Maxwell-Landau equilibrium

Vasily Ilin. Semi-autonomous formalization of the Vlasov-Maxwell-Landau equilibrium. arXiv preprint arXiv:2603.15929 , 2026

arXiv 2026
[14]

Do LLMs game formalization? evalu- ating faithfulness in logical reasoning

Kyuhee Kim, Auguste Poiroux, and Antoine Bosselut. Do LLMs game formalization? evalu- ating faithfulness in logical reasoning. arXiv preprint arXiv:2604.19459 , 2026

Pith/arXiv arXiv 2026
[15]

LeanAgent: Lifelong learning for formal theorem proving

Adarsh Kumarappan, Mo Tiwari, Peiyang Song, Robert Joseph George, Chaowei Xiao, and Anima Anandkumar. LeanAgent: Lifelong learning for formal theorem proving. In Interna- tional Conference on Learning Representations , volume 2025, pages 73525–73564, 2025

2025
[16]

Asymptotic Methods in Statistical Decision Theory

Lucien Le Cam. Asymptotic Methods in Statistical Decision Theory . Springer Series in Statis- tics. Springer, New York, 1986

1986
[17]

Asymptotics in statistics: some basic concepts

Lucien Le Cam and Grace Lo Yang. Asymptotics in statistics: some basic concepts . Springer Science & Business Media, 2000

2000
[18]

Autoformalize mathematical statements by symbolic equivalence and semantic consistency

Zenan Li, Yifan Wu, Zhaoyu Li, Xinming Wei, Fan Yang, Xian Zhang, and Xiaoxing Ma. Autoformalize mathematical statements by symbolic equivalence and semantic consistency. In Advances in Neural Information Processing Systems , volume 37, pages 53598–53625, 2024

2024
[19]

Goedel-Prover-V2: Scaling formal theorem proving with scaffolded data synthesis and self-correction

Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, et al. Goedel-Prover-V2: Scaling formal theorem proving with scaffolded data synthesis and self-correction. arXiv preprint arXiv:2508.03613 , 2025

Pith/arXiv arXiv 2025
[20]

Numina-Lean-Agent: An open and general agentic reasoning system for formal mathematics

Junqi Liu, Zihao Zhou, Zekai Zhu, Marco Dos Santos, Weikun He, Jiawei Liu, Ran Wang, Yunzhou Xie, Junqiao Zhao, Qiufeng Wang, et al. Numina-Lean-Agent: An open and general agentic reasoning system for formal mathematics. arXiv preprint arXiv:2601.14027 , 2026

arXiv 2026
[21]

FormalScience: Scalable human-in-the- loop autoformalisation of science with agentic code generation in Lean

Jordan Meadows, Lan Zhang, and Andre Freitas. FormalScience: Scalable human-in-the- loop autoformalisation of science with agentic code generation in Lean. arXiv preprint arXiv:2604.23002, 2026

Pith/arXiv arXiv 2026
[22]

The Lean 4 theorem prover and programming language

Leonardo de Moura and Sebastian Ullrich. The Lean 4 theorem prover and programming language. In International Conference on Automated Deduction , pages 625–635. Springer, 2021

2021
[23]

Paulson, and Markus Wenzel

Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL: a proof assistant for higher-order logic . Springer, 2002

2002
[24]

Apollo: Automated LLM and Lean collaboration for advanced formal reasoning

Azim Ospanov, Farzan Farnia, and Roozbeh Yousefzadeh. Apollo: Automated LLM and Lean collaboration for advanced formal reasoning. In Advances in Neural Information Processing Systems, volume 38, pages 41599–41633, 2025

2025
[25]

DeepSeek-Prover-V2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition

ZZ Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, et al. DeepSeek-Prover-V2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition. arXiv preprint arXiv:2504.21801, 2025

Pith/arXiv arXiv 2025
[26]

Lean copilot: Large language models as copilots for theorem proving in Lean

Peiyang Song, Kaiyu Yang, and Anima Anandkumar. Lean copilot: Large language models as copilots for theorem proving in Lean. arXiv preprint arXiv:2404.12534 , 2024. 23

arXiv 2024
[27]

Lean formal- ization of generalization error bound by Rademacher complexity and Dudley’s entropy integral

Sho Sonoda, Kazumi Kasaura, Yuma Mizuno, Kei Tsukamoto, and Naoto Onda. Lean formal- ization of generalization error bound by Rademacher complexity and Dudley’s entropy integral. arXiv preprint arXiv:2503.19605 , 2025

Pith/arXiv arXiv 2025
[28]

The Lean mathematical library

The mathlib Community. The Lean mathematical library. In Proceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs , CPP 2020, pages 367– 381, New York, NY, USA, 2020. Association for Computing Machinery

2020
[29]

The Rocq prover, version 9.2.0

The Rocq Development Team. The Rocq prover, version 9.2.0. Zenodo, 2026. https://doi. org/10.5281/zenodo.19256047

work page doi:10.5281/zenodo.19256047 2026
[30]

van der Vaart

Aad W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998

1998
[31]

van der Vaart and Jon A

Aad W. van der Vaart and Jon A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics . Springer Series in Statistics. Springer, New York, 1996

1996
[32]

Hilbert: Recursively building formal proofs with informal reasoning

Sumanth Varambally, Thomas Voice, Yanchao Sun, Zhifeng Chen, Rose Yu, and Ke Ye. Hilbert: Recursively building formal proofs with informal reasoning. arXiv preprint arXiv:2509.22819 , 2025

arXiv 2025
[33]

MA-LoT: Model-collaboration Lean-based long chain-of-thought reasoning enhances formal theorem proving

Ruida Wang, Rui Pan, Yuxin Li, Jipeng Zhang, Yizhen Jia, Shizhe Diao, Renjie Pi, Junjie Hu, and Tong Zhang. MA-LoT: Model-collaboration Lean-based long chain-of-thought reasoning enhances formal theorem proving. In Proceedings of the 42nd International Conference on Machine Learning (ICML) , volume 267 of Proceedings of Machine Learning Research , pages 6...

2025
[34]

LeanDojo: Theorem proving with retrieval-augmented language models

Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J Prenger, and Animashree Anandkumar. LeanDojo: Theorem proving with retrieval-augmented language models. In Advances in Neural Information Processing Systems , volume 36, pages 21573–21612, 2023

2023
[35]

MASA: LLM-driven multi-agent systems for autoformalization

Lan Zhang, Marco Valentino, and André Freitas. MASA: LLM-driven multi-agent systems for autoformalization. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , pages 615–624, 2025

2025
[36]

Statistical learning theory in Lean 4: Empirical processes from scratch

Yuanhe Zhang, Jason D Lee, and Fanghui Liu. Statistical learning theory in Lean 4: Empirical processes from scratch. arXiv preprint arXiv:2602.02285 , 2026. A Evaluation In this appendix, together with the two case studies presented in Appendix B, we answer three questions about the methodological contributions of Section 1.1: (i) is the closed artifact a...

Pith/arXiv arXiv 2026
[37]

In our case a git worktree on the same host, with the Lake package directory symlinked and the build cache copied for fast rebuilds

Agent dispatch into an isolated execution context. In our case a git worktree on the same host, with the Lake package directory symlinked and the build cache copied for fast rebuilds
[38]

Durable task cards and self-identified terminal-state returns for each dispatched agent (the recorded wave outcomes of Table 1)
[39]

bowl-shaped

Tool-mediated build verification through the platform’s shell primitive ( lake build invoked from inside the worktree by both Executors and the gating Reviewer). These primitives are common in current coding-agent environments, but this paper evaluates only one implementation: every experiment used Claude Code (Opus 4.7) for both the Manager session and a...

[1] [1]

Formal- izing concentration inequalities in Rocq: infrastructure and automation

Reynald Affeldt, Alessandro Bruni, Cyril Cohen, Pierre Roux, and Takafumi Saikawa. Formal- izing concentration inequalities in Rocq: infrastructure and automation. In 16th International Conference on Interactive Theorem Proving (ITP 2025) , volume 352, pages 21:1–21:20. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2025

2025

[2] [2]

Infotheo: A Rocq formalization of information theory and linear error-correcting codes

Reynald Affeldt, Manabu Hagiwara, Jonas Sénizergues, Jacques Garrigue, Kazuhiko Sakaguchi, Taku Asai, Takafumi Saikawa, Naruomi Obata, and Alessandro Bruni. Infotheo: A Rocq formalization of information theory and linear error-correcting codes. https://github.com/ affeldt-aist/infotheo, 2026. Latest stable release 0.9.7

2026

[3] [3]

A formally verified proof of the central limit theorem

Jeremy A vigad, Johannes Hölzl, and Luke Serafin. A formally verified proof of the central limit theorem. Journal of Automated Reasoning , 59(4):389–423, 2017

2017

[4] [4]

Prover agent: An agent-based framework for formal mathematical proofs

Kaito Baba, Chaoran Liu, Shuhei Kurita, and Akiyoshi Sannai. Prover agent: An agent-based framework for formal mathematical proofs. arXiv preprint arXiv:2506.19923 , 2025

arXiv 2025

[5] [5]

AX-Prover: A deep reasoning agentic framework for theorem proving in mathematics and quantum physics

Benjamin Breen, Marco Del Tredici, Jacob McCarran, Javier Aspuru Mijares, Weichen Win- ston Yin, Kfir Sulimany, Jacob M Taylor, Frank HL Koppens, and Dirk Englund. AX-Prover: A deep reasoning agentic framework for theorem proving in mathematics and quantum physics. arXiv preprint arXiv:2510.12787 , 2025

Pith/arXiv arXiv 2025

[6] [6]

Archon: Toward fully autonomous formalization of FirstProof’s research-level problems

Guoxiong Gao, Bin Wu, Zeming Sun, Jiedong Jiang, Wanyi He, Zichen Wang, Yutong Wang, Peihao Wu, and Bin Dong. Archon: Toward fully autonomous formalization of FirstProof’s research-level problems. https://github.com/frenzymath/Archon; announcement at https: //frenzymath.com/news/archon-firstproof/, March 2026

2026

[7] [7]

Automatic textbook formalization

Fabian Gloeckle, Ahmad Rammal, Charles Arnal, Remi Munos, Vivien Cabannes, Gabriel Synnaeve, and Amaury Hayat. Automatic textbook formalization. arXiv preprint arXiv:2604.03071, 2026

Pith/arXiv arXiv 2026

[8] [8]

A milestone in formalization: The sphere packing problem in dimension 8

Sidharth Hariharan, Christopher Birkbeck, Seewoo Lee, Ho Kiu Gareth Ma, Bhavik Mehta, Auguste Poiroux, and Maryna Viazovska. A milestone in formalization: The sphere packing problem in dimension 8. arXiv preprint arXiv:2604.23468 , 2026

Pith/arXiv arXiv 2026

[9] [9]

Formalization of continuous probability distributions

Osman Hasan and Sofiene Tahar. Formalization of continuous probability distributions. In International Conference on Automated Deduction , pages 3–18. 2007

2007

[10] [10]

Formalization of the standard uniform random variable

Osman Hasan and Sofiene Tahar. Formalization of the standard uniform random variable. Theoretical Computer Science , 382(1):71–83, 2007

2007

[11] [11]

A formalization of the Lévy-Prokhorov metric in Isabelle/HOL

Michikazu Hirata. A formalization of the Lévy-Prokhorov metric in Isabelle/HOL. In 15th International Conference on Interactive Theorem Proving (ITP 2024) , volume 309, pages 21:1– 21:18. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2024. 22

2024

[12] [12]

I. A. Ibragimov and R. Z. Has’minskii. Statistical Estimation: Asymptotic Theory , volume 16 of Stochastic Modelling and Applied Probability . Springer, New York, 1981

1981

[13] [13]

Semi-autonomous formalization of the Vlasov-Maxwell-Landau equilibrium

Vasily Ilin. Semi-autonomous formalization of the Vlasov-Maxwell-Landau equilibrium. arXiv preprint arXiv:2603.15929 , 2026

arXiv 2026

[14] [14]

Do LLMs game formalization? evalu- ating faithfulness in logical reasoning

Kyuhee Kim, Auguste Poiroux, and Antoine Bosselut. Do LLMs game formalization? evalu- ating faithfulness in logical reasoning. arXiv preprint arXiv:2604.19459 , 2026

Pith/arXiv arXiv 2026

[15] [15]

LeanAgent: Lifelong learning for formal theorem proving

Adarsh Kumarappan, Mo Tiwari, Peiyang Song, Robert Joseph George, Chaowei Xiao, and Anima Anandkumar. LeanAgent: Lifelong learning for formal theorem proving. In Interna- tional Conference on Learning Representations , volume 2025, pages 73525–73564, 2025

2025

[16] [16]

Asymptotic Methods in Statistical Decision Theory

Lucien Le Cam. Asymptotic Methods in Statistical Decision Theory . Springer Series in Statis- tics. Springer, New York, 1986

1986

[17] [17]

Asymptotics in statistics: some basic concepts

Lucien Le Cam and Grace Lo Yang. Asymptotics in statistics: some basic concepts . Springer Science & Business Media, 2000

2000

[18] [18]

Autoformalize mathematical statements by symbolic equivalence and semantic consistency

Zenan Li, Yifan Wu, Zhaoyu Li, Xinming Wei, Fan Yang, Xian Zhang, and Xiaoxing Ma. Autoformalize mathematical statements by symbolic equivalence and semantic consistency. In Advances in Neural Information Processing Systems , volume 37, pages 53598–53625, 2024

2024

[19] [19]

Goedel-Prover-V2: Scaling formal theorem proving with scaffolded data synthesis and self-correction

Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, et al. Goedel-Prover-V2: Scaling formal theorem proving with scaffolded data synthesis and self-correction. arXiv preprint arXiv:2508.03613 , 2025

Pith/arXiv arXiv 2025

[20] [20]

Numina-Lean-Agent: An open and general agentic reasoning system for formal mathematics

Junqi Liu, Zihao Zhou, Zekai Zhu, Marco Dos Santos, Weikun He, Jiawei Liu, Ran Wang, Yunzhou Xie, Junqiao Zhao, Qiufeng Wang, et al. Numina-Lean-Agent: An open and general agentic reasoning system for formal mathematics. arXiv preprint arXiv:2601.14027 , 2026

arXiv 2026

[21] [21]

FormalScience: Scalable human-in-the- loop autoformalisation of science with agentic code generation in Lean

Jordan Meadows, Lan Zhang, and Andre Freitas. FormalScience: Scalable human-in-the- loop autoformalisation of science with agentic code generation in Lean. arXiv preprint arXiv:2604.23002, 2026

Pith/arXiv arXiv 2026

[22] [22]

The Lean 4 theorem prover and programming language

Leonardo de Moura and Sebastian Ullrich. The Lean 4 theorem prover and programming language. In International Conference on Automated Deduction , pages 625–635. Springer, 2021

2021

[23] [23]

Paulson, and Markus Wenzel

Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL: a proof assistant for higher-order logic . Springer, 2002

2002

[24] [24]

Apollo: Automated LLM and Lean collaboration for advanced formal reasoning

Azim Ospanov, Farzan Farnia, and Roozbeh Yousefzadeh. Apollo: Automated LLM and Lean collaboration for advanced formal reasoning. In Advances in Neural Information Processing Systems, volume 38, pages 41599–41633, 2025

2025

[25] [25]

DeepSeek-Prover-V2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition

ZZ Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, et al. DeepSeek-Prover-V2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition. arXiv preprint arXiv:2504.21801, 2025

Pith/arXiv arXiv 2025

[26] [26]

Lean copilot: Large language models as copilots for theorem proving in Lean

Peiyang Song, Kaiyu Yang, and Anima Anandkumar. Lean copilot: Large language models as copilots for theorem proving in Lean. arXiv preprint arXiv:2404.12534 , 2024. 23

arXiv 2024

[27] [27]

Lean formal- ization of generalization error bound by Rademacher complexity and Dudley’s entropy integral

Sho Sonoda, Kazumi Kasaura, Yuma Mizuno, Kei Tsukamoto, and Naoto Onda. Lean formal- ization of generalization error bound by Rademacher complexity and Dudley’s entropy integral. arXiv preprint arXiv:2503.19605 , 2025

Pith/arXiv arXiv 2025

[28] [28]

The Lean mathematical library

The mathlib Community. The Lean mathematical library. In Proceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs , CPP 2020, pages 367– 381, New York, NY, USA, 2020. Association for Computing Machinery

2020

[29] [29]

The Rocq prover, version 9.2.0

The Rocq Development Team. The Rocq prover, version 9.2.0. Zenodo, 2026. https://doi. org/10.5281/zenodo.19256047

work page doi:10.5281/zenodo.19256047 2026

[30] [30]

van der Vaart

Aad W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998

1998

[31] [31]

van der Vaart and Jon A

Aad W. van der Vaart and Jon A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics . Springer Series in Statistics. Springer, New York, 1996

1996

[32] [32]

Hilbert: Recursively building formal proofs with informal reasoning

Sumanth Varambally, Thomas Voice, Yanchao Sun, Zhifeng Chen, Rose Yu, and Ke Ye. Hilbert: Recursively building formal proofs with informal reasoning. arXiv preprint arXiv:2509.22819 , 2025

arXiv 2025

[33] [33]

MA-LoT: Model-collaboration Lean-based long chain-of-thought reasoning enhances formal theorem proving

Ruida Wang, Rui Pan, Yuxin Li, Jipeng Zhang, Yizhen Jia, Shizhe Diao, Renjie Pi, Junjie Hu, and Tong Zhang. MA-LoT: Model-collaboration Lean-based long chain-of-thought reasoning enhances formal theorem proving. In Proceedings of the 42nd International Conference on Machine Learning (ICML) , volume 267 of Proceedings of Machine Learning Research , pages 6...

2025

[34] [34]

LeanDojo: Theorem proving with retrieval-augmented language models

Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J Prenger, and Animashree Anandkumar. LeanDojo: Theorem proving with retrieval-augmented language models. In Advances in Neural Information Processing Systems , volume 36, pages 21573–21612, 2023

2023

[35] [35]

MASA: LLM-driven multi-agent systems for autoformalization

Lan Zhang, Marco Valentino, and André Freitas. MASA: LLM-driven multi-agent systems for autoformalization. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , pages 615–624, 2025

2025

[36] [36]

Statistical learning theory in Lean 4: Empirical processes from scratch

Yuanhe Zhang, Jason D Lee, and Fanghui Liu. Statistical learning theory in Lean 4: Empirical processes from scratch. arXiv preprint arXiv:2602.02285 , 2026. A Evaluation In this appendix, together with the two case studies presented in Appendix B, we answer three questions about the methodological contributions of Section 1.1: (i) is the closed artifact a...

Pith/arXiv arXiv 2026

[37] [37]

In our case a git worktree on the same host, with the Lake package directory symlinked and the build cache copied for fast rebuilds

Agent dispatch into an isolated execution context. In our case a git worktree on the same host, with the Lake package directory symlinked and the build cache copied for fast rebuilds

[38] [38]

Durable task cards and self-identified terminal-state returns for each dispatched agent (the recorded wave outcomes of Table 1)

[39] [39]

bowl-shaped

Tool-mediated build verification through the platform’s shell primitive ( lake build invoked from inside the worktree by both Executors and the gating Reviewer). These primitives are common in current coding-agent environments, but this paper evaluates only one implementation: every experiment used Claude Code (Opus 4.7) for both the Manager session and a...