Verifiable Auto-Formalization of Mathematics Using a Relaxed Natural Formal Language

Lihan Xie; Qinxiang Cao; Xingzhi Qi; Yingjun Lan; Zhehao Li; Zhicheng Hui

arxiv: 2606.24443 · v1 · pith:XG2ETMB4new · submitted 2026-06-23 · 💻 cs.LO · cs.PL

Verifiable Auto-Formalization of Mathematics Using a Relaxed Natural Formal Language

Zhicheng Hui , Lihan Xie , Xingzhi Qi , Zhehao Li , Yingjun Lan , Qinxiang Cao This is my paper

Pith reviewed 2026-06-25 21:57 UTC · model grok-4.3

classification 💻 cs.LO cs.PL

keywords auto-formalizationnatural formal languagetheorem provingelaboration procedureverification conditionstactic languagerelaxed languageLLM-generated scripts

0 comments

The pith

A relaxed natural formal language serves as an intermediate target that preserves informal reasoning structure while allowing verifiable elaboration into core proofs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Relaxed NFL as an intermediate representation for translating informal mathematics into formal proofs. This language keeps the usual structure of human reasoning and permits partially specified expressions without fixing their precise meaning upfront. A subsequent elaboration stage converts Relaxed NFL into Core NFL using rule-based steps combined with LLM heuristics, all under explicit constraints that keep each transformation checkable. Verification conditions generated from the Core NFL are then discharged by LLM-written scripts in a domain-specific tactic language. The approach separates the initial formalization from ambiguity resolution to make semantic consistency easier to assess.

Core claim

The Relaxed NFL serves as an intermediate target that preserves the structure of informal reasoning and allows partially specified expressions, with remaining ambiguity resolved in a verifiable elaboration stage that produces Core NFL proofs whose gaps are discharged by LLM-generated scripts in a domain-specific tactic language.

What carries the argument

The Relaxed Natural Formal Language (Relaxed NFL), which functions as an intermediate representation between informal mathematical writing and fully formal Core NFL proofs with defined semantics.

If this is right

Auto-formalization targets can stay closer to informal writing without immediate loss of structure.
Verifiability is maintained by explicit constraints on each transformation rather than end-to-end checking.
Proof gaps become explicit verification conditions that a controlled tactic language can address.
Semantic consistency evaluation shifts to the Core NFL stage after elaboration.
The separation of stages allows partial specifications to be handled later without blocking initial formalization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This staged approach might lower the barrier for LLMs to produce usable formal content from existing mathematical texts.
If the tactic language remains domain-specific, the method could be tested on subfields where solvers already exist.
The explicit constraints on elaboration steps provide a possible template for verifiable LLM use in other translation tasks.
Applying the pipeline to a known informal proof and measuring the number of discharged gaps would give a direct test of coverage.

Load-bearing premise

The combination of rule-based transformations and LLM-generated heuristics in the elaboration procedure can be kept under explicit constraints that preserve verifiability while the tactic language and LLM scripts reliably discharge gaps without undetected errors.

What would settle it

A case in which an elaboration step produces a Core NFL proof whose verification conditions are discharged by the tactic scripts, yet the resulting formal statement fails to hold when checked independently in an existing theorem prover.

Figures

Figures reproduced from arXiv: 2606.24443 by Lihan Xie, Qinxiang Cao, Xingzhi Qi, Yingjun Lan, Zhehao Li, Zhicheng Hui.

**Figure 1.** Figure 1: The same proof snippet written in natural language, the Mizar proof language, and the Lean proof language [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Overview of the auto-formalization pipeline. – We propose a novel approach to LLM-based auto-formalization based on an intermediate representation that resembles informal mathematical writing. – We study how to turn this intermediate representation into a semantically well-defined formal proof, and develop a verifiable elaboration procedure. – We study how proof gaps generated along this pipeline can be ef… view at source ↗

**Figure 3.** Figure 3: Elimination of the variable scope annotation in LLM’s output. The output of this elaboration step is then verified by a deterministic checker. The checker ensures that all the annotated free variables occur within a matching scope, and that no other part of the proof has been modified. Finally, we eliminate the explicit variable scope annotation in the Relaxed NFL as in [PITH_FULL_IMAGE:figures/full_fig_p… view at source ↗

read the original abstract

Auto-formalization aims to translate informal mathematical content into formal languages that can be processed by theorem provers. However, directly targeting existing theorem provers requires LLMs to bridge a substantial representational gap between informal mathematical writing and formal proof languages. This gap also makes semantic consistency difficult to evaluate. We address these difficulties by introducing a Relaxed Natural Formal Language (Relaxed NFL) as an intermediate target for auto-formalization. The Relaxed NFL is designed to remain close to informal mathematical writing: it preserves the usual structure of informal reasoning and allows partially specified expressions and propositions, without requiring their precise interpretation to be fixed at the auto-formalization stage. The remaining ambiguity and implicitness inherited from informal reasoning are resolved during a later elaboration stage, which transforms Relaxed NFL proofs into Core Natural Formal Language (Core NFL) proofs with formally defined semantics. The elaboration procedure combines rule-based transformations with LLM-generated heuristics, while maintaining verifiability through explicit constraints on each transformation step. The Core NFL is then used to generate proof gaps, namely verification conditions that must hold for the formalized proof to be correct. These gaps are discharged by LLM-generated proof scripts written in a domain-specific tactic language, which provides commands for invoking theorem libraries and domain-specific solvers implemented as part of our system.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a high-level architectural sketch for staged auto-formalization via a relaxed intermediate language, but it supplies no examples, semantics, or results to show the verifiability claims hold.

read the letter

The main takeaway is that the paper outlines a three-stage pipeline—Relaxed NFL as an ambiguous intermediate target close to informal math, constrained elaboration to Core NFL, and a custom tactic language for LLM-generated gap-filling scripts—but stops at the proposal stage with nothing concrete to evaluate.

What stands out as new is the deliberate choice to keep partial expressions and informal structure in the first layer, then resolve ambiguity through a mix of rule-based steps and LLM heuristics that are supposed to stay under explicit constraints. The tactic language for discharging verification conditions is also presented as a distinct piece rather than a direct reuse of existing prover tactics. This addresses the representational gap that direct auto-formalization hits, and the staging idea itself is a reasonable response to that problem.

The soft spots are exactly where the stress-test note flags them. The verifiability claim rests on those explicit constraints being sufficient to prevent undetected errors from the LLM components, yet the description gives no formal semantics for the tactic language, no mechanical check for the constraints, and no trace of even a small example. Without that, it is impossible to tell whether the constraints are syntactic only or actually catch semantic issues when LLMs choose heuristics or write scripts. The absence of any implementation details or experiments makes the soundness hard to assess.

This is for people already working on LLM-assisted formalization who are hunting for new intermediate representations. A reader looking for architectural options might pick up the high-level design, but it does not yet contain results or code that could be cited or reused directly.

It deserves peer review because the staged approach is distinct enough from prior direct-translation work to merit expert feedback, even if the current version needs substantial fleshing out on the constraint mechanism and validation.

Referee Report

3 major / 0 minor

Summary. The paper proposes Relaxed Natural Formal Language (Relaxed NFL) as an intermediate target for auto-formalization of mathematics. Relaxed NFL preserves the structure of informal reasoning while permitting partially specified expressions; an elaboration stage combining rule-based transformations with LLM-generated heuristics (under explicit constraints) produces Core NFL proofs with defined semantics. Verification conditions extracted from Core NFL are discharged by LLM-generated scripts in a domain-specific tactic language that invokes theorem libraries and domain-specific solvers.

Significance. If the staged architecture with mechanically enforceable constraints can be realized, the approach would address a recognized bottleneck in direct LLM-based auto-formalization by reducing the semantic gap and providing an explicit route to verifiable elaboration. The separation into Relaxed NFL, constrained elaboration, and tactic-based gap discharge is a coherent architectural idea that could be evaluated against existing intermediate-representation strategies in formal methods.

major comments (3)

[Abstract] Abstract: the central claim that 'the elaboration procedure combines rule-based transformations with LLM-generated heuristics, while maintaining verifiability through explicit constraints on each transformation step' is load-bearing, yet the manuscript supplies neither a definition of the constraints nor an argument that they are mechanically checkable and sufficient to exclude semantic errors introduced by the LLM heuristics.
[Abstract] Abstract: the domain-specific tactic language is described only at the level of 'commands for invoking theorem libraries and domain-specific solvers'; no syntax, semantics, or soundness argument is given, which is required to substantiate that LLM-generated scripts can discharge verification conditions without undetected errors.
[Abstract] Abstract: the claim that 'remaining ambiguity and implicitness ... are resolved during a later elaboration stage' that 'transforms Relaxed NFL proofs into Core NFL proofs with formally defined semantics' lacks any indication of how the formal semantics of Core NFL are defined or how the transformation preserves them under partial specifications.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and for identifying specific gaps between the claims in the abstract and the supporting material in the manuscript. We agree that each of the three points requires additional technical detail and will perform a major revision to supply it.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that 'the elaboration procedure combines rule-based transformations with LLM-generated heuristics, while maintaining verifiability through explicit constraints on each transformation step' is load-bearing, yet the manuscript supplies neither a definition of the constraints nor an argument that they are mechanically checkable and sufficient to exclude semantic errors introduced by the LLM heuristics.

Authors: We accept the criticism. The current manuscript states the existence of constraints but does not define them formally or prove their sufficiency. In the revised version we will add a new subsection (Section 3.2) that (i) gives an explicit syntax for the constraint predicates, (ii) describes a mechanical checker implemented as a small set of recursive functions, and (iii) contains a short argument that any elaboration step satisfying the constraints cannot introduce semantic divergence from the original Relaxed NFL statement. revision: yes
Referee: [Abstract] Abstract: the domain-specific tactic language is described only at the level of 'commands for invoking theorem libraries and domain-specific solvers'; no syntax, semantics, or soundness argument is given, which is required to substantiate that LLM-generated scripts can discharge verification conditions without undetected errors.

Authors: We agree that the present description is insufficient. The revision will include an appendix that defines the concrete syntax of the tactic language, gives its operational semantics as a small-step transition system over proof states, and sketches a soundness theorem showing that every successful tactic execution corresponds to a valid inference in the underlying Core NFL semantics. We will also supply a small example script together with its verification trace. revision: yes
Referee: [Abstract] Abstract: the claim that 'remaining ambiguity and implicitness ... are resolved during a later elaboration stage' that 'transforms Relaxed NFL proofs into Core NFL proofs with formally defined semantics' lacks any indication of how the formal semantics of Core NFL are defined or how the transformation preserves them under partial specifications.

Authors: The manuscript currently leaves the semantics of Core NFL implicit. We will revise by adding Section 4 that (i) defines the semantics of Core NFL via a translation into a simply-typed lambda calculus with dependent types and (ii) proves, by induction on the elaboration rules, that every well-constrained elaboration step preserves the set of models of the original partially specified statement. The proof will explicitly handle the case of unresolved sub-expressions by treating them as universally quantified parameters. revision: yes

Circularity Check

0 steps flagged

No circularity: architectural proposal with independent components

full rationale

The paper presents a new system architecture for auto-formalization using Relaxed NFL as an intermediate language, followed by elaboration to Core NFL and LLM-generated tactic scripts for verification conditions. No derivation chain, equations, or predictions are claimed; the central claims are design choices and procedural descriptions rather than reductions of outputs to author-defined inputs or self-citations. The text contains no self-citation load-bearing steps, no fitted parameters renamed as predictions, and no uniqueness theorems imported from prior author work. The method is self-contained as a proposal whose verifiability claims rest on explicit constraints and external theorem provers, not on internal redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient detail to enumerate specific free parameters, axioms, or invented entities; the Relaxed NFL itself is the central new construct but is not accompanied by listed assumptions or fitted values.

pith-pipeline@v0.9.1-grok · 5774 in / 1085 out tokens · 29724 ms · 2026-06-25T21:57:01.443675+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 9 canonical work pages

[1]

Bancerek, G., Byliński, C., Grabowski, A., Korniłowicz, A., Matuszewski, R., Nau- mowicz, A., Pąk, K.: The role of the mizar mathematical library for interactive proof development in mizar. J. Autom. Reason. 61(1–4), 9–32 (Jun 2018). https:// doi.org/10.1007/s10817-017-9440-6 , https://doi.org/10.1007/s10817-017-9440-6

work page doi:10.1007/s10817-017-9440-6 2018
[2]

Springer Publishing Company, Incorporated, 1st edn

Bertot, Y., Castran, P.: Interactive Theorem Proving and Program Development: Coq’Art The Calculus of Inductive Constructions. Springer Publishing Company, Incorporated, 1st edn. (2010)

2010
[3]

In: Proceedings of the 1st ACM SIGPLAN International Workshop on Language Models and Programming Languages

Cao, Q., Xie, L., Yan, J.: The llm era demands natural-language-aligned theorem provers for mathematics. In: Proceedings of the 1st ACM SIGPLAN International Workshop on Language Models and Programming Languages. p. 46–50. LMPL ’25, Association for Computing Machinery, New York, NY, USA (2025). https: //doi.org/10.1145/3759425.3763384, https://doi.org/10....

work page doi:10.1145/3759425.3763384 2025
[4]

Chen, L., Gu, J., Huang, L., Huang, W., Jiang, Z., Jie, A., Jin, X., Jin, X., Li, C., Ma, K., Ren, C., Shen, J., Shi, W., Sun, T., Sun, H., Wang, J., Wang, S., Wang, Z., Wei, C., Wei, S., Wu, Y., Wu, Y., Xia, Y., Xin, H., Yang, F., Ying, H., Yuan, H., Yuan, Z., Zhan, T., Zhang, C., Zhang, Y., Zhang, G., Zhao, T., Zhao, J., Zhou, Y., Zhu, T.H.: Seed-prover...

arXiv 2025
[5]

The lean mathematical library

mathlib Community, T.: The lean mathematical library. In: Proceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs. p. 367–381. POPL ’20, ACM (Jan 2020). https://doi.org/10.1145/3372885.3373824, http://dx.doi.org/10.1145/3372885.3373824

work page doi:10.1145/3372885.3373824 2020
[6]

In: Automated Deduction – CADE 28: 28th International Conference on Automated Deduction, Virtual Event, July 12–15, 2021, Proceedings

De Lon, A., Koepke, P., Lorenzen, A., Marti, A., Schütz, M., Wenzel, M.: The isabelle/naproche natural language proof assistant. In: Automated Deduction – CADE 28: 28th International Conference on Automated Deduction, Virtual Event, July 12–15, 2021, Proceedings. p. 614–624. Springer-Verlag, Berlin, Heidelberg (2021). https://doi.org/10.1007/978-3-030-798...

work page doi:10.1007/978-3-030-79876-5_36 2021
[7]

Mir Publishers (1970)

Demidovich, B.P.: Problems in Mathematical Analysis. Mir Publishers (1970)

1970
[8]

Guo, Q., Wang, J., Zhang, J., Kong, D., Huang, X., Xi, X., Wang, W., Wang, J., Cai, X., Zhang, S., Ye, W.: Autoformalizer with tool feedback (2025), https: //arxiv.org/abs/2510.06857

arXiv 2025
[9]

org/abs/2106.09685

Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W.: Lora: Low-rank adaptation of large language models (2021), https://arxiv. org/abs/2106.09685

Pith/arXiv arXiv 2021
[10]

Olympiad-level formal mathematical reasoning with reinforcement learning

Hubert, T., Mehta, R., Sartran, L., Horváth, M.Z., Žužić, G., Wieser, E., Huang, A., Schrittwieser, J., Schroecker, Y., Masoom, H., Bertolli, O., Zahavy, T., Mand- hane, A., Yung, J., Beloshapka, I., Ibarz, B., Veeriah, V., Yu, L., Nash, O., Lezeau, P., Mercuri, S., Sönne, C., Mehta, B., Davies, A., Zheng, D., Pedregosa, F., Li, Y., von Glehn, I., Rowland...

work page doi:10.1038/s41586-025-09833-y 2025
[11]

In: Proceedings of the 38th International Conference on Neural Information Processing Systems

Jiang, A.Q., Li, W., Jamnik, M.: Multi-language diversity benefits autoformaliza- tion. In: Proceedings of the 38th International Conference on Neural Information Processing Systems. NIPS ’24, Curran Associates Inc., Red Hook, NY, USA (2024)

2024
[12]

Jiang, A.Q., Welleck, S., Zhou, J.P., Li, W., Liu, J., Jamnik, M., Lacroix, T., Wu, Y., Lample, G.: Draft, sketch, and prove: Guiding formal theorem provers with informal proofs (2023), https://arxiv.org/abs/2210.12283

arXiv 2023
[13]

NeMo guardrails: A toolkit for controllable and safe LLM applications with pro- grammable rails

Leang, J.O.J., Hong, G., Li, W., Cohen, S.B.: Theorem prover as a judge for synthetic data generation. In: Proceedings of the 63rd Annual Meeting of the As- sociation for Computational Linguistics (Volume 1: Long Papers). p. 29941–29977. Association for Computational Linguistics (2025). https://doi.org/10.18653/v1/ 2025.acl-long.1448, http://dx.doi.org/10...

work page doi:10.18653/v1/ 2025
[14]

Lin, Y., Tang, S., Lyu, B., Wu, J., Lin, H., Yang, K., Li, J., Xia, M., Chen, D., Arora, S., Jin, C.: Goedel-prover: A frontier model for open-source automated theorem proving (2025), https://arxiv.org/abs/2502.07640

arXiv 2025
[15]

In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=hUb2At2DsQ 18 Authors Suppressed Due to Excessive Length

Liu, Q., Zheng, X., Lu, X., Cao, Q., Yan, J.: Rethinking and improving autofor- malization: Towards a faithful metric and a dependency retrieval-based approach. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=hUb2At2DsQ 18 Authors Suppressed Due to Excessive Length

2025
[16]

Liu, Q., Zheng, X., Xia, R., Qi, X., Cao, Q., Yan, J.: Beyond theorem proving: Formulation, framework and benchmark for formal problem-solving (2025), https: //arxiv.org/abs/2505.04528

arXiv 2025
[17]

Liu, Y., Zhu, T., Liu, X., Chen, Y., Liu, Z., Guo, Q., Zhang, J., Bao, K., Luo, T.: Generalized tree edit distance (gted): A faithful evaluation metric for statement autoformalization (2025), https://arxiv.org/abs/2507.07399

arXiv 2025
[18]

Lu, W., Du, L., Li, S., Weng, K., Sun, H., Liu, H., Yu, M., Zhang, T., Yu, G.: Automated formalization via conceptual retrieval-augmented llms (2026), https: //arxiv.org/abs/2508.06931

arXiv 2026
[19]

In: Bertot, Y., Kutsia, T., Norrish, M

Massot, P.: Teaching Mathematics Using Lean and Controlled Natural Language. In: Bertot, Y., Kutsia, T., Norrish, M. (eds.) 15th International Conference on In- teractive Theorem Proving (ITP 2024). Leibniz International Proceedings in Infor- matics (LIPIcs), vol. 309, pp. 27:1–27:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany (2...

work page doi:10.4230/lipics.itp.2024 2024
[20]

doi: 10.1007/978-3-030-79876-5_37

Moura, L.d., Ullrich, S.: The lean 4 theorem prover and programming language. In: Automated Deduction – CADE 28: 28th International Conference on Automated Deduction, Virtual Event, July 12–15, 2021, Proceedings. p. 625–635. Springer- Verlag, Berlin, Heidelberg (2021). https://doi.org/10.1007/978-3-030-79876-5_37 , https://doi.org/10.1007/978-3-030-79876-5_37

work page doi:10.1007/978-3-030-79876-5_37 2021
[21]

In: Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics

Naumowicz, A., Korniłowicz, A.: A brief overview of mizar. In: Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics. p. 67–72. TPHOLs ’09, Springer-Verlag, Berlin, Heidelberg (2009). https://doi.org/ 10.1007/978-3-642-03359-9_5 , https://doi.org/10.1007/978-3-642-03359-9_5

work page doi:10.1007/978-3-642-03359-9_5 2009
[22]

Springer-Verlag, Berlin, Heidelberg (2002)

Nipkow, T., Wenzel, M., Paulson, L.C.: Isabelle/HOL: a proof assistant for higher- order logic. Springer-Verlag, Berlin, Heidelberg (2002)

2002
[23]

In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2026), https://openreview.net/forum? id=KtaHv0YUyh

Ospanov, A., Farnia, F., Yousefzadeh, R.: minif2f-lean revisited: Reviewing lim- itations and charting a path forward. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2026), https://openreview.net/forum? id=KtaHv0YUyh

2026
[24]

Ren, Z.Z., Shao, Z., Song, J., Xin, H., Wang, H., Zhao, W., Zhang, L., Fu, Z., Zhu, Q., Yang, D., Wu, Z.F., Gou, Z., Ma, S., Tang, H., Liu, Y., Gao, W., Guo, D., Ruan, C.: Deepseek-prover-v2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition (2025), https://arxiv.org/abs/ 2504.21801

Pith/arXiv arXiv 2025
[25]

Wang, H., Unsal, M., Lin, X., Baksys, M., Liu, J., Santos, M.D., Sung, F., Vinyes, M., Ying, Z., Zhu, Z., Lu, J., de Saxcé, H., Bailey, B., Song, C., Xiao, C., Zhang, D., Zhang, E., Pu, F., Zhu, H., Liu, J., Bayer, J., Michel, J., Yu, L., Dreyfus-Schmidt, L., Tunstall, L., Pagani, L., Machado, M., Bourigault, P., Wang, R., Polu, S., Barroyer, T., Li, W.D....

Pith/arXiv arXiv 2025
[26]

Wang, H., Xin, H., Zheng, C., Li, L., Liu, Z., Cao, Q., Huang, Y., Xiong, J., Shi, H., Xie, E., Yin, J., Li, Z., Liao, H., Liang, X.: Lego-prover: Neural theorem proving with growing libraries (2023), https://arxiv.org/abs/2310.00656

arXiv 2023
[27]

In: Theorem Proving in Higher Order Logics: TPHOLs 1999

Wenzel, M.: Isabelle/isar: A versatile environment for human-readable formal proof documents. In: Theorem Proving in Higher Order Logics: TPHOLs 1999. Lecture Notes in Computer Science, vol. 1690. Springer (1999)

1999
[28]

In: Proceedings of the 20th International Conference on Theorem Proving in Higher Order Logics

Wiedijk, F.: Mizar’s soft type system. In: Proceedings of the 20th International Conference on Theorem Proving in Higher Order Logics. p. 383–399. TPHOLs’07, Springer-Verlag, Berlin, Heidelberg (2007) Title Suppressed Due to Excessive Length 19

2007
[29]

Xin, H., Guo, D., Shao, Z., Ren, Z., Zhu, Q., Liu, B., Ruan, C., Li, W., Liang, X.: Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data (2024), https://arxiv.org/abs/2405.14333

arXiv 2024
[30]

Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., Zheng, C., Liu, D., Zhou, F., Huang, F., Hu, F., Ge, H., Wei, H., Lin, H., Tang, J., Yang, J., Tu, J., Zhang, J., Yang, J., Yang, J., Zhou, J., Zhou, J., Lin, J., Dang, K., Bao, K., Yang, K., Yu, L., Deng, L., Li, M., Xue, M., Li, M., Zhang, P., Wang, P., Zhu, Q...

Pith/arXiv arXiv 2025
[31]

Yang, A., Yang, B., Hui, B., Zheng, B., Yu, B., Zhou, C., Li, C., Li, C., Liu, D., Huang, F., Dong, G., Wei, H., Lin, H., Tang, J., Wang, J., Yang, J., Tu, J., Zhang, J., Ma, J., Yang, J., Xu, J., Zhou, J., Bai, J., He, J., Lin, J., Dang, K., Lu, K., Chen, K., Yang, K., Li, M., Xue, M., Ni, N., Zhang, P., Wang, P., Peng, R., Men, R., Gao, R., Lin, R., Wan...

Pith/arXiv arXiv 2024
[32]

The solution is adapted from the one available on the Art of Problem Solving website: https: //artofproblemsolving.com/wiki/index.php/1999_IMO_Problems/Problem_6

Zheng, K., Han, J.M., Polu, S.: Minif2f: a cross-system benchmark for formal olympiad-level mathematics (2022), https://arxiv.org/abs/2109.00110 20 Authors Suppressed Due to Excessive Length A An example proof written in the Relaxed NFL This appendix presents the Relaxed NFL formalization of a functional-equation problem from the 1999 International Mathem...

Pith/arXiv arXiv 2022

[1] [1]

Bancerek, G., Byliński, C., Grabowski, A., Korniłowicz, A., Matuszewski, R., Nau- mowicz, A., Pąk, K.: The role of the mizar mathematical library for interactive proof development in mizar. J. Autom. Reason. 61(1–4), 9–32 (Jun 2018). https:// doi.org/10.1007/s10817-017-9440-6 , https://doi.org/10.1007/s10817-017-9440-6

work page doi:10.1007/s10817-017-9440-6 2018

[2] [2]

Springer Publishing Company, Incorporated, 1st edn

Bertot, Y., Castran, P.: Interactive Theorem Proving and Program Development: Coq’Art The Calculus of Inductive Constructions. Springer Publishing Company, Incorporated, 1st edn. (2010)

2010

[3] [3]

In: Proceedings of the 1st ACM SIGPLAN International Workshop on Language Models and Programming Languages

Cao, Q., Xie, L., Yan, J.: The llm era demands natural-language-aligned theorem provers for mathematics. In: Proceedings of the 1st ACM SIGPLAN International Workshop on Language Models and Programming Languages. p. 46–50. LMPL ’25, Association for Computing Machinery, New York, NY, USA (2025). https: //doi.org/10.1145/3759425.3763384, https://doi.org/10....

work page doi:10.1145/3759425.3763384 2025

[4] [4]

Chen, L., Gu, J., Huang, L., Huang, W., Jiang, Z., Jie, A., Jin, X., Jin, X., Li, C., Ma, K., Ren, C., Shen, J., Shi, W., Sun, T., Sun, H., Wang, J., Wang, S., Wang, Z., Wei, C., Wei, S., Wu, Y., Wu, Y., Xia, Y., Xin, H., Yang, F., Ying, H., Yuan, H., Yuan, Z., Zhan, T., Zhang, C., Zhang, Y., Zhang, G., Zhao, T., Zhao, J., Zhou, Y., Zhu, T.H.: Seed-prover...

arXiv 2025

[5] [5]

The lean mathematical library

mathlib Community, T.: The lean mathematical library. In: Proceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs. p. 367–381. POPL ’20, ACM (Jan 2020). https://doi.org/10.1145/3372885.3373824, http://dx.doi.org/10.1145/3372885.3373824

work page doi:10.1145/3372885.3373824 2020

[6] [6]

In: Automated Deduction – CADE 28: 28th International Conference on Automated Deduction, Virtual Event, July 12–15, 2021, Proceedings

De Lon, A., Koepke, P., Lorenzen, A., Marti, A., Schütz, M., Wenzel, M.: The isabelle/naproche natural language proof assistant. In: Automated Deduction – CADE 28: 28th International Conference on Automated Deduction, Virtual Event, July 12–15, 2021, Proceedings. p. 614–624. Springer-Verlag, Berlin, Heidelberg (2021). https://doi.org/10.1007/978-3-030-798...

work page doi:10.1007/978-3-030-79876-5_36 2021

[7] [7]

Mir Publishers (1970)

Demidovich, B.P.: Problems in Mathematical Analysis. Mir Publishers (1970)

1970

[8] [8]

Guo, Q., Wang, J., Zhang, J., Kong, D., Huang, X., Xi, X., Wang, W., Wang, J., Cai, X., Zhang, S., Ye, W.: Autoformalizer with tool feedback (2025), https: //arxiv.org/abs/2510.06857

arXiv 2025

[9] [9]

org/abs/2106.09685

Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W.: Lora: Low-rank adaptation of large language models (2021), https://arxiv. org/abs/2106.09685

Pith/arXiv arXiv 2021

[10] [10]

Olympiad-level formal mathematical reasoning with reinforcement learning

Hubert, T., Mehta, R., Sartran, L., Horváth, M.Z., Žužić, G., Wieser, E., Huang, A., Schrittwieser, J., Schroecker, Y., Masoom, H., Bertolli, O., Zahavy, T., Mand- hane, A., Yung, J., Beloshapka, I., Ibarz, B., Veeriah, V., Yu, L., Nash, O., Lezeau, P., Mercuri, S., Sönne, C., Mehta, B., Davies, A., Zheng, D., Pedregosa, F., Li, Y., von Glehn, I., Rowland...

work page doi:10.1038/s41586-025-09833-y 2025

[11] [11]

In: Proceedings of the 38th International Conference on Neural Information Processing Systems

Jiang, A.Q., Li, W., Jamnik, M.: Multi-language diversity benefits autoformaliza- tion. In: Proceedings of the 38th International Conference on Neural Information Processing Systems. NIPS ’24, Curran Associates Inc., Red Hook, NY, USA (2024)

2024

[12] [12]

Jiang, A.Q., Welleck, S., Zhou, J.P., Li, W., Liu, J., Jamnik, M., Lacroix, T., Wu, Y., Lample, G.: Draft, sketch, and prove: Guiding formal theorem provers with informal proofs (2023), https://arxiv.org/abs/2210.12283

arXiv 2023

[13] [13]

NeMo guardrails: A toolkit for controllable and safe LLM applications with pro- grammable rails

Leang, J.O.J., Hong, G., Li, W., Cohen, S.B.: Theorem prover as a judge for synthetic data generation. In: Proceedings of the 63rd Annual Meeting of the As- sociation for Computational Linguistics (Volume 1: Long Papers). p. 29941–29977. Association for Computational Linguistics (2025). https://doi.org/10.18653/v1/ 2025.acl-long.1448, http://dx.doi.org/10...

work page doi:10.18653/v1/ 2025

[14] [14]

Lin, Y., Tang, S., Lyu, B., Wu, J., Lin, H., Yang, K., Li, J., Xia, M., Chen, D., Arora, S., Jin, C.: Goedel-prover: A frontier model for open-source automated theorem proving (2025), https://arxiv.org/abs/2502.07640

arXiv 2025

[15] [15]

In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=hUb2At2DsQ 18 Authors Suppressed Due to Excessive Length

Liu, Q., Zheng, X., Lu, X., Cao, Q., Yan, J.: Rethinking and improving autofor- malization: Towards a faithful metric and a dependency retrieval-based approach. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=hUb2At2DsQ 18 Authors Suppressed Due to Excessive Length

2025

[16] [16]

Liu, Q., Zheng, X., Xia, R., Qi, X., Cao, Q., Yan, J.: Beyond theorem proving: Formulation, framework and benchmark for formal problem-solving (2025), https: //arxiv.org/abs/2505.04528

arXiv 2025

[17] [17]

Liu, Y., Zhu, T., Liu, X., Chen, Y., Liu, Z., Guo, Q., Zhang, J., Bao, K., Luo, T.: Generalized tree edit distance (gted): A faithful evaluation metric for statement autoformalization (2025), https://arxiv.org/abs/2507.07399

arXiv 2025

[18] [18]

Lu, W., Du, L., Li, S., Weng, K., Sun, H., Liu, H., Yu, M., Zhang, T., Yu, G.: Automated formalization via conceptual retrieval-augmented llms (2026), https: //arxiv.org/abs/2508.06931

arXiv 2026

[19] [19]

In: Bertot, Y., Kutsia, T., Norrish, M

Massot, P.: Teaching Mathematics Using Lean and Controlled Natural Language. In: Bertot, Y., Kutsia, T., Norrish, M. (eds.) 15th International Conference on In- teractive Theorem Proving (ITP 2024). Leibniz International Proceedings in Infor- matics (LIPIcs), vol. 309, pp. 27:1–27:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany (2...

work page doi:10.4230/lipics.itp.2024 2024

[20] [20]

doi: 10.1007/978-3-030-79876-5_37

Moura, L.d., Ullrich, S.: The lean 4 theorem prover and programming language. In: Automated Deduction – CADE 28: 28th International Conference on Automated Deduction, Virtual Event, July 12–15, 2021, Proceedings. p. 625–635. Springer- Verlag, Berlin, Heidelberg (2021). https://doi.org/10.1007/978-3-030-79876-5_37 , https://doi.org/10.1007/978-3-030-79876-5_37

work page doi:10.1007/978-3-030-79876-5_37 2021

[21] [21]

In: Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics

Naumowicz, A., Korniłowicz, A.: A brief overview of mizar. In: Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics. p. 67–72. TPHOLs ’09, Springer-Verlag, Berlin, Heidelberg (2009). https://doi.org/ 10.1007/978-3-642-03359-9_5 , https://doi.org/10.1007/978-3-642-03359-9_5

work page doi:10.1007/978-3-642-03359-9_5 2009

[22] [22]

Springer-Verlag, Berlin, Heidelberg (2002)

Nipkow, T., Wenzel, M., Paulson, L.C.: Isabelle/HOL: a proof assistant for higher- order logic. Springer-Verlag, Berlin, Heidelberg (2002)

2002

[23] [23]

In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2026), https://openreview.net/forum? id=KtaHv0YUyh

Ospanov, A., Farnia, F., Yousefzadeh, R.: minif2f-lean revisited: Reviewing lim- itations and charting a path forward. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2026), https://openreview.net/forum? id=KtaHv0YUyh

2026

[24] [24]

Ren, Z.Z., Shao, Z., Song, J., Xin, H., Wang, H., Zhao, W., Zhang, L., Fu, Z., Zhu, Q., Yang, D., Wu, Z.F., Gou, Z., Ma, S., Tang, H., Liu, Y., Gao, W., Guo, D., Ruan, C.: Deepseek-prover-v2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition (2025), https://arxiv.org/abs/ 2504.21801

Pith/arXiv arXiv 2025

[25] [25]

Wang, H., Unsal, M., Lin, X., Baksys, M., Liu, J., Santos, M.D., Sung, F., Vinyes, M., Ying, Z., Zhu, Z., Lu, J., de Saxcé, H., Bailey, B., Song, C., Xiao, C., Zhang, D., Zhang, E., Pu, F., Zhu, H., Liu, J., Bayer, J., Michel, J., Yu, L., Dreyfus-Schmidt, L., Tunstall, L., Pagani, L., Machado, M., Bourigault, P., Wang, R., Polu, S., Barroyer, T., Li, W.D....

Pith/arXiv arXiv 2025

[26] [26]

Wang, H., Xin, H., Zheng, C., Li, L., Liu, Z., Cao, Q., Huang, Y., Xiong, J., Shi, H., Xie, E., Yin, J., Li, Z., Liao, H., Liang, X.: Lego-prover: Neural theorem proving with growing libraries (2023), https://arxiv.org/abs/2310.00656

arXiv 2023

[27] [27]

In: Theorem Proving in Higher Order Logics: TPHOLs 1999

Wenzel, M.: Isabelle/isar: A versatile environment for human-readable formal proof documents. In: Theorem Proving in Higher Order Logics: TPHOLs 1999. Lecture Notes in Computer Science, vol. 1690. Springer (1999)

1999

[28] [28]

In: Proceedings of the 20th International Conference on Theorem Proving in Higher Order Logics

Wiedijk, F.: Mizar’s soft type system. In: Proceedings of the 20th International Conference on Theorem Proving in Higher Order Logics. p. 383–399. TPHOLs’07, Springer-Verlag, Berlin, Heidelberg (2007) Title Suppressed Due to Excessive Length 19

2007

[29] [29]

Xin, H., Guo, D., Shao, Z., Ren, Z., Zhu, Q., Liu, B., Ruan, C., Li, W., Liang, X.: Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data (2024), https://arxiv.org/abs/2405.14333

arXiv 2024

[30] [30]

Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., Zheng, C., Liu, D., Zhou, F., Huang, F., Hu, F., Ge, H., Wei, H., Lin, H., Tang, J., Yang, J., Tu, J., Zhang, J., Yang, J., Yang, J., Zhou, J., Zhou, J., Lin, J., Dang, K., Bao, K., Yang, K., Yu, L., Deng, L., Li, M., Xue, M., Li, M., Zhang, P., Wang, P., Zhu, Q...

Pith/arXiv arXiv 2025

[31] [31]

Yang, A., Yang, B., Hui, B., Zheng, B., Yu, B., Zhou, C., Li, C., Li, C., Liu, D., Huang, F., Dong, G., Wei, H., Lin, H., Tang, J., Wang, J., Yang, J., Tu, J., Zhang, J., Ma, J., Yang, J., Xu, J., Zhou, J., Bai, J., He, J., Lin, J., Dang, K., Lu, K., Chen, K., Yang, K., Li, M., Xue, M., Ni, N., Zhang, P., Wang, P., Peng, R., Men, R., Gao, R., Lin, R., Wan...

Pith/arXiv arXiv 2024

[32] [32]

The solution is adapted from the one available on the Art of Problem Solving website: https: //artofproblemsolving.com/wiki/index.php/1999_IMO_Problems/Problem_6

Zheng, K., Han, J.M., Polu, S.: Minif2f: a cross-system benchmark for formal olympiad-level mathematics (2022), https://arxiv.org/abs/2109.00110 20 Authors Suppressed Due to Excessive Length A An example proof written in the Relaxed NFL This appendix presents the Relaxed NFL formalization of a functional-equation problem from the 1999 International Mathem...

Pith/arXiv arXiv 2022