Verifiable Auto-Formalization of Mathematics Using a Relaxed Natural Formal Language
Pith reviewed 2026-06-25 21:57 UTC · model grok-4.3
The pith
A relaxed natural formal language serves as an intermediate target that preserves informal reasoning structure while allowing verifiable elaboration into core proofs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Relaxed NFL serves as an intermediate target that preserves the structure of informal reasoning and allows partially specified expressions, with remaining ambiguity resolved in a verifiable elaboration stage that produces Core NFL proofs whose gaps are discharged by LLM-generated scripts in a domain-specific tactic language.
What carries the argument
The Relaxed Natural Formal Language (Relaxed NFL), which functions as an intermediate representation between informal mathematical writing and fully formal Core NFL proofs with defined semantics.
If this is right
- Auto-formalization targets can stay closer to informal writing without immediate loss of structure.
- Verifiability is maintained by explicit constraints on each transformation rather than end-to-end checking.
- Proof gaps become explicit verification conditions that a controlled tactic language can address.
- Semantic consistency evaluation shifts to the Core NFL stage after elaboration.
- The separation of stages allows partial specifications to be handled later without blocking initial formalization.
Where Pith is reading between the lines
- This staged approach might lower the barrier for LLMs to produce usable formal content from existing mathematical texts.
- If the tactic language remains domain-specific, the method could be tested on subfields where solvers already exist.
- The explicit constraints on elaboration steps provide a possible template for verifiable LLM use in other translation tasks.
- Applying the pipeline to a known informal proof and measuring the number of discharged gaps would give a direct test of coverage.
Load-bearing premise
The combination of rule-based transformations and LLM-generated heuristics in the elaboration procedure can be kept under explicit constraints that preserve verifiability while the tactic language and LLM scripts reliably discharge gaps without undetected errors.
What would settle it
A case in which an elaboration step produces a Core NFL proof whose verification conditions are discharged by the tactic scripts, yet the resulting formal statement fails to hold when checked independently in an existing theorem prover.
Figures
read the original abstract
Auto-formalization aims to translate informal mathematical content into formal languages that can be processed by theorem provers. However, directly targeting existing theorem provers requires LLMs to bridge a substantial representational gap between informal mathematical writing and formal proof languages. This gap also makes semantic consistency difficult to evaluate. We address these difficulties by introducing a Relaxed Natural Formal Language (Relaxed NFL) as an intermediate target for auto-formalization. The Relaxed NFL is designed to remain close to informal mathematical writing: it preserves the usual structure of informal reasoning and allows partially specified expressions and propositions, without requiring their precise interpretation to be fixed at the auto-formalization stage. The remaining ambiguity and implicitness inherited from informal reasoning are resolved during a later elaboration stage, which transforms Relaxed NFL proofs into Core Natural Formal Language (Core NFL) proofs with formally defined semantics. The elaboration procedure combines rule-based transformations with LLM-generated heuristics, while maintaining verifiability through explicit constraints on each transformation step. The Core NFL is then used to generate proof gaps, namely verification conditions that must hold for the formalized proof to be correct. These gaps are discharged by LLM-generated proof scripts written in a domain-specific tactic language, which provides commands for invoking theorem libraries and domain-specific solvers implemented as part of our system.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Relaxed Natural Formal Language (Relaxed NFL) as an intermediate target for auto-formalization of mathematics. Relaxed NFL preserves the structure of informal reasoning while permitting partially specified expressions; an elaboration stage combining rule-based transformations with LLM-generated heuristics (under explicit constraints) produces Core NFL proofs with defined semantics. Verification conditions extracted from Core NFL are discharged by LLM-generated scripts in a domain-specific tactic language that invokes theorem libraries and domain-specific solvers.
Significance. If the staged architecture with mechanically enforceable constraints can be realized, the approach would address a recognized bottleneck in direct LLM-based auto-formalization by reducing the semantic gap and providing an explicit route to verifiable elaboration. The separation into Relaxed NFL, constrained elaboration, and tactic-based gap discharge is a coherent architectural idea that could be evaluated against existing intermediate-representation strategies in formal methods.
major comments (3)
- [Abstract] Abstract: the central claim that 'the elaboration procedure combines rule-based transformations with LLM-generated heuristics, while maintaining verifiability through explicit constraints on each transformation step' is load-bearing, yet the manuscript supplies neither a definition of the constraints nor an argument that they are mechanically checkable and sufficient to exclude semantic errors introduced by the LLM heuristics.
- [Abstract] Abstract: the domain-specific tactic language is described only at the level of 'commands for invoking theorem libraries and domain-specific solvers'; no syntax, semantics, or soundness argument is given, which is required to substantiate that LLM-generated scripts can discharge verification conditions without undetected errors.
- [Abstract] Abstract: the claim that 'remaining ambiguity and implicitness ... are resolved during a later elaboration stage' that 'transforms Relaxed NFL proofs into Core NFL proofs with formally defined semantics' lacks any indication of how the formal semantics of Core NFL are defined or how the transformation preserves them under partial specifications.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for identifying specific gaps between the claims in the abstract and the supporting material in the manuscript. We agree that each of the three points requires additional technical detail and will perform a major revision to supply it.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that 'the elaboration procedure combines rule-based transformations with LLM-generated heuristics, while maintaining verifiability through explicit constraints on each transformation step' is load-bearing, yet the manuscript supplies neither a definition of the constraints nor an argument that they are mechanically checkable and sufficient to exclude semantic errors introduced by the LLM heuristics.
Authors: We accept the criticism. The current manuscript states the existence of constraints but does not define them formally or prove their sufficiency. In the revised version we will add a new subsection (Section 3.2) that (i) gives an explicit syntax for the constraint predicates, (ii) describes a mechanical checker implemented as a small set of recursive functions, and (iii) contains a short argument that any elaboration step satisfying the constraints cannot introduce semantic divergence from the original Relaxed NFL statement. revision: yes
-
Referee: [Abstract] Abstract: the domain-specific tactic language is described only at the level of 'commands for invoking theorem libraries and domain-specific solvers'; no syntax, semantics, or soundness argument is given, which is required to substantiate that LLM-generated scripts can discharge verification conditions without undetected errors.
Authors: We agree that the present description is insufficient. The revision will include an appendix that defines the concrete syntax of the tactic language, gives its operational semantics as a small-step transition system over proof states, and sketches a soundness theorem showing that every successful tactic execution corresponds to a valid inference in the underlying Core NFL semantics. We will also supply a small example script together with its verification trace. revision: yes
-
Referee: [Abstract] Abstract: the claim that 'remaining ambiguity and implicitness ... are resolved during a later elaboration stage' that 'transforms Relaxed NFL proofs into Core NFL proofs with formally defined semantics' lacks any indication of how the formal semantics of Core NFL are defined or how the transformation preserves them under partial specifications.
Authors: The manuscript currently leaves the semantics of Core NFL implicit. We will revise by adding Section 4 that (i) defines the semantics of Core NFL via a translation into a simply-typed lambda calculus with dependent types and (ii) proves, by induction on the elaboration rules, that every well-constrained elaboration step preserves the set of models of the original partially specified statement. The proof will explicitly handle the case of unresolved sub-expressions by treating them as universally quantified parameters. revision: yes
Circularity Check
No circularity: architectural proposal with independent components
full rationale
The paper presents a new system architecture for auto-formalization using Relaxed NFL as an intermediate language, followed by elaboration to Core NFL and LLM-generated tactic scripts for verification conditions. No derivation chain, equations, or predictions are claimed; the central claims are design choices and procedural descriptions rather than reductions of outputs to author-defined inputs or self-citations. The text contains no self-citation load-bearing steps, no fitted parameters renamed as predictions, and no uniqueness theorems imported from prior author work. The method is self-contained as a proposal whose verifiability claims rest on explicit constraints and external theorem provers, not on internal redefinitions.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Bancerek, G., Byliński, C., Grabowski, A., Korniłowicz, A., Matuszewski, R., Nau- mowicz, A., Pąk, K.: The role of the mizar mathematical library for interactive proof development in mizar. J. Autom. Reason. 61(1–4), 9–32 (Jun 2018). https:// doi.org/10.1007/s10817-017-9440-6 , https://doi.org/10.1007/s10817-017-9440-6
-
[2]
Springer Publishing Company, Incorporated, 1st edn
Bertot, Y., Castran, P.: Interactive Theorem Proving and Program Development: Coq’Art The Calculus of Inductive Constructions. Springer Publishing Company, Incorporated, 1st edn. (2010)
2010
-
[3]
Cao, Q., Xie, L., Yan, J.: The llm era demands natural-language-aligned theorem provers for mathematics. In: Proceedings of the 1st ACM SIGPLAN International Workshop on Language Models and Programming Languages. p. 46–50. LMPL ’25, Association for Computing Machinery, New York, NY, USA (2025). https: //doi.org/10.1145/3759425.3763384, https://doi.org/10....
-
[4]
Chen, L., Gu, J., Huang, L., Huang, W., Jiang, Z., Jie, A., Jin, X., Jin, X., Li, C., Ma, K., Ren, C., Shen, J., Shi, W., Sun, T., Sun, H., Wang, J., Wang, S., Wang, Z., Wei, C., Wei, S., Wu, Y., Wu, Y., Xia, Y., Xin, H., Yang, F., Ying, H., Yuan, H., Yuan, Z., Zhan, T., Zhang, C., Zhang, Y., Zhang, G., Zhao, T., Zhao, J., Zhou, Y., Zhu, T.H.: Seed-prover...
arXiv 2025
-
[5]
mathlib Community, T.: The lean mathematical library. In: Proceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs. p. 367–381. POPL ’20, ACM (Jan 2020). https://doi.org/10.1145/3372885.3373824, http://dx.doi.org/10.1145/3372885.3373824
-
[6]
De Lon, A., Koepke, P., Lorenzen, A., Marti, A., Schütz, M., Wenzel, M.: The isabelle/naproche natural language proof assistant. In: Automated Deduction – CADE 28: 28th International Conference on Automated Deduction, Virtual Event, July 12–15, 2021, Proceedings. p. 614–624. Springer-Verlag, Berlin, Heidelberg (2021). https://doi.org/10.1007/978-3-030-798...
-
[7]
Mir Publishers (1970)
Demidovich, B.P.: Problems in Mathematical Analysis. Mir Publishers (1970)
1970
-
[8]
Guo, Q., Wang, J., Zhang, J., Kong, D., Huang, X., Xi, X., Wang, W., Wang, J., Cai, X., Zhang, S., Ye, W.: Autoformalizer with tool feedback (2025), https: //arxiv.org/abs/2510.06857
arXiv 2025
-
[9]
Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W.: Lora: Low-rank adaptation of large language models (2021), https://arxiv. org/abs/2106.09685
Pith/arXiv arXiv 2021
-
[10]
Olympiad-level formal mathematical reasoning with reinforcement learning
Hubert, T., Mehta, R., Sartran, L., Horváth, M.Z., Žužić, G., Wieser, E., Huang, A., Schrittwieser, J., Schroecker, Y., Masoom, H., Bertolli, O., Zahavy, T., Mand- hane, A., Yung, J., Beloshapka, I., Ibarz, B., Veeriah, V., Yu, L., Nash, O., Lezeau, P., Mercuri, S., Sönne, C., Mehta, B., Davies, A., Zheng, D., Pedregosa, F., Li, Y., von Glehn, I., Rowland...
-
[11]
In: Proceedings of the 38th International Conference on Neural Information Processing Systems
Jiang, A.Q., Li, W., Jamnik, M.: Multi-language diversity benefits autoformaliza- tion. In: Proceedings of the 38th International Conference on Neural Information Processing Systems. NIPS ’24, Curran Associates Inc., Red Hook, NY, USA (2024)
2024
-
[12]
Jiang, A.Q., Welleck, S., Zhou, J.P., Li, W., Liu, J., Jamnik, M., Lacroix, T., Wu, Y., Lample, G.: Draft, sketch, and prove: Guiding formal theorem provers with informal proofs (2023), https://arxiv.org/abs/2210.12283
arXiv 2023
-
[13]
NeMo guardrails: A toolkit for controllable and safe LLM applications with pro- grammable rails
Leang, J.O.J., Hong, G., Li, W., Cohen, S.B.: Theorem prover as a judge for synthetic data generation. In: Proceedings of the 63rd Annual Meeting of the As- sociation for Computational Linguistics (Volume 1: Long Papers). p. 29941–29977. Association for Computational Linguistics (2025). https://doi.org/10.18653/v1/ 2025.acl-long.1448, http://dx.doi.org/10...
-
[14]
Lin, Y., Tang, S., Lyu, B., Wu, J., Lin, H., Yang, K., Li, J., Xia, M., Chen, D., Arora, S., Jin, C.: Goedel-prover: A frontier model for open-source automated theorem proving (2025), https://arxiv.org/abs/2502.07640
arXiv 2025
-
[15]
In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=hUb2At2DsQ 18 Authors Suppressed Due to Excessive Length
Liu, Q., Zheng, X., Lu, X., Cao, Q., Yan, J.: Rethinking and improving autofor- malization: Towards a faithful metric and a dependency retrieval-based approach. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=hUb2At2DsQ 18 Authors Suppressed Due to Excessive Length
2025
-
[16]
Liu, Q., Zheng, X., Xia, R., Qi, X., Cao, Q., Yan, J.: Beyond theorem proving: Formulation, framework and benchmark for formal problem-solving (2025), https: //arxiv.org/abs/2505.04528
arXiv 2025
-
[17]
Liu, Y., Zhu, T., Liu, X., Chen, Y., Liu, Z., Guo, Q., Zhang, J., Bao, K., Luo, T.: Generalized tree edit distance (gted): A faithful evaluation metric for statement autoformalization (2025), https://arxiv.org/abs/2507.07399
arXiv 2025
-
[18]
Lu, W., Du, L., Li, S., Weng, K., Sun, H., Liu, H., Yu, M., Zhang, T., Yu, G.: Automated formalization via conceptual retrieval-augmented llms (2026), https: //arxiv.org/abs/2508.06931
arXiv 2026
-
[19]
In: Bertot, Y., Kutsia, T., Norrish, M
Massot, P.: Teaching Mathematics Using Lean and Controlled Natural Language. In: Bertot, Y., Kutsia, T., Norrish, M. (eds.) 15th International Conference on In- teractive Theorem Proving (ITP 2024). Leibniz International Proceedings in Infor- matics (LIPIcs), vol. 309, pp. 27:1–27:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany (2...
-
[20]
doi: 10.1007/978-3-030-79876-5_37
Moura, L.d., Ullrich, S.: The lean 4 theorem prover and programming language. In: Automated Deduction – CADE 28: 28th International Conference on Automated Deduction, Virtual Event, July 12–15, 2021, Proceedings. p. 625–635. Springer- Verlag, Berlin, Heidelberg (2021). https://doi.org/10.1007/978-3-030-79876-5_37 , https://doi.org/10.1007/978-3-030-79876-5_37
-
[21]
In: Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics
Naumowicz, A., Korniłowicz, A.: A brief overview of mizar. In: Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics. p. 67–72. TPHOLs ’09, Springer-Verlag, Berlin, Heidelberg (2009). https://doi.org/ 10.1007/978-3-642-03359-9_5 , https://doi.org/10.1007/978-3-642-03359-9_5
-
[22]
Springer-Verlag, Berlin, Heidelberg (2002)
Nipkow, T., Wenzel, M., Paulson, L.C.: Isabelle/HOL: a proof assistant for higher- order logic. Springer-Verlag, Berlin, Heidelberg (2002)
2002
-
[23]
In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2026), https://openreview.net/forum? id=KtaHv0YUyh
Ospanov, A., Farnia, F., Yousefzadeh, R.: minif2f-lean revisited: Reviewing lim- itations and charting a path forward. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2026), https://openreview.net/forum? id=KtaHv0YUyh
2026
-
[24]
Ren, Z.Z., Shao, Z., Song, J., Xin, H., Wang, H., Zhao, W., Zhang, L., Fu, Z., Zhu, Q., Yang, D., Wu, Z.F., Gou, Z., Ma, S., Tang, H., Liu, Y., Gao, W., Guo, D., Ruan, C.: Deepseek-prover-v2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition (2025), https://arxiv.org/abs/ 2504.21801
Pith/arXiv arXiv 2025
-
[25]
Wang, H., Unsal, M., Lin, X., Baksys, M., Liu, J., Santos, M.D., Sung, F., Vinyes, M., Ying, Z., Zhu, Z., Lu, J., de Saxcé, H., Bailey, B., Song, C., Xiao, C., Zhang, D., Zhang, E., Pu, F., Zhu, H., Liu, J., Bayer, J., Michel, J., Yu, L., Dreyfus-Schmidt, L., Tunstall, L., Pagani, L., Machado, M., Bourigault, P., Wang, R., Polu, S., Barroyer, T., Li, W.D....
Pith/arXiv arXiv 2025
-
[26]
Wang, H., Xin, H., Zheng, C., Li, L., Liu, Z., Cao, Q., Huang, Y., Xiong, J., Shi, H., Xie, E., Yin, J., Li, Z., Liao, H., Liang, X.: Lego-prover: Neural theorem proving with growing libraries (2023), https://arxiv.org/abs/2310.00656
arXiv 2023
-
[27]
In: Theorem Proving in Higher Order Logics: TPHOLs 1999
Wenzel, M.: Isabelle/isar: A versatile environment for human-readable formal proof documents. In: Theorem Proving in Higher Order Logics: TPHOLs 1999. Lecture Notes in Computer Science, vol. 1690. Springer (1999)
1999
-
[28]
In: Proceedings of the 20th International Conference on Theorem Proving in Higher Order Logics
Wiedijk, F.: Mizar’s soft type system. In: Proceedings of the 20th International Conference on Theorem Proving in Higher Order Logics. p. 383–399. TPHOLs’07, Springer-Verlag, Berlin, Heidelberg (2007) Title Suppressed Due to Excessive Length 19
2007
-
[29]
Xin, H., Guo, D., Shao, Z., Ren, Z., Zhu, Q., Liu, B., Ruan, C., Li, W., Liang, X.: Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data (2024), https://arxiv.org/abs/2405.14333
arXiv 2024
-
[30]
Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., Zheng, C., Liu, D., Zhou, F., Huang, F., Hu, F., Ge, H., Wei, H., Lin, H., Tang, J., Yang, J., Tu, J., Zhang, J., Yang, J., Yang, J., Zhou, J., Zhou, J., Lin, J., Dang, K., Bao, K., Yang, K., Yu, L., Deng, L., Li, M., Xue, M., Li, M., Zhang, P., Wang, P., Zhu, Q...
Pith/arXiv arXiv 2025
-
[31]
Yang, A., Yang, B., Hui, B., Zheng, B., Yu, B., Zhou, C., Li, C., Li, C., Liu, D., Huang, F., Dong, G., Wei, H., Lin, H., Tang, J., Wang, J., Yang, J., Tu, J., Zhang, J., Ma, J., Yang, J., Xu, J., Zhou, J., Bai, J., He, J., Lin, J., Dang, K., Lu, K., Chen, K., Yang, K., Li, M., Xue, M., Ni, N., Zhang, P., Wang, P., Peng, R., Men, R., Gao, R., Lin, R., Wan...
Pith/arXiv arXiv 2024
-
[32]
Zheng, K., Han, J.M., Polu, S.: Minif2f: a cross-system benchmark for formal olympiad-level mathematics (2022), https://arxiv.org/abs/2109.00110 20 Authors Suppressed Due to Excessive Length A An example proof written in the Relaxed NFL This appendix presents the Relaxed NFL formalization of a functional-equation problem from the 1999 International Mathem...
Pith/arXiv arXiv 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.