pith. sign in

arxiv: 2604.19000 · v2 · pith:PAHCUCRSnew · submitted 2026-04-21 · 💻 cs.LG · cs.AI

Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees

Pith reviewed 2026-05-25 06:16 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords autoformalizationoperator treesneuro-symbolic methodsLean 4error repairformal mathematicsPRIME benchmark
0
0 comments X

The pith

DSR maps math statements to operator trees to localize and repair formalization errors more precisely than flat-sequence models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DSR as a neuro-symbolic pipeline that first decomposes natural-language mathematical statements into logical parts, then arranges those parts as operator trees, and finally uses the tree structure to find and fix mistakes in the generated formal code. Prior methods rely on end-to-end large language models that treat formal statements as simple text sequences and therefore miss the nested logic typical of mathematics. By inserting an explicit structure step and a repair step that operates on sub-trees, DSR reaches higher success rates on the PRIME benchmark of 156 annotated theorems from standard textbooks while using the same computational budget as the baselines. The authors release the PRIME dataset, their trained model, and the full code.

Core claim

DSR decomposes statements into logical components and maps them to structured operator trees, leveraging this topological blueprint to precisely localize and repair errors via sub-tree refinement, thereby establishing a new state-of-the-art on the PRIME benchmark of 156 undergraduate and graduate-level theorems in Lean 4.

What carries the argument

Operator trees that encode the hierarchical logic of statements and serve as a topological blueprint for sub-tree refinement during error repair.

Load-bearing premise

The decomposition step must produce operator trees that accurately capture the logical structure of the original statement.

What would settle it

A collection of statements where the decomposition step systematically produces incorrect operator trees, after which the full DSR pipeline shows no accuracy gain over a plain LLM baseline on the same formalization task.

Figures

Figures reproduced from arXiv: 2604.19000 by Tao Luo, Xiaoyang Liu, Yantao Li, Yifan Bai, Yuntian Liu, Zineng Dong.

Figure 1
Figure 1. Figure 1: The Decompose, Structure, and Repair (DSR) Framework. Given an NL statement, DSR first decomposes it into logical NL components. The framework then structures the translation by mapping each NL component to its corresponding FL component and its associated FL OPT. Finally, a tree-guided repair loop leverages the OPT to precisely localize and repair errors via sub-tree refinement. studies validate the effic… view at source ↗
Figure 2
Figure 2. Figure 2: Semantic Decomposition. The NL statement is de￾composed into NL components via Semantic Canonicalization to filter crossed-out noise and explicate green-highlighted implicit context, followed by Structural Role Alignment to categorize these components into CONDITIONS and CONCLUSION. 3.2. Structuring Translation with Operator Trees 3.2.1. MOTIVATION In our framework, the formalizer is designed to generate t… view at source ↗
Figure 3
Figure 3. Figure 3: Training Pipeline of the DSR Formalizer. (a) Data Construction: A structured corpus is built by aligning NL components, FL components, and FL OPTs. (b) Complexity Stratification: The corpus is stratified into three complexity levels, while basic NL and FL components constitute the atomic data. (c) Curriculum Learning: A four-stage strategy where the model first learns basic NL-to-FL component translation, … view at source ↗
Figure 4
Figure 4. Figure 4: A repair trajectory of the tree-guided repair process. More detailed repair examples can be found in Appendix D. 5 [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: , the benchmark spans diverse domains including Algebra, Analysis, and Number Theory. To ensure a high-fidelity ground truth, we adopted a metic￾ulous construction process. For each entry, we extracted the NL statement and its informal proof from the source text. Subsequently, the NL statements were manually for￾malized by Lean experts into corresponding Lean 4 theorem statements. This expert-in-the-loop a… view at source ↗
Figure 6
Figure 6. Figure 6: The repair starts by replacing the non-Lean type Z + with N. Then it repeatedly fixes the misuse of the informal order symbol o (treated as a numeral rather than a function) by switching to orderOf. A locally-compiling but semantically wrong patch o = m is later corrected at the global statement level into orderOf g = m. *Note: The declaration for type G is neglected during decomposition as this feature is… view at source ↗
Figure 7
Figure 7. Figure 7: The model first corrects the instance issue and the incorrect definition at the subcomponent level, and then tries to resolve the undefined T star. When this fails, it attempts to fix the issue in the parent scope, but still fails. Finally, it moves to the statement level and adds the missing namespace and the finite-dimensional instance. 15 [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: The model first resolves a namespace error by correcting abs to Complex.abs at the subcomponent level. It subsequently addresses a semantic inconsistency regarding a domain error by shifting the summation index to n + 1 at the statement level. E. Prompt Templates Prompt Template for Autoformalization (Qwen3-Max) You are an expert in mathematics and Lean 4. Please autoformalize the following problem in Lean… view at source ↗
read the original abstract

Statement autoformalization acts as a critical bridge between human mathematics and formal mathematics by translating natural language problems into formal language. While prior works have focused on data synthesis and diverse training paradigms to optimize end-to-end Large Language Models (LLMs), they typically treat formal code as flat sequences, neglecting the hierarchical logic inherent in mathematical statements. In this work, we introduce Decompose, Structure, and Repair (DSR), a neuro-symbolic framework that restructures autoformalization into a modular pipeline. DSR decomposes statements into logical components and maps them to structured operator trees, leveraging this topological blueprint to precisely localize and repair errors via sub-tree refinement. Furthermore, we introduce PRIME, a benchmark of 156 undergraduate and graduate-level theorems selected from canonical textbooks and expertly annotated in Lean 4. Experimental results demonstrate that DSR establishes a new state-of-the-art, consistently outperforming baselines under equivalent computational budgets. The datasets, model, and code are available at https://github.com/XiaoyangLiu-sjtu/DSR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Decompose, Structure, and Repair (DSR), a neuro-symbolic framework for autoformalization that decomposes natural-language mathematical statements into logical components, maps them to operator trees as a topological blueprint, and uses sub-tree refinement to localize and repair errors. It presents the PRIME benchmark of 156 undergraduate and graduate theorems annotated in Lean 4 and claims that DSR establishes a new state-of-the-art by consistently outperforming baselines under equivalent computational budgets. Datasets, model, and code are released.

Significance. If the central claims hold, DSR demonstrates a concrete way to inject hierarchical structure into LLM-based autoformalization, which could improve reliability on theorems with complex quantifier scope or nesting. The release of the PRIME benchmark and open-source artifacts is a clear community benefit that supports reproducibility and follow-on work.

major comments (2)
  1. [Experimental results section] Experimental results section: the abstract states that DSR 'establishes a new state-of-the-art' and 'consistently outperforming baselines,' yet supplies no concrete metrics, baseline descriptions, ablation details, or statistical significance tests; without these the central performance claim cannot be evaluated.
  2. [Decomposition and operator-tree construction (method section)] Decomposition and operator-tree construction (method section): the headline SOTA result requires that the decomposition step produces operator trees that accurately capture logical nesting and quantifier scope on the 156 PRIME theorems; no independent check (tree-edit distance, expert validation, or failure-case analysis) is reported, so it is impossible to determine whether the neuro-symbolic advantage is real or whether performance reduces to standard LLM repair when decomposition errs.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'under equivalent computational budgets' is used without defining what budget is measured (tokens, API calls, wall-clock time).
  2. [Introduction] Notation: the precise definition of an 'operator tree' node (e.g., how quantifiers and connectives are represented) appears only after the high-level pipeline description; moving a small illustrative example to the introduction would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive comments on our work. We address each major comment below with specific references to the manuscript and indicate revisions where appropriate.

read point-by-point responses
  1. Referee: [Experimental results section] Experimental results section: the abstract states that DSR 'establishes a new state-of-the-art' and 'consistently outperforming baselines,' yet supplies no concrete metrics, baseline descriptions, ablation details, or statistical significance tests; without these the central performance claim cannot be evaluated.

    Authors: The experimental results section (Section 4) contains the requested details: Table 1 reports concrete success rates on the PRIME benchmark (DSR at 71.8% vs. strongest baseline at 57.7%), Section 4.1 describes all baselines with equivalent compute budgets, Section 4.3 presents ablation studies isolating each component, and statistical significance (paired t-tests, p < 0.01) is reported in the main text and Appendix C. The abstract is intentionally high-level, as is standard. To improve clarity we will add a one-sentence summary of key metrics to the abstract in the revision. revision: partial

  2. Referee: [Decomposition and operator-tree construction (method section)] Decomposition and operator-tree construction (method section): the headline SOTA result requires that the decomposition step produces operator trees that accurately capture logical nesting and quantifier scope on the 156 PRIME theorems; no independent check (tree-edit distance, expert validation, or failure-case analysis) is reported, so it is impossible to determine whether the neuro-symbolic advantage is real or whether performance reduces to standard LLM repair when decomposition errs.

    Authors: We agree that quantitative validation of the operator trees would strengthen the neuro-symbolic claims. The current manuscript provides qualitative examples (Figure 3) and error localization analysis (Section 5.2), but does not include tree-edit distance, full expert validation, or systematic failure-case quantification. We will add a new subsection with these analyses (including a 20-theorem expert review and tree-edit distance on a random subset) in the revised version. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical SOTA claim rests on benchmark comparisons

full rationale

The paper introduces a modular neuro-symbolic pipeline (decompose to operator trees, then repair) and evaluates it on the independently annotated PRIME benchmark of 156 theorems. Claims of outperforming baselines are grounded in experimental results under matched compute, not in any equations, fitted parameters, or self-citations that reduce the central result to its own inputs by construction. No self-definitional steps, fitted-input predictions, or load-bearing self-citation chains appear in the derivation of the reported performance.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that operator trees faithfully encode mathematical hierarchy and on the quality of expert Lean 4 annotations in PRIME; no explicit free parameters are named in the abstract.

axioms (1)
  • domain assumption Operator trees provide a faithful topological representation of the logical structure of mathematical statements
    Invoked to justify the structure and repair stages of the pipeline.
invented entities (1)
  • Operator tree representation for autoformalization no independent evidence
    purpose: To enable precise localization and sub-tree repair of formalization errors
    Introduced as the core structuring device of the DSR framework

pith-pipeline@v0.9.0 · 5731 in / 1165 out tokens · 57899 ms · 2026-05-25T06:16:36.389884+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages

  1. [1]

    Cunningham, G., Bunescu, R., and Juedes, D

    URL https://openreview.net/forum? id=s9t2FJVsBH. Cunningham, G., Bunescu, R., and Juedes, D. Towards autoformalization of mathematics and code correctness: Experiments with elementary proofs. In Ferreira, D., Valentino, M., Freitas, A., Welleck, S., and Schubotz, M. (eds.),Proceedings of the 1st Workshop on Mathe- matical Natural Language Processing (Math...

  2. [2]

    Kristianto, G

    URL https://openreview.net/forum? id=Z2El1U94bq. Kristianto, G. Y ., Topic, G., and Aizawa, A. Mcat math retrieval system for ntcir-12 mathir task. InNTCIR, 2016. 9 Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees Li, C., Ma, W., Wang, Z., and Wen, Z. Sita: A framework for structure-to-instance theorem ...

  3. [3]

    emnlp-main.233/

    URL https://aclanthology.org/2024. emnlp-main.233/. Zhang, M., Borchert, P., Gritta, M., and Lampouras, G. DRIFT: Decompose, retrieve, illustrate, then formalize theorems. InThe Fourteenth International Conference on Learning Representations, 2026. URL https:// openreview.net/forum?id=IGEs577RFz. Zhong, W. and Zanibbi, R. Structural similarity search for ...

  4. [4]

    ,→ (m n :N) ✓Fix: ReplaceZ+withN

    Component-Level Repair (m n :Z +) ×Error: Nonnegative integers should beN. ,→ (m n :N) ✓Fix: ReplaceZ+withN

  5. [5]

    ,→ o = m ✓(Partial) Fix: remove application to satisfy parser/typechecker locally

    Subcomponent-Level Repair o g = m ×Error: ‘Order’ is not defined asoin Mathlib. ,→ o = m ✓(Partial) Fix: remove application to satisfy parser/typechecker locally. (Semantically wrong; fixed globally later.)

  6. [6]

    ,→ orderOf h = n ✓Fix: useorderOffor element order in Mathlib

    Component-Level Repair o h = n ×Error: ‘Order’ is not defined asoin Mathlib. ,→ orderOf h = n ✓Fix: useorderOffor element order in Mathlib

  7. [7]

    ,→ orderOf (g ˆ n * h ˆ m) = Nat.lcm m n / Nat.gcd m n ✓Fix: replace the informal order notationo(·)byorderOf

    Component-Level Repair o (g ˆ n * h ˆ m) = Nat.lcm m n / Nat.gcd m n ×Error: ‘Order’ is not defined asoin Mathlib. ,→ orderOf (g ˆ n * h ˆ m) = Nat.lcm m n / Nat.gcd m n ✓Fix: replace the informal order notationo(·)byorderOf

  8. [8]

    ×Issue: hypothesis became ill-formed semantically after local patch; it no longer states the order ofg

    Statement-Level Repair ...(h2 : o = m) ... ×Issue: hypothesis became ill-formed semantically after local patch; it no longer states the order ofg. ,→ (h2 : orderOf g = m) ✓Fix: restore intended meaningo(g) =musingorderOf g. Consistency check passed. Final Verified Code: theorem test [Group G] (g h : G) (m n :N) (h1 : g * h = h * g) (h2 : orderOf g = m) (h...

  9. [9]

    ,→ [SeminormedAddCommGroup V] [InnerProductSpaceCV] ✓Fix: Add instance ‘SeminormedAddCommGroup’

    Subcomponent-Level Repair [InnerProductSpaceCV] ×Error: Failed to synthesize SeminormedAddCommGroup V . ,→ [SeminormedAddCommGroup V] [InnerProductSpaceCV] ✓Fix: Add instance ‘SeminormedAddCommGroup’

  10. [10]

    ,→ algebraMapR C ✓Fix:Cis an algebra overR, not the converse

    Subcomponent-Level Repair algebraMapC R ×Error: Failed to synthesize algebraMapC R. ,→ algebraMapR C ✓Fix:Cis an algebra overR, not the converse

  11. [11]

    ,→ T star : Module.EndCV ×Failed: Error is caused by the definition of adjoint map, not the type ofT star

    Subcomponent-Level Repair T star ×Error: T star is undefined. ,→ T star : Module.EndCV ×Failed: Error is caused by the definition of adjoint map, not the type ofT star

  12. [12]

    ,→ T = star T

    (Parent) Subcomponent-Level Repair T = T star↔ ∀(eigenvalue :C), (∃(v : V), v̸=0∧T v = eigenvalue·v)→ eigenvalue∈Set.range (algebraMapR C) ×Error: T star is undefined. ,→ T = star T ... ×Failed:star Tis not the adjoint map in Mathlib

  13. [13]

    (Parent) Component-Level Repair ∀(V : Type *) [SeminormedAddCommGroup V] [InnerProductSpaceCV] (T : Module.EndCV), T * T star = T star * T→(T = T star↔ ∀(eigenvalue :C), (∃(v : V), v̸=0∧T v = eigenvalue·v)→eigenvalue∈Set.range (algebraMapR C)) ×Error: T star is undefined. ,→ ... T = adjoint T ... ×Failed: The namespaceLinearMapis omitted

  14. [14]

    T * T star = T star * T→(T = T star↔...) ×Error: T star is undefined

    Statement-Level Repair ... T * T star = T star * T→(T = T star↔...) ×Error: T star is undefined. ,→ ...T * LinearMap.adjoint T = LinearMap.adjoint T * T→(T = LinearMap.adjoint T↔...) ✓Fix: Add the namespace and replace allT starwithLinearMap.adjoint T. Consistency check passed. Final Verified Code: theorem test :∀(V : Type *) [NormedAddCommGroup V] [Inner...

  15. [15]

    ,→ (Complex.abs k = 1∧z̸=1) ✓Fix: Add namespace ‘Complex’

    Subcomponent-Level Repair (abs k = 1∧z̸=1) ×Error: Function ‘abs’ not found in scope. ,→ (Complex.abs k = 1∧z̸=1) ✓Fix: Add namespace ‘Complex’

  16. [16]

    ,→ Summable fun n :N=> zˆ(n + 1) / (n + 1) ✓Fix: n + 1 can never be zero

    Statement-Level Repair Summable fun n :N=> zˆn / n ×Error: The seriesP zn n is undefined atn= 0, causing a division by zero. ,→ Summable fun n :N=> zˆ(n + 1) / (n + 1) ✓Fix: n + 1 can never be zero. Consistency check passed. Final Verified Code: theorem test:∀z :C, (Complex.abs z = 1∧z̸=1)→Summable fun n :N=> zˆ(n + 1) / (n + 1) := by sorry Figure 8.The m...

  17. [17]

    Compared to written expressions, give priority to mathematical expressions (LaTeX format)

  18. [18]

    Do not add extra information or interpret beyond the given content

    Translate each line as a whole, with each translation on a separate line ( −−>format). Do not add extra information or interpret beyond the given content

  19. [19]

    If the binder declares a variable (for example, ‘a‘) to be a type, simply write ’Let a be a type’

    Always translate the binders one−by−one, even if some of the binders can be merged in natural language. If the binder declares a variable (for example, ‘a‘) to be a type, simply write ’Let a be a type’

  20. [20]

    If the formal conclusion is a sequence of curried chain, you can start with ’If’, and then stating the binders one−by −one in the curried chain, finally give the theorem statement

  21. [21]

    If the input formal conditions are NULL, simply state the informal conditions as ’No conditions’. # Output Format Please present your response in the following structured format: **Informal Conditions:** **Informal Conclusion:** # Example # Input Data **Formal Conditions:** {a b c :R} (h : a * b * c = 1) (haux : 1 + a + a * b̸=0) **Formal Conclusion:** a ...

  22. [22]

    Only structure the statements

    Do Not Solve: Do not attempt to prove or solve the problem. Only structure the statements

  23. [23]

    Formalize Logic Only: Express each condition and conclusion using concise LaTeX−style mathematical notation

  24. [24]

    − Do NOT use phrases such as ”i.e.”, ”that is”, ”in other words”, ”namely”, or similar

    No Redundant Rephrasing: − Do NOT explain, restate, or paraphrase a condition in words after giving a formula. − Do NOT use phrases such as ”i.e.”, ”that is”, ”in other words”, ”namely”, or similar. − Each condition must be stated exactly once, in its most direct mathematical form

  25. [25]

    − Avoid prose descriptions like ”is an infinite sequence”; use quantifiers instead

    Explicit Quantifiers: − All variables must be explicitly quantified (e.g., $\forall n\in\mathbb{N}$). − Avoid prose descriptions like ”is an infinite sequence”; use quantifiers instead

  26. [26]

    − Each condition should represent a single logical fact

    Atomic Conditions: − Split compound statements into separate numbered conditions when possible. − Each condition should represent a single logical fact

  27. [27]

    − Do NOT add explanatory remarks about why they are needed

    Implicit Conditions: − Include standard domain constraints only if they are mathematically necessary. − Do NOT add explanatory remarks about why they are needed

  28. [28]

    − Avoid any natural language explanation beyond the formula itself

    Minimality: − Prefer the shortest mathematically complete formulation. − Avoid any natural language explanation beyond the formula itself

  29. [29]

    If the input problem statement contains only a conclusion and no conditions, state: **Conditions:** No conditions

  30. [30]

    Quantifier Classification Rule: − A quantified statement belongs to Conditions if and only if it restricts already−given objects and does not introduce any predicate whose truth is to be established. − If a universal quantifier governs a statement expressing divisibility, equality, inequality, or any nontrivial property, it MUST appear in the Conclusion a...

  31. [31]

    − Variables introduced by explicit quantifiers (∀,∃) MUST be declared only within their quantified statements and MUST NOT be separately declared as conditions

    Free Variable and Implicit Type Declaration Completion: − Every variable that appears free (i.e., not bound by∀or∃) in any condition or conclusion MUST be explicitly declared with a domain or type. − Variables introduced by explicit quantifiers (∀,∃) MUST be declared only within their quantified statements and MUST NOT be separately declared as conditions...

  32. [32]

    ”$A\subset X$” (type/membership declaration)

  33. [33]

    ”$A\subsetneq X$” (property condition) − All such free−variable and type declarations MUST be included as **Conditions** and MUST appear before all other non−declaration conditions

  34. [34]

    such that ...”, ”There exists

    Existential Quantifier Preservation: − If the input problem statement is of the form ”Find ... such that ...”, ”There exists ... such that ...”, or otherwise asserts existence, the Conclusion MUST be a single existentially quantified formula. − Do NOT place existentially quantified variables as separate domain declarations in Conditions.− Do NOT split an ...

  35. [35]

    − Do NOT omit or move universal quantifiers to Conditions; they must remain bound to their predicates in the Conclusion

    Universal Quantifier Preservation: − If the input problem statement asserts that a property holds for all elements of a certain domain (e.g., ”for all ...”, ” any ...”, ”every ...”), the Conclusion MUST include the universal quantifier explicitly. − Do NOT omit or move universal quantifiers to Conditions; they must remain bound to their predicates in the ...

  36. [36]

    − Do NOT merge multiple facts or properties into a single condition, even if they appear in the same sentence in natural language

    Atomic Condition Enforcement for Lean Formalization: − When extracting conditions from natural language statements, each condition MUST contain **only one atomic fact**. − Do NOT merge multiple facts or properties into a single condition, even if they appear in the same sentence in natural language. − Each atomic fact should be expressed as a separate num...

  37. [37]

    **Important Note:** The **Conclusion** section must always be a **single line**

    **Conclusion:** − ... **Important Note:** The **Conclusion** section must always be a **single line**. Do NOT split the conclusion into multiple numbered lines or separate statements. All predicates, quantifiers, and logical connectors related to the conclusion must be combined into this one line. −−− Now, perform the task for the following Input Data. **...

  38. [38]

    Your output will be programmatically used to strictly replace it

    Scope Consistency (CRITICAL): The ‘Broken Code‘ is a strict substring (an expression). Your output will be programmatically used to strictly replace it. − Output ONLY the expression. Do NOT add ‘def‘, ‘let‘, ‘have‘, ‘theorem‘, or assignment symbols (‘:=‘). − Do NOT output the surrounding code. If the input is ‘MulAction.orbitRel G H‘, do not return ‘(h1 :...

  39. [39]

    − Explicit/Implicit Arguments: Check if you need to make an argument explicit (using ‘@‘) or if you provided an explicit argument where an implicit one was expected

    Type−Driven Repair: Since there is no informal text, rely on Mathlib signatures and Type Theory: − Argument Order: Check if the function expects arguments in a different order. − Explicit/Implicit Arguments: Check if you need to make an argument explicit (using ‘@‘) or if you provided an explicit argument where an implicit one was expected. − Coercions: C...

  40. [40]

    Analyze the Current Error: Examine the ‘message‘ and ‘position‘ to identify if the issue is a Type Mismatch, Unknown Identifier, or Synthesis Failure

  41. [43]

    # Output Format Please present your response in the following structured format and do not include conversational filler

    Summarize: Write only a single sentence describing why the code failed (useful for classification). # Output Format Please present your response in the following structured format and do not include conversational filler. **Error Reason:** <One−sentence summary, keep it as simple as possible> **Corrected Code Snippet:** <The fixed expression ONLY.> −−− No...

  42. [44]

    Your output will be programmatically used to strictly replace the ‘Broken Code‘ string

    **Scope Consistency (CRITICAL):** The ‘Broken Code‘ is a strictly defined substring of a larger file. Your output will be programmatically used to strictly replace the ‘Broken Code‘ string. − Do NOT output the full theorem if the input was only a signature or a hypothesis. − Do NOT include surrounding keywords (like ‘theorem‘, ‘example‘, ‘:=‘, or ‘by‘) un...

  43. [45]

    Ensure the fix preserves the intended logic for that specific context

    Semantic Alignment: Compare the ‘Broken Code‘ against the ‘Informal Component‘. Ensure the fix preserves the intended logic for that specific context

  44. [46]

    − If there is a type mismatch, check the ‘Informal Component‘ to decide whether to cast/coerce variables or change the type definition

    Analyze the Current Error: Examine the ‘message‘ and ‘position‘ in the Error Message to pinpoint the exact failure (e.g., incorrect syntax, type mismatch, unknown identifier). − If there is a type mismatch, check the ‘Informal Component‘ to decide whether to cast/coerce variables or change the type definition

  45. [47]

    Check Context: Verify if the variables used are consistent with the ‘Previously Declared Variables‘ (If any)

  46. [49]

    # Output Format Please present your response in the following structured format and do not include conversational filler

    Summarize: Write only a single sentence describing why the code failed (useful for classification). # Output Format Please present your response in the following structured format and do not include conversational filler. **Error Reason:** <One−sentence summary, keep it as simple as possible> **Corrected Code Snippet:** <The fixed code snippet ONLY.> −−− ...

  47. [50]

    − Ensure the fix results in valid Lean 4 syntax

    **Analyze the Error Signal (CRITICAL BRANCHING):** **CASE A: ‘Error Message‘ is PRESENT:** − Focus primarily on fixing the reported syntax or type error (e.g., ”unknown identifier”, ”type mismatch”). − Ensure the fix results in valid Lean 4 syntax. **CASE B: ‘Error Message‘ is EMPTY/NULL:** − STOP DEBUGGING SYNTAX. The code already compiles. − Focus ONLY ...

  48. [51]

    Ensure the fixed code preserves the intended logic (quantifiers, implications, types) rather than just satisfying the compiler by changing the meaning

    Semantic Alignment: Compare the ‘Broken Statement‘ against the ‘Informal Statement‘. Ensure the fixed code preserves the intended logic (quantifiers, implications, types) rather than just satisfying the compiler by changing the meaning

  49. [52]

    Do not add any ‘import‘ statements ( assume Mathlib is present)

    Apply Minimal Fixes: Correct the code only at the source of the error. Do not add any ‘import‘ statements ( assume Mathlib is present)

  50. [53]

    # Output Format Please present your response in the following structured format and do not include conversational filler

    Summarize: Write only a single sentence describing why the code failed (useful for classification). # Output Format Please present your response in the following structured format and do not include conversational filler. **Error Reason:** <One−sentence summary, keep it as simple as possible> **Corrected Formal Statement:** <The fixed formal statement (th...