Pseudo-Formalization for Automatic Proof Verification
Pith reviewed 2026-05-21 06:21 UTC · model grok-4.3
The pith
Natural language proofs can be verified automatically by translating them into self-contained Pseudo-Formal modules checked independently.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that decomposing a natural language proof into Pseudo-Formal modules—each stating premises, conclusion, and a natural language proof—allows an LLM to translate the original text and then verify the modules independently, establishing overall correctness more effectively than direct whole-proof judgments by an LLM.
What carries the argument
Pseudo-Formal modules, each a self-contained unit with premises, conclusion, and natural language proof that supports independent verification.
If this is right
- Error detection improves because issues can be isolated to individual modules instead of requiring re-assessment of the full proof.
- Verification extends to research-level mathematics where complete formalization remains impractical.
- Precision and recall for finding proof errors exceed those of direct LLM-as-judge baselines on the tested benchmarks.
- The released ArxivMathGradingBench supplies a shared test set for comparing future verification approaches on research-level problems.
Where Pith is reading between the lines
- Granular module-level feedback could support iterative refinement when training AI systems to generate or correct proofs.
- The same decomposition strategy might transfer to other structured reasoning tasks such as checking long chains of code or logical arguments.
- Hybrid systems could combine Pseudo-Formal modules with partial formalization in targeted sections to increase reliability further.
Load-bearing premise
An LLM can translate any natural language proof into accurate Pseudo-Formal modules without omitting or adding critical logical steps.
What would settle it
A natural language proof containing a logical error that the LLM translation into Pseudo-Formal form fails to capture, so that independent module verification still approves the flawed proof.
Figures
read the original abstract
Reliable verification of proofs remains a bottleneck for training and evaluating AI systems on hard mathematical reasoning. Fully formal proofs, in languages like Lean, are easy to verify because they are unambiguous and modular. Most proofs, particularly those written by AI systems, have neither property, and translating them into formal languages remains challenging in many frontier math settings. We propose Pseudo-Formalization (PF), a proof format that captures the modularity and precision of formal proofs while retaining the flexibility of natural language. A Pseudo-Formal proof is decomposed into self-contained modules, each stating its premises, conclusion, and proof in natural language. To verify the correctness of a regular natural language proof, an LLM translates it to Pseudo-Formal and then verifies each module independently, an algorithm we call Block Verification (BV). We evaluate PF+BV on two benchmarks spanning olympiad and research-level mathematics, where it pareto-dominates LLM-as-judge baselines on error-finding precision and recall. To support future work, we release our research-level proof verification benchmark ArxivMathGradingBench.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Pseudo-Formalization (PF), a modular proof representation in which natural-language proofs are decomposed into self-contained modules each stating premises, conclusion, and proof steps. It defines Block Verification (BV) as the procedure of using an LLM to translate a natural-language proof into PF modules and then verifying each module independently. The central empirical claim is that PF+BV Pareto-dominates standard LLM-as-judge baselines on error-finding precision and recall across two benchmarks (olympiad and research-level mathematics), and the authors release the ArxivMathGradingBench dataset to support further work.
Significance. If the soundness of independent module verification for overall proof correctness can be established, the approach would supply a useful intermediate point between fully formal verification and unstructured natural-language judging, potentially easing the bottleneck in training and evaluating AI mathematical reasoners. The public release of a research-level proof-verification benchmark is a concrete positive contribution that enables reproducible follow-up research.
major comments (2)
- [Block Verification algorithm description] The description of Block Verification states that each PF module is verified independently, yet provides no mechanism (entailment check, dependency graph validation, or global consistency step) to ensure that the conclusion of module i entails the premises of module i+1. Because the central claim is that local verification suffices to certify the entire proof, the absence of any cross-module chaining verification is load-bearing; an LLM translation error that misaligns adjacent modules would produce locally valid but globally invalid arguments.
- [Evaluation and experimental results] The evaluation reports Pareto dominance on precision and recall but supplies no information on benchmark construction, statistical tests, error-bar reporting, or controls for data leakage between training data and the released ArxivMathGradingBench. Without these details the empirical claim cannot be assessed at the level required to support the superiority statement.
minor comments (2)
- [Abstract] The abstract uses the term 'pareto-dominates' without specifying the exact set of metrics or the baseline configurations; a short table or explicit statement of the comparison points would improve clarity.
- [Pseudo-Formalization definition] Notation for the PF module components (premises, conclusion, proof text) is introduced informally; a compact definition or diagram would reduce ambiguity when readers compare PF to existing intermediate representations.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each of the major comments in detail below, indicating the revisions we plan to make.
read point-by-point responses
-
Referee: [Block Verification algorithm description] The description of Block Verification states that each PF module is verified independently, yet provides no mechanism (entailment check, dependency graph validation, or global consistency step) to ensure that the conclusion of module i entails the premises of module i+1. Because the central claim is that local verification suffices to certify the entire proof, the absence of any cross-module chaining verification is load-bearing; an LLM translation error that misaligns adjacent modules would produce locally valid but globally invalid arguments.
Authors: We agree that explicitly addressing the chaining between modules strengthens the presentation. In the Pseudo-Formalization approach, the translation to PF modules is designed such that the premises of each subsequent module are directly based on the conclusions of the prior ones, as part of the decomposition process. The independent verification then confirms that within each module, the proof steps correctly entail the conclusion from the stated premises. While we do not include an automated cross-module entailment check in the current BV algorithm, this is because the modularity assumes faithful translation; any misalignment would be a failure of the translation step rather than the verification per se. To clarify this, we will revise the manuscript to include a dedicated subsection explaining the assumptions underlying independent verification and the role of the translation in maintaining global structure. We will also add a discussion of potential failure modes due to translation errors and how they might be mitigated. revision: partial
-
Referee: [Evaluation and experimental results] The evaluation reports Pareto dominance on precision and recall but supplies no information on benchmark construction, statistical tests, error-bar reporting, or controls for data leakage between training data and the released ArxivMathGradingBench. Without these details the empirical claim cannot be assessed at the level required to support the superiority statement.
Authors: We acknowledge the need for greater transparency in the experimental setup to allow proper assessment of the results. In the revised manuscript, we will expand the evaluation section to provide: (1) a detailed description of how the ArxivMathGradingBench was constructed, including the selection criteria for proofs and error annotations; (2) statistical tests (e.g., paired t-tests or Wilcoxon tests) comparing PF+BV to baselines, along with p-values; (3) error bars or confidence intervals on the precision and recall metrics, computed over multiple runs or bootstrap samples; and (4) explicit controls for data leakage, such as verification that the benchmark proofs were not part of the LLM training data and use of held-out splits where applicable. These additions will support the empirical claims more rigorously. revision: yes
Circularity Check
No circularity: empirical evaluation of new format on external benchmarks
full rationale
The paper defines Pseudo-Formalization (PF) as a modular natural-language proof format and Block Verification (BV) as independent per-module checking after LLM translation. It then reports empirical results on two benchmarks (including the newly released ArxivMathGradingBench), showing Pareto dominance over LLM-as-judge baselines on error-finding precision and recall. No equations, fitted parameters, or self-citations appear in the derivation; the central claim rests on external LLM capabilities and released test data rather than reducing to self-definition or internal fitting. The method is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat recovery and embed_injective unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
A Pseudo-Formal proof is a sequence of modules V={v_i = (P_i, c_i, π_i)} equipped with a directed acyclic graph (DAG) G for the invocation dependency and a forest T for scope inheritance
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 1 (Context cost of block-wise verification): ... O(n) calls to T1 with context length independent of n
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Learning to give checkable answers with prover-verifier games.arXiv preprint arXiv:2108.12099, 2021
Cem Anil, Guodong Zhang, Yuhuai Wu, and Roger Grosse. Learning to give checkable answers with prover-verifier games.arXiv preprint arXiv:2108.12099, 2021
-
[2]
Kevin Buzzard and Richard Taylor. Towards a Lean proof of Fermat’s Last Theorem.https: //imperialcollegelondon.github.io/FLT/blueprint.pdf, 2024
work page 2024
- [3]
-
[4]
Power-free palindromes and reversed primes: S
Shashi Chourasiya and Daniel R Johnston. Power-free palindromes and reversed primes: S. chourasiya, dr johnston.Monatshefte für Mathematik, pages 1–17, 2025
work page 2025
-
[5]
Escaping the cognitive well: Efficient competition math with off-the-shelf models,
Xingyu Dang, Rohit Agarwal, Rodrigo Porto, Anirudh Goyal, Liam H Fowl, and Sanjeev Arora. Escaping the cognitive well: Efficient competition math with off-the-shelf models,
- [6]
-
[7]
The lean theorem prover (system description)
Leonardo De Moura, Soonho Kong, Jeremy Avigad, Floris Van Doorn, and Jakob von Raumer. The lean theorem prover (system description). InInternational Conference on Automated Deduction, pages 378–388. Springer, 2015
work page 2015
-
[8]
Finite codimension stability of invariant surfaces.arXiv preprint arXiv:2502.00898, 2025
Giovanni Forni. Finite codimension stability of invariant surfaces.arXiv preprint arXiv:2502.00898, 2025
-
[9]
Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Dayiheng Liu, Chang Zhou, Wen Xiao, Tianyu Liu, and Baobao Chang. Llm critics help catch bugs in mathematics: Towards a better mathematical verifier with natural language feedback. InFindings of the Association for Computational Linguistics: ACL 2025, pages 14588–14604. Associati...
work page 2025
-
[10]
Formal proof – theory and practice.Notices of the American Mathematical Society, 55:1395–1406, 2008
John Harrison. Formal proof – theory and practice.Notices of the American Mathematical Society, 55:1395–1406, 2008
work page 2008
-
[11]
Context rot: How increasing input tokens impacts llm performance.URL https://research
Kelly Hong, Anton Troynikov, and Jeff Huber. Context rot: How increasing input tokens impacts llm performance.URL https://research. trychroma. com/context-rot, retrieved October, 20: 2025, 2025
work page 2025
-
[12]
Pessimistic verification for open ended math questions.arXiv preprint arXiv:2511.21522, 2025
Yanxing Huang, Zihan Tang, Zejin Lin, Peng Li, and Yang Liu. Pessimistic verification for open ended math questions.arXiv preprint arXiv:2511.21522, 2025
-
[13]
Yichen Huang and Lin F Yang. Winning gold at imo 2025 with a model-agnostic verification- and-refinement pipeline.arXiv preprint arXiv:2507.15855, 2025
-
[14]
Prover-verifier games improve legibility of llm outputs.arXiv preprint arXiv:2407.13692, 2024
Jan Hendrik Kirchner, Yining Chen, Harri Edwards, Jan Leike, Nat McAleese, and Yuri Burda. Prover-verifier games improve legibility of llm outputs.arXiv preprint arXiv:2407.13692, 2024
-
[15]
PrimeNumberTheoremAnd: A formalization of the prime number theorem and related results in Lean 4
Alex Kontorovich and Terence Tao. PrimeNumberTheoremAnd: A formalization of the prime number theorem and related results in Lean 4. https://github.com/ AlexKontorovich/PrimeNumberTheoremAnd, 2024
work page 2024
-
[16]
Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step. In International Conference on Learning Representations, 2024. 12
work page 2024
-
[17]
Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, et al. Goedel-prover: A frontier model for open-source automated theorem proving.arXiv preprint arXiv:2502.07640, 2025
-
[18]
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, et al. Goedel-prover-v2: Scaling formal theorem proving with scaffolded data synthesis and self-correction.arXiv preprint arXiv:2508.03613, 2025
work page internal anchor Pith review arXiv 2025
-
[19]
Trust, but verify: A self-verification approach to reinforcement learning with verifiable rewards
Xiaoyuan Liu, Tian Liang, Zhiwei He, Jiahao Xu, Wenxuan Wang, Pinjia He, Zhaopeng Tu, Haitao Mi, and Dong Yu. Trust, but verify: A self-verification approach to reinforcement learning with verifiable rewards. InProceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), 2025
work page 2025
-
[20]
Towards robust mathematical reasoning
Minh-Thang Luong, Dawsen Hwang, Hoang H Nguyen, Golnaz Ghiasi, Yuri Chervonyi, Insuk Seo, Junsu Kim, Garrett Bingham, Jonathan Lee, Swaroop Mishra, et al. Towards robust mathematical reasoning. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35406–35430, 2025
work page 2025
-
[21]
Sadegh Mahdavi, Branislav Kisacanin, Shubham Toshniwal, Wei Du, Ivan Moshkov, George Armstrong, Renjie Liao, Christos Thrampoulidis, and Igor Gitman. Scaling generative ver- ifiers for natural language mathematical proof verification and selection.arXiv preprint arXiv:2511.13027, 2025
-
[22]
The expressive power of transformers with chain of thought.arXiv preprint arXiv:2310.07923, 2023
William Merrill and Ashish Sabharwal. The expressive power of transformers with chain of thought.arXiv preprint arXiv:2310.07923, 2023
-
[23]
The lean 4 theorem prover and programming language
Leonardo de Moura and Sebastian Ullrich. The lean 4 theorem prover and programming language. InInternational Conference on Automated Deduction, pages 625–635. Springer, 2021
work page 2021
-
[24]
Tobias Nipkow, Markus Wenzel, and Lawrence C Paulson.Isabelle/HOL: a proof assistant for higher-order logic. Springer, 2002
work page 2002
-
[25]
Shrey Pandit, Austin Xu, Xuan-Phi Nguyen, Yifei Ming, Caiming Xiong, and Shafiq Joty. Hard2verify: A step-level verification benchmark for open-ended frontier math.arXiv preprint arXiv:2510.13744, 2025
-
[26]
Guillaume Rond. Real algebraic surfaces biholomorphically equivalent but not algebraically equivalent.arXiv preprint arXiv:2503.08957, 2025
-
[27]
Liquid tensor experiment.Experimental Mathematics, 31(2):349–354, 2022
Peter Scholze. Liquid tensor experiment.Experimental Mathematics, 31(2):349–354, 2022
work page 2022
-
[28]
Deepseekmath-v2: Towards self-verifiable mathematical reasoning
Zhihong Shao, Yuxiang Luo, Chengda Lu, ZZ Ren, Jiewen Hu, Tian Ye, Zhibin Gou, Shirong Ma, and Xiaokang Zhang. Deepseekmath-v2: Towards self-verifiable mathematical reasoning. arXiv preprint arXiv:2511.22570, 2025
-
[29]
Heimdall: test-time scaling on the generative verification.arXiv preprint arXiv:2504.10337, 2025
Wenlei Shi and Xing Jin. Heimdall: test-time scaling on the generative verification.arXiv preprint arXiv:2504.10337, 2025
-
[30]
A promising path towards autoformalization and general artificial intelli- gence
Christian Szegedy. A promising path towards autoformalization and general artificial intelli- gence. InInternational Conference on Intelligent Computer Mathematics, pages 3–20. Springer, 2020
work page 2020
-
[31]
From solving to verifying: A unified objective for robust reasoning in llms, 2025
Xiaoxuan Wang, Bo Liu, Song Jiang, Jingzhou Liu, Jingyuan Qi, Xia Chen, and Baosheng He. From solving to verifying: A unified objective for robust reasoning in llms, 2025. URL https://arxiv.org/abs/2511.15137. 13
-
[32]
Yuhuai Wu, Albert Qiaochu Jiang, Wenda Li, Markus Rabe, Charles Staats, Mateja Jamnik, and Christian Szegedy. Autoformalization with large language models.Advances in neural information processing systems, 35:32353–32368, 2022
work page 2022
-
[33]
Pencil: Long thoughts with short memory, 2025
Chenxiao Yang, Nathan Srebro, David McAllester, and Zhiyuan Li. Pencil: Long thoughts with short memory, 2025. URLhttps://arxiv.org/abs/2503.14337
-
[34]
Recursive models for long-horizon reasoning
Chenxiao Yang, Nathan Srebro, and Zhiyuan Li. Recursive models for long-horizon reasoning. arXiv preprint arXiv:2603.02112, 2026
-
[35]
Weihao Zeng, Yuzhen Huang, and Junxian He. Loca-bench: Benchmarking language agents under controllable and extreme context growth.arXiv preprint arXiv:2602.07962, 2026
-
[36]
Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, and Rishabh Agarwal. Generative verifiers: Reward modeling as next-token prediction. InProceedings of the International Conference on Learning Representations (ICLR), 2025. URL https://arxiv. org/abs/2408.15240
-
[37]
Eric Zhao, Pranjal Awasthi, and Sreenivas Gollapudi. Sample, scrutinize and scale: Effective inference-time search by scaling verification.arXiv preprint arXiv:2502.01839, 2025
-
[38]
Processbench: Identifying process errors in mathematical reasoning
Chujie Zheng, Zhenru Zhang, Beichen Zhang, Runji Lin, Keming Lu, Bowen Yu, Dayiheng Liu, Jingren Zhou, and Junyang Lin. Processbench: Identifying process errors in mathematical reasoning. InThe 63rd Annual Meeting of the Association for Computational Linguistics, 2025
work page 2025
-
[39]
Leanarchitect: Automating blueprint generation for humans and ai, 2026
Thomas Zhu, Pietro Monticone, Jeremy Avigad, and Sean Welleck. Leanarchitect: Automating blueprint generation for humans and ai, 2026. URL https://arxiv.org/abs/2601. 22554. 14 A Context cost of verification: formal statement and proof Definition 2(Good Pseudo-Formal proof).Agood Pseudo-Formal proofis a Pseudo-Formal proof (Definition 1) with the followin...
work page 2026
-
[40]
The depth of T is D<n
-
[41]
The lengths of the conclusion c, the premises, and the proof are bounded above by L. Definition 3(Block-wise verification algorithm).Let P= (G , T) be a Pseudo-Formal proof with node set V={v 1, . . . ,vn}, where each node is written as v= (P v,c v,π v). For a node v∈V , let AncT(v) denote its ancestors in the scope-inheritance forest T, and define its de...
work page 2026
-
[42]
Lemmas - Do not introduce any deeper hierarchy. - If a deeper tree seems natural, flatten it into a sequential list of lemmas inside the relevant proposition. - The theorem proof may only cite propositions; a proposition proof may only cite its own lemmas and the statements of earlier propositions. - No trivial decompositions: do not decompose the theorem...
-
[43]
**Contexts**: A sequence of statements from which the Assertion may or may not inherit definitions, assumptions, or conditions. These are often the parent or ancestor statements of the Assertion, and can be the same as the global theorem. They are provided solely so you can understand the definitions and assumptions of the Assertion. They have NOT been ve...
-
[44]
You may assume all established results are correct and use them freely - do NOT re-verify them
**Established Results **: Statements that have already been verified or can be assumed to be correct. You may assume all established results are correct and use them freely - do NOT re-verify them. The proof of the Assertion can invoke these results as long as the assumptions are properly justified and the definitions are consistent
-
[45]
**Assertion**: The specific statement whose proof you must verify
-
[46]
**Proposed Proof **: The proof of the Assertion to verify. Instructions: - Verify ONLY the proposed proof of the Assertion. - Read the Assertion carefully and analyze the proof step by step. - Identify any incorrect, unjustified, or logically invalid reasoning. - Pay close attention to potentially confusing or ambiguous interpretations of concepts. - When...
-
[47]
Math Problem: The original problem statement. 20
-
[48]
Original Solution Steps: The solver’s original solution split into indexed steps, written as: <step>[0] ...</step> <step>[1] ...</step> and so on
-
[49]
Rewritten Proof: A structured rewrite of the original solution, decomposed into theorem / proposition / lemma / claim / fact blocks. This rewrite was produced automatically and may contain artifacts that are not present in the original solution
-
[50]
These may or may not be genuine errors in the original solution
Potential Errors: A list of potential errors flagged by automated verification of the rewritten proof. These may or may not be genuine errors in the original solution. Evaluation process:
-
[51]
Validate each potential error. For each flagged issue, decide whether it is a genuine mathematical error in the ORIGINAL solution or a false alarm caused by the rewrite. Cross-check against the original solution steps. If the original solution contains the missing reasoning, a stronger equivalent argument, or a correct alternative route, do not count the ...
-
[52]
Inspect the original steps as needed. The potential errors are useful leads, but your final output must be based on the original solution steps. Check any original steps needed to identify the first genuine incorrect step. If you find that the first incorrect step was not covered by any potential error, report it in <additional_errors>
-
[53]
For each original step, determine whether that step is correct or incorrect
Grade the original steps, not the rewrite. For each original step, determine whether that step is correct or incorrect. - A correct step is one where all mathematical content is correct and logically consistent with the problem and all previous correct steps. - An incorrect step is one that contains a mathematical error, an unjustified load- bearing claim...
-
[54]
corrected typos in the proof of Theorem 3
Identify the first incorrect step. The first incorrect step is the smallest original step index whose content is mathematically incorrect or whose reasoning first depends on an error. If every original step is correct, output -1. Important calibration rules: - The original solution steps are authoritative for the final verdict. - The rewritten proof and p...
work page 2025
-
[55]
The rendered PDF (attached as a file). Use this to read the paper as a human reader would, with all theorem/lemma/proposition numbers rendered
-
[56]
The raw LaTeX source (included below). Use this to inspect precise notation, equations, and label names if needed. Your task is to identify any mathematical errors in the paper that would require revision before publication. Focus on errors of mathematical substance: incorrect proofs, unjustified steps, false claims, gaps in reasoning, miscomputed quantit...
-
[57]
The rendered PDF of the paper (attached as a file). Use it to read the paper as a human would and to identify the rendered numerical or letter labels of every theorem/lemma/proposition/corollary/claim
-
[58]
Use it to inspect precise notation, equations, and the exact wording of each statement and proof
The raw LaTeX source of the paper (included below). Use it to inspect precise notation, equations, and the exact wording of each statement and proof. Structure: - The top level of the rewrite is a list of **theorems**. Use one <THEOREM_STATEMENT id="N"> block per top-level result the paper proves. Top-level results are the paper’s flagship theorem(s) plus...
-
[59]
Propositions (shared pool, numbered globally across all theorems)
-
[60]
N"> may cite any of the propositions; a <PROPOSITION_PROOF id=
Lemmas (numbered within each proposition) - Do not introduce any deeper hierarchy. If a deeper tree seems natural, flatten it into a sequential list of lemmas inside the relevant proposition. - Each <THEOREM_PROOF id="N"> may cite any of the propositions; a <PROPOSITION_PROOF id="K"> may cite its own lemmas and the statements of earlier propositions. A th...
-
[61]
The rendered PDF of the original paper (attached as a file)
-
[62]
The raw LaTeX source of the paper (included below -- ground truth -- your rewrite must faithfully represent this)
-
[63]
Your previous rewritten paper, which contained faithfulness errors
-
[64]
Requirements for the new rewrite: - Fix every issue listed in the Identified Errors
The specific discrepancies flagged by the checker. Requirements for the new rewrite: - Fix every issue listed in the Identified Errors. For each issue, make sure the new rewrite no longer deviates from the original paper in that way. - Do NOT introduce new discrepancies: do not strengthen/weaken claims, omit steps, add arguments, drift in notation, or mis...
-
[65]
**Original Paper (LaTeX) **: The full LaTeX source of the original paper
-
[66]
These are provided so you can understand the scope of the Assertion
**Contexts**: Statements from the rewritten paper that the Assertion inherits definitions or assumptions from (e.g., the enclosing theorem statement, the enclosing proposition statement). These are provided so you can understand the scope of the Assertion
-
[67]
You may use them as reference points
**Established Results **: Statements from the rewritten paper that have already been checked and can be assumed to be faithfully rewritten. You may use them as reference points. 28
-
[68]
**Assertion**: The specific rewritten statement whose faithfulness you must verify
-
[69]
**Proposed Proof **: The rewritten proof of the Assertion (may be empty for a top- level theorem statement). Your task is to determine whether the Assertion and its Proposed Proof **faithfully represent** the corresponding part of the Original Paper. Check for:
-
[70]
**Strengthened or weakened claims **: Does the Assertion claim more or less than the original paper establishes at the corresponding point?
-
[71]
**Omitted content **: Does the Proposed Proof drop a non-trivial argument that appears in the original paper?
-
[72]
**Added content **: Does the Proposed Proof introduce new arguments, repairs, or proof ideas not present in the original paper?
-
[73]
**Notation drift **: Are variables, functions, or definitions used differently than in the original paper?
-
[74]
**Misinterpretation**: Does the Assertion or Proposed Proof misunderstand the original’s reasoning or logical structure?
-
[75]
**Scope errors **: Are assumptions incorrectly inherited, dropped, or added compared to the original? Instructions: - Do NOT judge whether the original paper is mathematically correct. Your sole task is faithfulness. - Only flag changes that alter mathematical meaning. Cosmetic rephrasing is fine. - Use the Contexts to understand what the Assertion is all...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.