Theoria: Rewrite-Acceptability Verification over Informal Reasoning States
Pith reviewed 2026-07-02 12:10 UTC · model grok-4.3
The pith
Theoria verifies informal AI answers by rewriting them into sequences of auditable state transitions each backed by explicit justifications.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Theoria rewrites candidate solutions into sequences of typed state transitions each licensed by an explicit justification, enforcing the completeness-of-change invariant so every difference between consecutive states is accounted for. On HLE-Verified Gold it certifies 105 of 185 problems at 91.4 percent strict precision with Wilson interval [84.5 percent, 95.4 percent] and produces human-readable traces. On 95 adversarial poisoned proofs across 15 domains, structured judges reach 94.7 percent detection versus 83.2 percent for holistic judging, the 11.5-point gap concentrated in hidden premises and fabricated citations as predicted by the formal analysis.
What carries the argument
Rewrite-acceptability verification that decomposes informal reasoning into typed state transitions and enforces completeness of change so every mutation requires an explicit justification.
If this is right
- Every certification yields a human-readable trace in which each step can be independently challenged.
- Structured judges detect hidden premises at 90.6 percent versus 62.5 percent for holistic judges.
- Fabricated citations are caught at 100 percent versus 90 percent.
- The method and holistic judging succeed on largely disjoint sets of problems.
- Certified precision reaches 97.1 percent on the GPQA Diamond subset.
Where Pith is reading between the lines
- The same state-transition discipline could be applied to longer multi-step tasks where coherence failures accumulate.
- Traces might support targeted human editing by isolating the exact transition that introduced an error.
- Combining the rewrite layer with existing formal proof assistants could raise coverage on problems that partially admit formalization.
- The approach suggests a route to auditing reasoning chains that remain too informal for current formal tools.
Load-bearing premise
The completeness-of-change invariant can be reliably enforced during the rewriting process without missing or misclassifying hidden premises in informal reasoning.
What would settle it
Manual inspection of a sample of Theoria-certified traces that reveals a substantial fraction containing undetected hidden premises or fabricated citations.
read the original abstract
When should an AI system's answer be trusted? Formal proof assistants offer certainty but cannot reach most of the problem distribution; scalar LLM judges offer coverage but produce opaque scores that cannot be audited after the fact and are subject to the same coherence issues as any LLM. We present Theoria, a verification architecture that closes this gap. A candidate solution is rewritten into a sequence of typed state transitions, each licensed by an explicit justification, whether that be a citation, computation, or problem-given fact, and every transition is independently auditable. The foundational invariant is completeness of change: every difference between consecutive proof states must be accounted for, so hidden premises surface as unlicensed mutations rather than passing silently. On HLE-Verified Gold (185 text-only expert problems), Theoria certifies 105 at 91.4% strict precision (Wilson 95% CI [84.5%, 95.4%]). Every certification produces a human readable proof trace in which each step can be independently challenged. Holistic LLM judges achieve comparable precision at matched coverage but fail on different problems (Jaccard 0.14-0.36), making the approaches complementary. On 95 adversarial poisoned proofs across 15 domains, structured judges catch 94.7% versus 83.2% for holistic judging (p= 0.0017). The overall 11.5 pp gap concentrates in hidden premises (90.6% vs. 62.5%, a 28 pp difference) and fabricated citations (100% vs. 90%), the error classes where the formal analysis predicts an advantage; performance is identical on arithmetic and theorem-misapplication errors, where no advantage is predicted. On GPQA Diamond (n= 65), certified precision is 97.1% (Wilson CI [85.1%, 99.5%]).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce Theoria, a verification architecture that rewrites AI candidate solutions into sequences of typed state transitions with explicit justifications (citations, computations, or problem-given facts), enforcing a 'completeness of change' invariant to ensure every difference between consecutive proof states is accounted for, thereby surfacing hidden premises. It reports certifying 105 out of 185 HLE-Verified Gold problems at 91.4% strict precision (Wilson CI [84.5%, 95.4%]), with human-readable proof traces; complementary performance to holistic LLM judges (Jaccard 0.14-0.36); and superior detection on 95 adversarial poisoned proofs (94.7% vs 83.2%, p=0.0017), with a 28 pp advantage on hidden premises.
Significance. If the rewriting process reliably enforces the invariant without missing hidden premises, the approach could provide a valuable auditable alternative to both formal proof systems and scalar LLM judges. The empirical results on expert-level benchmarks (HLE, GPQA) and the targeted advantage on predicted error classes (hidden premises, fabricated citations) in adversarial testing lend support to the method's potential. The generation of independently challengeable proof traces is a clear strength, as is the use of external benchmarks without apparent parameter fitting.
major comments (2)
- Abstract: Reports concrete precision numbers (91.4% on HLE-Verified Gold) and statistical comparisons (p=0.0017, 28 pp gap on hidden premises) but provides no description of the rewriting algorithm, typing mechanism, or how states are constructed, leaving the central claim of enforcing the completeness of change invariant without visible derivation support.
- Verification architecture description: The foundational invariant of completeness of change (every difference between consecutive proof states must be accounted for by an explicit justification) is presented as reliably enforceable in informal text, but without a formal definition of state differences, a decision procedure, or inter-annotator study, it is unclear how the reported advantages (e.g., 90.6% vs 62.5% on hidden premises) are achieved; this is load-bearing for interpreting the Wilson CIs and adversarial results.
minor comments (2)
- Results section: The Jaccard overlap values (0.14-0.36) between Theoria and holistic judges are reported but without specifying the exact problems or calculation method.
- Notation for typed state transitions could be clarified with a concrete example early in the paper to aid readability.
Simulated Author's Rebuttal
We thank the referee for their constructive review and for highlighting areas where greater clarity would strengthen the manuscript. We address each major comment below and outline targeted revisions to improve the presentation of the core architecture while preserving the empirical contributions.
read point-by-point responses
-
Referee: Abstract: Reports concrete precision numbers (91.4% on HLE-Verified Gold) and statistical comparisons (p=0.0017, 28 pp gap on hidden premises) but provides no description of the rewriting algorithm, typing mechanism, or how states are constructed, leaving the central claim of enforcing the completeness of change invariant without visible derivation support.
Authors: The abstract is deliberately concise to comply with length constraints and to focus on results. The rewriting algorithm, typing of state transitions, and state construction procedure are described in detail in Section 3 of the manuscript. To improve accessibility, we will revise the abstract to include a brief clause such as 'via rewriting into sequences of typed state transitions licensed by explicit justifications' that directly references the mechanisms supporting the completeness-of-change invariant. revision: yes
-
Referee: Verification architecture description: The foundational invariant of completeness of change (every difference between consecutive proof states must be accounted for by an explicit justification) is presented as reliably enforceable in informal text, but without a formal definition of state differences, a decision procedure, or inter-annotator study, it is unclear how the reported advantages (e.g., 90.6% vs 62.5% on hidden premises) are achieved; this is load-bearing for interpreting the Wilson CIs and adversarial results.
Authors: The manuscript presents the invariant through the operational rewriting process and validates it empirically via the adversarial poisoned-proof experiments, which were explicitly constructed to probe the predicted error classes (hidden premises and fabricated citations). The 28 pp advantage on hidden premises is therefore tied directly to the architecture's requirement for explicit justifications. We agree that a formal definition of state differences and a decision procedure would strengthen the theoretical section; we will add a new subsection providing these. An inter-annotator study on the rewriting process was not performed; we will explicitly note this as a limitation and indicate it as future work, while retaining the adversarial results as the primary empirical support for the reported advantages. revision: partial
Circularity Check
No significant circularity; claims rest on external empirical benchmarks
full rationale
The paper presents Theoria as an architecture enforcing the completeness-of-change invariant during rewriting of informal proofs into auditable state transitions. All reported results (105/185 certifications at 91.4% precision on HLE-Verified Gold, 97.1% on GPQA Diamond, 94.7% vs 83.2% on adversarial set) are direct empirical measurements on held-out external datasets with no fitted parameters, no predictions derived from the same data used for tuning, and no load-bearing self-citations. The invariant itself is stated as an explicit design choice rather than derived from prior results by the same authors. No equation or step reduces a claimed outcome to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Completeness of change: every difference between consecutive proof states must be accounted for by an explicit justification (citation, computation, or problem-given fact).
invented entities (1)
-
Theoria verification architecture
no independent evidence
Reference graph
Works this paper leans on
- [1]
-
[2]
Advancing Mathematics Research with AI-Driven Formal Proof Search
G. Tsoukalas et al. Advancing mathematics research with AI-driven formal proof search. arXiv:2605.22763,
work page internal anchor Pith review Pith/arXiv arXiv
-
[3]
Aristotle: IMO-level Automated Theorem Proving
19 T. Achim et al. Aristotle: IMO-level automated theorem proving.arXiv:2510.01346,
work page internal anchor Pith review Pith/arXiv arXiv
- [4]
-
[5]
(Proof fully formalized in Lean/Mathlib and produced automatically by AxiomProver.) H. Lightman et al. Let’s verify step by step.arXiv:2305.20050,
work page internal anchor Pith review Pith/arXiv arXiv
-
[6]
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
P. Wang et al. Math-Shepherd: Verify and reinforce LLMs step-by-step without human annotations. arXiv:2312.08935,
work page internal anchor Pith review Pith/arXiv arXiv
-
[7]
L. Yuan et al. Free process rewards without process labels.arXiv:2412.01981,
-
[8]
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
L. Zheng et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena.arXiv:2306.05685,
work page internal anchor Pith review Pith/arXiv arXiv
-
[9]
J. Gu et al. A survey on LLM-as-a-judge.arXiv:2411.15594,
work page internal anchor Pith review Pith/arXiv arXiv
-
[10]
Z. Ling et al. Deductive verification of chain-of-thought reasoning.arXiv:2306.03872,
-
[11]
K. Zhou and L. Zhang. Step-wise formal verification for LLM-based mathematical problem solving (MATH-VF).arXiv:2505.20869,
- [12]
-
[13]
Reinforced Self-Training (ReST) for Language Modeling
C. Gulcehre et al. Reinforced self-training (ReST) for language modeling.arXiv:2308.08998,
work page internal anchor Pith review Pith/arXiv arXiv
-
[14]
D. Zhang et al. ReST-MCTS*: LLM self-training via process reward guided tree search. arXiv:2406.03816,
-
[15]
L. Phan et al. Humanity’s last exam.arXiv:2501.14249,
work page internal anchor Pith review Pith/arXiv arXiv
-
[16]
W. Zhai et al. HLE-Verified: A systematic verification and structured revision of Humanity’s Last Exam.arXiv:2602.13964,
-
[17]
Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing
W. Yuan et al. Verify before you commit: Towards faithful reasoning in LLM agents via self-auditing (SAVeR).arXiv:2604.08401,
work page internal anchor Pith review Pith/arXiv arXiv
-
[18]
J. Fang et al. Graph of verification: Structured verification of LLM reasoning with directed acyclic graphs.arXiv:2506.12509,
-
[19]
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
D. Rein et al. GPQA: A graduate-level Google-proof Q&A benchmark.arXiv:2311.12022,
work page internal anchor Pith review Pith/arXiv arXiv
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.