Recognition: 2 theorem links
· Lean Theorems · Lattice Deduction Transformer
Pith reviewed 2026-05-12 01:23 UTC · model grok-4.3
The pith
A recurrent transformer achieves perfect accuracy on extreme Sudoku by projecting its latent state through a lattice to approximate sound deduction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Lattice Deduction Transformer approximates logically sound deduction by projecting its latent state through a lattice between forward passes. Trained on-policy to mirror deduction in a search-based constraint solver and supervised by a domain-agnostic abstract-interpretation approximation of the set of solution candidates, an 800K-parameter LDT achieves 100% accuracy on Sudoku-Extreme and Snowflake Sudoku while remaining empirically sound, and a 1.8M-parameter variant reaches 99.9% accuracy on Maze-Hard. Frontier LLMs score 0% on all three benchmarks.
What carries the argument
Projecting the latent state through a lattice between forward passes, which approximates the set of solution candidates via abstract interpretation and enforces soundness during recurrent reasoning.
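The projection step can be made concrete with a toy model of the lattice involved. The sketch below is illustrative only, not the paper's architecture: states are per-cell candidate sets for a 4x4 Sudoku, ordered by set inclusion, and the "projection" is one meet with a constraint-propagation operator, so a state can only move down the lattice and never invents candidates.

```python
from itertools import product

def peers(r, c, n=4, box=2):
    """Cells sharing a row, column, or box with (r, c)."""
    out = set()
    for k in range(n):
        out.add((r, k)); out.add((k, c))
    br, bc = (r // box) * box, (c // box) * box
    for dr, dc in product(range(box), repeat=2):
        out.add((br + dr, bc + dc))
    out.discard((r, c))
    return out

def project(state, n=4):
    """One meet with the constraint-propagation operator: remove any
    candidate already fixed in a peer cell. Returns a new state."""
    new = {}
    for (r, c), cands in state.items():
        fixed = {next(iter(state[p])) for p in peers(r, c, n)
                 if len(state[p]) == 1}
        new[(r, c)] = cands if len(cands) == 1 else cands - fixed
    return new

# Top element of the lattice: every cell may hold any digit.
top = {(r, c): set(range(1, 5)) for r in range(4) for c in range(4)}
top[(0, 0)] = {1}; top[(0, 1)] = {2}; top[(1, 0)] = {3}  # given clues
after = project(top)
print(after[(1, 1)])  # -> {4}: only digit not fixed among its peers
```

Interposing such a step between forward passes is what would keep the recurrent state inside the deductive closure: the projection is reductive, so decoding, meeting, and re-encoding can only discard candidates the constraints rule out.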
Load-bearing premise
Projecting the latent state through a lattice produces an approximation of logically sound deduction that generalizes beyond the training distribution when supervised by abstract interpretation.
What would settle it
A held-out test set of Sudoku puzzles whose solution structures or constraint densities differ from the training distribution, where accuracy falls below 100% or the model produces incorrect answers without abstaining.
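Such a test would need to score abstention separately from accuracy, since "empirically sound" means the model never answers incorrectly. A minimal sketch of that protocol, with `solve` and `oracle` as placeholder names rather than the paper's API:

```python
def evaluate(puzzles, solve, oracle):
    """Score a solver that may abstain, separating accuracy from soundness."""
    correct = abstain = wrong = 0
    for p in puzzles:
        ans = solve(p)          # returns a solution, or None to abstain
        if ans is None:
            abstain += 1
        elif ans == oracle(p):
            correct += 1
        else:
            wrong += 1
    n = len(puzzles)
    return {
        "accuracy": correct / n,
        "abstention_rate": abstain / n,
        "empirically_sound": wrong == 0,  # sound = never answers incorrectly
    }

# A trivially "sound" solver: abstains on everything.
stats = evaluate([1, 2, 3], solve=lambda p: None, oracle=lambda p: p)
print(stats)
```

The degenerate case shows why the abstention rule is load-bearing: 100% abstention is vacuously sound, so the decisive evidence is accuracy on out-of-distribution puzzles with a low abstention rate.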
Figures
read the original abstract
We introduce the Lattice Deduction Transformer (LDT), a recurrent transformer that approximates logically sound deduction by projecting its latent state through a lattice between forward passes. We train on-policy in a process that mirrors deduction in a search-based constraint solver and supervise training via a domain-agnostic, abstract-interpretation-based approximation of the set of solution candidates. An $800$K-parameter LDT achieves $100\%$ accuracy on Sudoku-Extreme and Snowflake Sudoku, at a fraction of the training cost of prior small recurrent reasoners, while remaining empirically sound: the model returns a correct answer or abstains. A $1.8$M-parameter variant reaches $99.9\%$ accuracy on Maze-Hard. Frontier LLMs score $0\%$ on all three benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Lattice Deduction Transformer (LDT), a recurrent transformer that approximates logically sound deduction by projecting its latent state through a lattice between forward passes. It is trained on-policy to mirror deduction in search-based constraint solvers and supervised using a domain-agnostic abstract-interpretation approximation of solution candidate sets. Key results include an 800K-parameter LDT achieving 100% accuracy on Sudoku-Extreme and Snowflake Sudoku at reduced training cost while remaining empirically sound (correct answer or abstention), a 1.8M-parameter variant reaching 99.9% on Maze-Hard, and frontier LLMs scoring 0% on all benchmarks.
Significance. If the lattice projection mechanism is shown to produce generalizable deductive approximations rather than distribution matching, the work would be significant for neural-symbolic reasoning: it demonstrates that small-parameter recurrent models can achieve near-perfect accuracy on hard constraint problems with built-in soundness via abstention, at far lower cost than prior recurrent reasoners or LLMs. The domain-agnostic supervision via abstract interpretation is a positive design choice that could generalize across tasks.
major comments (2)
- [Abstract] The central performance claims (100% accuracy on Sudoku-Extreme, 99.9% on Maze-Hard) and the assertion of empirical soundness via abstention are presented without any information on training/validation data splits, error analysis, or the precise abstention rule. These details are load-bearing for determining whether the lattice projection enables extrapolation or whether the results are explained by training-trajectory reproduction plus conservative abstention.
- [Abstract] The claim that lattice projection "approximates logically sound deduction" when supervised only by abstract-interpretation over-approximations lacks supporting analysis showing that the learned recurrent dynamics remain inside the deductive closure rather than merely fitting observed trajectories; such analysis is required to substantiate generalization beyond the training distribution.
minor comments (1)
- [Abstract] The phrase "at a fraction of the training cost of prior small recurrent reasoners" would be strengthened by a brief quantitative comparison (e.g., epochs or FLOPs), even if full details appear later.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on our manuscript. We address each of the major comments below and have updated the abstract and discussion sections to provide additional clarity on the experimental setup and the deductive properties of the model.
read point-by-point responses
-
Referee: [Abstract] The central performance claims (100% accuracy on Sudoku-Extreme, 99.9% on Maze-Hard) and the assertion of empirical soundness via abstention are presented without any information on training/validation data splits, error analysis, or the precise abstention rule. These details are load-bearing for determining whether the lattice projection enables extrapolation or whether the results are explained by training-trajectory reproduction plus conservative abstention.
Authors: We concur that these details are important for interpreting the results. Although the full specifications are provided in the experimental setup (Section 4) and results analysis (Section 5), we have revised the abstract to concisely include information on the data splits, a summary of the error analysis, and the abstention rule. This revision helps readers assess the generalization capabilities of the lattice projection mechanism versus potential memorization of training trajectories. revision: yes
-
Referee: [Abstract] The claim that lattice projection "approximates logically sound deduction" when supervised only by abstract-interpretation over-approximations lacks supporting analysis showing that the learned recurrent dynamics remain inside the deductive closure rather than merely fitting observed trajectories; such analysis is required to substantiate generalization beyond the training distribution.
Authors: The paper includes supporting analysis for this claim through the description of the lattice projection mechanism in Section 3, which enforces properties that keep the state within the deductive closure by design, combined with the on-policy training that follows solver deduction steps. The abstract-interpretation supervision provides an over-approximation that guides the model without requiring exact solutions. Generalization is substantiated by the high accuracy on out-of-distribution benchmarks like Sudoku-Extreme and Maze-Hard, where the model either produces correct solutions or abstains. We have added a brief elaboration in the abstract and a new paragraph in the discussion section to make this analysis more prominent. revision: yes
Circularity Check
No significant circularity; architectural mechanism presented as independent contribution
full rationale
The paper defines the Lattice Deduction Transformer via a recurrent projection of latent states through a lattice, trained on-policy under abstract-interpretation supervision. No equations, derivations, or self-citations are exhibited that reduce the claimed accuracy or soundness to fitted inputs by construction, nor does any load-bearing premise collapse to a prior result by the same authors. The central claim rests on an empirical architectural choice whose generalization properties are left to experimental validation rather than being enforced by definitional identity or self-referential training.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
unclear: relation between the paper passage and the cited Recognition theorem.
We introduce the Lattice Deduction Transformer (LDT), a recurrent transformer that approximates logically sound deduction by projecting its latent state through a lattice between forward passes.
-
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
unclear: relation between the paper passage and the cited Recognition theorem.
Abstract interpretation... models information states as elements of a lattice with a top element ⊤... Deduction is a process that descends from ⊤ toward more informative states
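The quoted framing is order-theoretic and easy to state abstractly. A minimal Lean sketch (illustrative only; not one of the cited canon theorems, and assuming Mathlib's `Lattice` and `OrderTop` classes): deduction steps are reductive maps on a lattice with ⊤, so iteration from ⊤ yields a descending chain of increasingly informative states.

```lean
import Mathlib

-- Illustrative only: deduction as a descending chain in a lattice with ⊤.
variable {L : Type*} [Lattice L] [OrderTop L]

/-- A deduction operator is reductive: it never adds information. -/
def Reductive (f : L → L) : Prop := ∀ x, f x ≤ x

/-- Iterating a reductive operator from ⊤ descends the lattice. -/
example (f : L → L) (hf : Reductive f) : f (f ⊤) ≤ f ⊤ ∧ f ⊤ ≤ ⊤ :=
  ⟨hf (f ⊤), hf ⊤⟩
```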
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, volume 35, 2022
work page 2022
-
[2]
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In International Conference on Learning Representations (ICLR), 2023
work page 2023
-
[3]
Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling LLM test-time compute optimally can be more effective than scaling parameters for reasoning. In International Conference on Learning Representations (ICLR), 2025
work page 2025
-
[4]
Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces. In Advances in Neural Information Processing Systems (NeurIPS), 2021
work page 2021
-
[5]
P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Conference Record of the Fourth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 238–252, Los Angeles, California, 1977. ACM Press, New York, NY
work page 1977
-
[6]
P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In Conference Record of the Sixth Annual ACM Symposium on Principles of Programming Languages, pages 269–282, San Antonio, Texas, 1979. ACM Press, New York, NY
work page 1979
-
[7]
Alex Graves. Adaptive computation time for recurrent neural networks, 2016
work page 2016
-
[8]
Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Łukasz Kaiser. Universal transformers. In International Conference on Learning Representations (ICLR), 2019
work page 2019
-
[9]
Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, and Yasin Abbasi Yadkori. Hierarchical reasoning model, 2025
work page 2025
-
[10]
Alexia Jolicoeur-Martineau. Less is more: Recursive reasoning with tiny networks, 2025
work page 2025
-
[11]
Cheng Lou. Sotaku: From-scratch experiments on iterative neural Sudoku solvers. Software, GitHub repository https://github.com/chenglou/sotaku, commit 9e13341, 2026. Accessed: 2026-05-07
work page 2026
-
[12]
Po-Wei Wang, Priya L. Donti, Bryan Wilder, and J. Zico Kolter. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 6545–6554. PMLR, 2019
work page 2019
-
[13]
Matt Fredrikson, Kaiji Lu, Saranya Vijayakumar, Somesh Jha, Vijay Ganesh, and Zifan Wang. Learning modulo theories. arXiv preprint arXiv:2301.11435, 2023
-
[14]
Shaurya Gomber and Gagandeep Singh. Neural abstract interpretation. In ICLR 2025 Workshop: VerifAI: AI Verification in the Wild, 2025
work page 2025
-
[15]
Jingxuan He, Gagandeep Singh, Markus Püschel, and Martin Vechev. Learning fast and precise numerical analysis. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 1112–1127, 2020
work page 2020
-
[16]
Qiuhan Gu, Avaljot Singh, and Gagandeep Singh. SAIL: Sound abstract interpreters with LLMs. Proceedings of the ACM on Programming Languages, 10(PLDI), 2026. To appear
work page 2026
-
[17]
Gagandeep Singh, Markus Püschel, and Martin Vechev. Fast numerical program analysis with reinforcement learning. In Computer Aided Verification (CAV), volume 10981 of Lecture Notes in Computer Science, pages 211–229. Springer, 2018
work page 2018
-
[18]
Sooyoung Cha, Sehun Jeong, and Hakjoo Oh. Learning a strategy for choosing widening thresholds from a large codebase. In Programming Languages and Systems (APLAS), volume 10017 of Lecture Notes in Computer Science, pages 25–41. Springer, 2016
work page 2016
-
[19]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 4171–4186, 2019
work page 2019
-
[20]
Guanghan Wang, Yair Schiff, Subham Sekhar Sahoo, and Volodymyr Kuleshov. Remasking discrete diffusion models with inference-time scaling. In Advances in Neural Information Processing Systems (NeurIPS), 2025
work page 2025
- [21]
-
[22]
Cai Zhou, Chenyu Wang, Dinghuai Zhang, Shangyuan Tong, Yifei Wang, Stephen Bates, and Tommi Jaakkola. Next semantic scale prediction via hierarchical diffusion language models. In Advances in Neural Information Processing Systems (NeurIPS), 2025
work page 2025
-
[23]
Martin Davis and Hilary Putnam. A computing procedure for quantification theory. Journal of the ACM, 7(3):201–215, 1960
work page 1960
-
[24]
Martin Davis, George Logemann, and Donald Loveland. A machine program for theorem-proving. Communications of the ACM, 5(7):394–397, 1962
work page 1962
-
[25]
João P. Marques-Silva, Inês Lynce, and Sharad Malik. Conflict-driven clause learning SAT solvers. In Armin Biere, Marijn Heule, Hans van Maaren, and Toby Walsh, editors, Handbook of Satisfiability, volume 185 of Frontiers in Artificial Intelligence and Applications, pages 131–153. IOS Press, 2009
work page 2009
-
[26]
Mary Sheeran and Gunnar Stålmarck. A tutorial on Stålmarck's proof procedure for propositional logic. In Ganesh Gopalakrishnan and Phillip Windley, editors, Formal Methods in Computer-Aided Design, pages 82–99, Berlin, Heidelberg, 1998. Springer Berlin Heidelberg
work page 1998
-
[27]
Sergei Leonov and Liam Davis. Two optimizations on the Stålmarck procedure, 2025
work page 2025
-
[28]
Vijay D'Silva, Leopold Haller, and Daniel Kroening. Abstract conflict driven learning. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 143–154. ACM, 2013
work page 2013
-
[29]
Aditya V. Thakur and Thomas W. Reps. A generalization of Stålmarck's method. In Static Analysis – 19th International Symposium (SAS), volume 7460 of Lecture Notes in Computer Science, pages 334–351. Springer, 2012
work page 2012
-
[30]
Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024
work page 2024
-
[31]
Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos, Matthieu Geist, and Olivier Bachem. On-policy distillation of language models: Learning from self-generated mistakes. In International Conference on Learning Representations (ICLR), 2024
work page 2024
-
[32]
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2nd edition, 2018
work page 2018
-
[33]
Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, and Shuai Ma. CodeBLEU: a method for automatic evaluation of code synthesis, 2020
work page 2020
-
[34]
Tom Dillon. Tdoku: A fast Sudoku solver and generator. Software, GitHub repository https://github.com/t-dillon/tdoku, 2025. Accessed: 2026-04-27
work page 2025
-
[35]
Rasmus Berg Palm, Ulrich Paquet, and Ole Winther. Recurrent relational networks. In Advances in Neural Information Processing Systems (NeurIPS), volume 31, 2018
work page 2018
-
[36]
Kyubyong Park. Can convolutional neural networks crack Sudoku puzzles? Software, GitHub repository https://github.com/Kyubyong/sudoku, 2018. Accessed: 2026-05-07
work page 2018
-
[37]
Anthropic. Introducing Claude Opus 4.6. Anthropic News, February 2026. Accessed: 2026-04-29
work page 2026
-
[38]
DeepSeek-AI. DeepSeek-V4 preview release: Ushering in the era of 1M context length. DeepSeek Official Blog, April 2026. Accessed: 2026-04-29
work page 2026
-
[39]
OpenAI. GPT-5.4 Thinking system card. Technical report, OpenAI, March 2026
work page 2026
-
[40]
Haniel Barbosa, Clark W. Barrett, Martin Brain, Gereon Kremer, Hanna Lachnitt, Makai Mann, Abdalrhman Mohamed, Mudathir Mohamed, Aina Niemetz, Andres Nötzli, Alex Ozdemir, Mathias Preiner, Andrew Reynolds, Ying Sheng, Cesare Tinelli, and Yoni Zohar. cvc5: A versatile and industrial-strength SMT solver. In Dana Fisman and Grigore Rosu, editors, Tools and Al...
work page 2022
-
[41]
François Chollet. On the measure of intelligence. arXiv preprint arXiv:1911.01547, 2019
work page 2019
-
[42]
François Chollet, Mike Knoop, Gregory Kamradt, Bryan Landers, and Henry Pinkard. ARC-AGI-2: A new challenge for frontier AI reasoning systems, 2025
work page 2025
-
[43]
ARC Prize Foundation. ARC-AGI-3: A new challenge for frontier agentic intelligence, 2026
work page 2026