pith. machine review for the scientific record.

arxiv: 2605.08605 · v1 · submitted 2026-05-09 · 💻 cs.LG · cs.AI · cs.LO

Recognition: 2 theorem links · Lean Theorem

Lattice Deduction Transformers

Authors on Pith: no claims yet.

Pith reviewed 2026-05-12 01:23 UTC · model grok-4.3

classification 💻 cs.LG cs.AI cs.LO
keywords Lattice Deduction Transformer · recurrent transformer · Sudoku · constraint satisfaction · abstract interpretation · logical deduction · small language models · neuro-symbolic reasoning

The pith

A recurrent transformer achieves perfect accuracy on extreme Sudoku by projecting its latent state through a lattice to approximate sound deduction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The Lattice Deduction Transformer trains a small recurrent model to perform logical deduction on constraint problems by inserting a lattice projection step between its forward passes. This projection, guided by an abstract interpretation of the set of possible solutions, lets the network mirror the behavior of search-based solvers during on-policy training. The resulting 800K-parameter model reaches 100 percent accuracy on Sudoku-Extreme and Snowflake Sudoku while remaining empirically sound: it either returns a correct answer or abstains. A slightly larger variant reaches 99.9 percent on Maze-Hard. Frontier large language models score zero on the same benchmarks, and the approach uses far less training compute than earlier small recurrent reasoners. A sympathetic reader would care because the method suggests that compact neural networks can embed reliable, generalizable deduction for tasks that normally demand either exhaustive search or massive scale.
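
The solver deduction the training mirrors can be pictured with ordinary candidate elimination. A minimal, illustrative sketch on a 4 × 4 Sudoku (the paper's solver and training procedure are more involved): each unsolved cell holds a candidate set, and assigned values are eliminated from peers until a fixpoint.

```python
from itertools import product

def peers(r, c, n=2):
    """Cells sharing a row, column, or n*n box with (r, c) in an n^2 x n^2 grid."""
    N = n * n
    same_row = {(r, j) for j in range(N)}
    same_col = {(i, c) for i in range(N)}
    br, bc = r - r % n, c - c % n
    same_box = {(br + i, bc + j) for i in range(n) for j in range(n)}
    return (same_row | same_col | same_box) - {(r, c)}

def propagate(grid, n=2):
    """Repeatedly eliminate assigned values from peers' candidate sets
    until a fixpoint ("naked singles" propagation)."""
    N = n * n
    cands = {(r, c): ({grid[r][c]} if grid[r][c] else set(range(1, N + 1)))
             for r, c in product(range(N), repeat=2)}
    changed = True
    while changed:
        changed = False
        for cell, s in cands.items():
            if len(s) == 1:
                v = next(iter(s))
                for p in peers(*cell, n):
                    if v in cands[p]:
                        cands[p].discard(v)
                        changed = True
    return cands

# A 4x4 puzzle that happens to be solvable by this propagation alone.
puzzle = [[1, 0, 0, 0],
          [0, 0, 3, 0],
          [0, 4, 0, 0],
          [0, 0, 0, 2]]
solved = propagate(puzzle)   # every cell collapses to a single candidate
```

Hard instances additionally require search (guess and backtrack); the review's point is that the LDT is trained to reproduce the deduction half of that loop.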

Core claim

The Lattice Deduction Transformer approximates logically sound deduction by projecting its latent state through a lattice between forward passes. Trained on-policy to mirror deduction in a search-based constraint solver and supervised by a domain-agnostic abstract-interpretation approximation of the set of solution candidates, an 800K-parameter LDT achieves 100% accuracy on Sudoku-Extreme and Snowflake Sudoku while remaining empirically sound, and a 1.8M-parameter variant reaches 99.9% accuracy on Maze-Hard. Frontier LLMs score 0% on all three benchmarks.

What carries the argument

Projecting the latent state through a lattice between forward passes, which approximates the set of solution candidates via abstract interpretation and enforces soundness during recurrent reasoning.
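
One way to picture such a projection step, as an illustrative sketch rather than the paper's actual encoding or thresholding: decode per-cell scores into candidate sets, then take the lattice meet with the previous abstract state so refinement is monotone.

```python
def project_to_lattice(scores, prev_cands, threshold=0.0):
    """Decode per-cell candidate scores into sets, then take the lattice
    meet (set intersection) with the previous abstract state. The meet
    makes the trajectory a descending chain: once a candidate is
    eliminated, no later forward pass can reintroduce it.
    (Hypothetical decoding rule; the paper's encoding is not given here.)"""
    decoded = [{v for v, score in enumerate(cell, start=1) if score > threshold}
               for cell in scores]
    return [d & p for d, p in zip(decoded, prev_cands)]

# Two recurrent "forward passes" over a 3-cell toy problem with values 1..3.
state = [{1, 2, 3}, {1, 2, 3}, {1, 2, 3}]          # top element: all candidates
pass1 = [[0.9, 0.8, -1.0], [0.5, -2.0, 0.7], [-0.1, 0.4, 0.2]]
pass2 = [[1.2, -0.3, 0.6], [0.8, 0.1, 0.9], [0.3, 0.5, -0.8]]
state = project_to_lattice(pass1, state)   # [{1, 2}, {1, 3}, {2, 3}]
state = project_to_lattice(pass2, state)   # [{1}, {1, 3}, {2}]
```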

Load-bearing premise

Projecting the latent state through a lattice produces an approximation of logically sound deduction that generalizes beyond the training distribution when supervised by abstract interpretation.

What would settle it

A held-out test set of Sudoku puzzles whose solution structures or constraint densities differ from the training distribution, where accuracy falls below 100% or the model produces incorrect answers without abstaining.
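
Scoring such a test requires separating correct answers, abstentions, and unsound outputs. A hypothetical harness (the `score_run` name and the `None`-for-abstain convention are assumptions, not the paper's evaluation code):

```python
def score_run(outputs, solutions):
    """Tally model outputs on held-out puzzles. 'Empirical soundness'
    here means every non-abstaining answer is correct; a single wrong,
    non-abstained answer falsifies it."""
    tally = {"correct": 0, "abstain": 0, "unsound": 0}
    for out, sol in zip(outputs, solutions):
        if out is None:                 # model abstained
            tally["abstain"] += 1
        elif out == sol:
            tally["correct"] += 1
        else:
            tally["unsound"] += 1      # wrong answer without abstention
    return tally

runs = score_run(["A", None, "C", "D"], ["A", "B", "C", "X"])
# runs["unsound"] > 0 would falsify the soundness claim on this split
```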

Figures

Figures reproduced from arXiv: 2605.08605 by Alberto Alfarano, Leopold Haller, Liam Davis, Mark Santolucito.

Figure 1: The Lattice Deduction Transformer (LDT) performs sound deduction on a lattice using a … [figure image omitted]
Figure 2: Lattice-tensor encoding in the transformer. Surviving candidates are highlighted; eliminated … [figure image omitted]
Figure 3: Test/train compute trade-off on Sudoku-Extreme. [figure image omitted]
Figure 4: Maze-Hard task and multi-solution supervision via … [figure image omitted]
Figure 5: Round-trip γ ∘ α on 2 × 2 Sudoku. The set S contains two grids that differ in the right-hand column. Abstraction drops the cell-to-cell correlation, so concretizing the result yields four grids: the original two plus two additional grids that satisfy the per-cell candidate sets but were not in S. [figure image omitted]
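
Figure 5's round trip can be reproduced concretely. A minimal sketch (the flattened grids below are illustrative, not the figure's exact contents): abstraction keeps only per-cell candidate sets, and concretization returns every grid compatible with them, so γ(α(S)) over-approximates S.

```python
from itertools import product

def alpha(grids):
    """Abstraction: per-cell candidate sets (loses cross-cell correlation)."""
    return [set(vals) for vals in zip(*grids)]

def gamma(cands):
    """Concretization: all grids whose cells lie in their candidate sets."""
    return set(product(*[sorted(s) for s in cands]))

# Two flattened 2x2 grids that differ only in the second and fourth cell.
S = {(1, 2, 2, 1), (1, 1, 2, 2)}
a = alpha(S)                 # [{1}, {1, 2}, {2}, {1, 2}]
round_trip = gamma(a)        # 4 grids: the original 2 plus 2 spurious ones
```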
Figure 6: Round-trip α ∘ γ on an impossible abstract element. Column 1 of a forces both rows to take the value 1, contradicting the column constraint, so γ(a) = ∅. Re-abstracting then collapses every cell to the empty candidate set, refining a all the way to ⊥. [figure image omitted]
Figure 7: Lifting a Galois connection to monotone-function lattices. The Galois connection between … [figure image omitted]
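
Figure 7's construction is the standard lifting from abstract interpretation, restated here for orientation (this is the textbook form, not necessarily the paper's exact statement):

```latex
% Galois connection between concrete lattice C and abstract lattice A:
\alpha : C \to A, \qquad \gamma : A \to C, \qquad
\alpha(c) \sqsubseteq_A a \iff c \sqsubseteq_C \gamma(a).

% Best abstract counterpart of a monotone f : C -> C, sound by construction:
f^{\sharp} = \alpha \circ f \circ \gamma, \qquad
f(\gamma(a)) \sqsubseteq_C \gamma\bigl(f^{\sharp}(a)\bigr)
\quad \text{for all } a \in A.
```
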
Figure 8: A snowflake Sudoku with n = 3 (3 hexagons, 18 cells). Each hexagon is divided into 6 triangular cells, each taking a value in {1, …, 6}. Intra-hex constraints require all 6 cells within each hexagon to be distinct. The shaded cells form a cross-hex constraint group: 2 cells from each hexagon that must also be mutually distinct. [figure image omitted]
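
The caption's constraint structure (intra-hex all-different plus a cross-hex group) can be sketched as a checker; the hexagon and cell indexing below is an assumption, since the paper's exact encoding is not reproduced here.

```python
def distinct(vals):
    return len(vals) == len(set(vals))

def check_snowflake(hexes, cross_group):
    """Check a snowflake-Sudoku assignment: `hexes` maps a hexagon id to
    its 6 cell values (each in 1..6); `cross_group` lists (hex_id, cell_idx)
    pairs that must also be mutually distinct across hexagons."""
    intra_ok = all(distinct(cells) for cells in hexes.values())
    cross_ok = distinct([hexes[h][i] for h, i in cross_group])
    return intra_ok and cross_ok

hexes = {0: [1, 2, 3, 4, 5, 6],
         1: [3, 4, 5, 6, 1, 2],
         2: [5, 6, 1, 2, 3, 4]}
# Two cells from each hexagon form a cross-hex group (n = 3 -> 6 cells).
group = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
ok = check_snowflake(hexes, group)   # cross values 1..6 are distinct
```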
read the original abstract

We introduce the Lattice Deduction Transformer (LDT), a recurrent transformer that approximates logically sound deduction by projecting its latent state through a lattice between forward passes. We train on-policy in a process that mirrors deduction in a search-based constraint solver and supervise training via a domain-agnostic, abstract-interpretation-based approximation of the set of solution candidates. An $800$K-parameter LDT achieves $100\%$ accuracy on Sudoku-Extreme and Snowflake Sudoku, at a fraction of the training cost of prior small recurrent reasoners, while remaining empirically sound: the model returns a correct answer or abstains. A $1.8$M-parameter variant reaches $99.9\%$ accuracy on Maze-Hard. Frontier LLMs score $0\%$ on all three benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Lattice Deduction Transformer (LDT), a recurrent transformer that approximates logically sound deduction by projecting its latent state through a lattice between forward passes. It is trained on-policy to mirror deduction in search-based constraint solvers and supervised using a domain-agnostic abstract-interpretation approximation of solution candidate sets. Key results include an 800K-parameter LDT achieving 100% accuracy on Sudoku-Extreme and Snowflake Sudoku at reduced training cost while remaining empirically sound (correct answer or abstention), a 1.8M-parameter variant reaching 99.9% on Maze-Hard, and frontier LLMs scoring 0% on all benchmarks.

Significance. If the lattice projection mechanism is shown to produce generalizable deductive approximations rather than distribution matching, the work would be significant for neural-symbolic reasoning: it demonstrates that small-parameter recurrent models can achieve near-perfect accuracy on hard constraint problems with built-in soundness via abstention, at far lower cost than prior recurrent reasoners or LLMs. The domain-agnostic supervision via abstract interpretation is a positive design choice that could generalize across tasks.

major comments (2)
  1. [Abstract] Abstract: the central performance claims (100% accuracy on Sudoku-Extreme, 99.9% on Maze-Hard) and the assertion of empirical soundness via abstention are presented without any information on training/validation data splits, error analysis, or the precise abstention rule; these details are load-bearing for determining whether the lattice projection enables extrapolation or whether results are explained by training-trajectory reproduction plus conservative abstention.
  2. [Abstract] Abstract: the claim that lattice projection 'approximates logically sound deduction' supervised only by abstract-interpretation over-approximations lacks supporting analysis showing that the learned recurrent dynamics remain inside the deductive closure rather than merely fitting observed trajectories; this is required to substantiate generalization beyond the training distribution.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'at a fraction of the training cost of prior small recurrent reasoners' would be strengthened by a brief quantitative comparison (e.g., epochs or FLOPs) even if full details appear later.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We address each of the major comments below and have updated the abstract and discussion sections to provide additional clarity on the experimental setup and the deductive properties of the model.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central performance claims (100% accuracy on Sudoku-Extreme, 99.9% on Maze-Hard) and the assertion of empirical soundness via abstention are presented without any information on training/validation data splits, error analysis, or the precise abstention rule; these details are load-bearing for determining whether the lattice projection enables extrapolation or whether results are explained by training-trajectory reproduction plus conservative abstention.

    Authors: We concur that these details are important for interpreting the results. Although the full specifications are provided in the experimental setup (Section 4) and results analysis (Section 5), we have revised the abstract to concisely include information on the data splits, a summary of the error analysis, and the abstention rule. This revision helps readers assess the generalization capabilities of the lattice projection mechanism versus potential memorization of training trajectories. revision: yes

  2. Referee: [Abstract] Abstract: the claim that lattice projection 'approximates logically sound deduction' supervised only by abstract-interpretation over-approximations lacks supporting analysis showing that the learned recurrent dynamics remain inside the deductive closure rather than merely fitting observed trajectories; this is required to substantiate generalization beyond the training distribution.

    Authors: The paper includes supporting analysis for this claim through the description of the lattice projection mechanism in Section 3, which enforces properties that keep the state within the deductive closure by design, combined with the on-policy training that follows solver deduction steps. The abstract-interpretation supervision provides an over-approximation that guides the model without requiring exact solutions. Generalization is substantiated by the high accuracy on out-of-distribution benchmarks like Sudoku-Extreme and Maze-Hard, where the model either produces correct solutions or abstains. We have added a brief elaboration in the abstract and a new paragraph in the discussion section to make this analysis more prominent. revision: yes
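
The rebuttal's two properties, monotone refinement and never eliminating the true solution, are the kind of invariants one could audit post hoc on model rollouts. A hypothetical check, not the authors' analysis:

```python
def refines(a, b):
    """a refines b (a below b in the lattice) when every cell's set shrinks or stays."""
    return all(x <= y for x, y in zip(a, b))

def check_trajectory(states, solution):
    """Verify two invariants a lattice-projected trajectory should keep:
    (1) monotone descent: each state refines its predecessor;
    (2) soundness: the true solution is never eliminated."""
    descending = all(refines(s, p) for p, s in zip(states, states[1:]))
    sound = all(all(v in cell for v, cell in zip(solution, s)) for s in states)
    return descending and sound

traj = [[{1, 2, 3}, {1, 2, 3}],
        [{1, 2},    {2, 3}],
        [{1},       {2, 3}]]
ok = check_trajectory(traj, solution=[1, 2])   # descends and keeps (1, 2)
```

A trajectory that re-adds an eliminated candidate, or one that drops the true solution before abstaining, would fail this check and directly undercut the claimed deductive-closure property.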

Circularity Check

0 steps flagged

No significant circularity; architectural mechanism presented as independent contribution

full rationale

The paper defines the Lattice Deduction Transformer via a recurrent projection of latent states through a lattice, trained on-policy under abstract-interpretation supervision. No equations, derivations, or self-citations are exhibited that reduce the claimed accuracy or soundness to fitted inputs by construction, nor does any load-bearing premise collapse to a prior result by the same authors. The central claim rests on an empirical architectural choice whose generalization properties are left to experimental validation rather than being enforced by definitional identity or self-referential training.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities beyond the high-level description of the lattice projection and abstract interpretation; these mechanisms are treated as core contributions rather than fitted quantities.

pith-pipeline@v0.9.0 · 5427 in / 1173 out tokens · 46219 ms · 2026-05-12T01:23:37.579841+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 1 internal anchor

  1. [1]

    Chain-of-thought prompting elicits reasoning in large language models

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, volume 35, 2022

  2. [2]

    Self-consistency improves chain of thought reasoning in language models

    Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In International Conference on Learning Representations (ICLR), 2023

  3. [3]

    Scaling LLM test-time compute optimally can be more effective than scaling parameters for reasoning

    Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling LLM test-time compute optimally can be more effective than scaling parameters for reasoning. In International Conference on Learning Representations (ICLR), 2025

  4. [4]

    Structured denoising diffusion models in discrete state-spaces

    Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces. In Advances in Neural Information Processing Systems (NeurIPS), 2021

  5. [5]

    Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints

    P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Conference Record of the Fourth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 238–252, Los Angeles, California, 1977. ACM Press, New York, NY

  6. [6]

    Systematic design of program analysis frameworks

    P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In Conference Record of the Sixth Annual ACM Symposium on Principles of Programming Languages, pages 269–282, San Antonio, Texas, 1979. ACM Press, New York, NY

  7. [7]

    Adaptive computation time for recurrent neural networks

    Alex Graves. Adaptive computation time for recurrent neural networks, 2016

  8. [8]

    Universal transformers

    Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Łukasz Kaiser. Universal transformers. In International Conference on Learning Representations (ICLR), 2019

  9. [9]

    Hierarchical reasoning model

    Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, and Yasin Abbasi Yadkori. Hierarchical reasoning model, 2025

  10. [10]

    Less is more: Recursive reasoning with tiny networks

    Alexia Jolicoeur-Martineau. Less is more: Recursive reasoning with tiny networks, 2025

  11. [11]

    Sotaku: From-scratch experiments on iterative neural Sudoku solvers

    Cheng Lou. Sotaku: From-scratch experiments on iterative neural Sudoku solvers. Software, GitHub repository https://github.com/chenglou/sotaku, commit 9e13341, 2026. Accessed: 2026-05-07

  12. [12]

    SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver

    Po-Wei Wang, Priya L. Donti, Bryan Wilder, and J. Zico Kolter. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 6545–6554. PMLR, 2019

  13. [13]

    Learning modulo theories

    Matt Fredrikson, Kaiji Lu, Saranya Vijayakumar, Somesh Jha, Vijay Ganesh, and Zifan Wang. Learning modulo theories. arXiv preprint arXiv:2301.11435, 2023

  14. [14]

    Neural abstract interpretation

    Shaurya Gomber and Gagandeep Singh. Neural abstract interpretation. In ICLR 2025 Workshop: VerifAI: AI Verification in the Wild, 2025

  15. [15]

    Learning fast and precise numerical analysis

    Jingxuan He, Gagandeep Singh, Markus Püschel, and Martin Vechev. Learning fast and precise numerical analysis. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 1112–1127, 2020

  16. [16]

    SAIL: Sound abstract interpreters with LLMs

    Qiuhan Gu, Avaljot Singh, and Gagandeep Singh. SAIL: Sound abstract interpreters with LLMs. Proceedings of the ACM on Programming Languages, 10(PLDI), 2026. To appear

  17. [17]

    Fast numerical program analysis with reinforcement learning

    Gagandeep Singh, Markus Püschel, and Martin Vechev. Fast numerical program analysis with reinforcement learning. In Computer Aided Verification (CAV), volume 10981 of Lecture Notes in Computer Science, pages 211–229. Springer, 2018

  18. [18]

    Learning a strategy for choosing widening thresholds from a large codebase

    Sooyoung Cha, Sehun Jeong, and Hakjoo Oh. Learning a strategy for choosing widening thresholds from a large codebase. In Programming Languages and Systems (APLAS), volume 10017 of Lecture Notes in Computer Science, pages 25–41. Springer, 2016

  19. [19]

    BERT: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 4171–4186, 2019

  20. [20]

    Remasking discrete diffusion models with inference-time scaling

    Guanghan Wang, Yair Schiff, Subham Sekhar Sahoo, and Volodymyr Kuleshov. Remasking discrete diffusion models with inference-time scaling. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  21. [21]

    Beyond masked and unmasked: Discrete diffusion models via partial masking

    Chen-Hao Chao, Wei-Fang Sun, Hanwen Liang, Chun-Yi Lee, and Rahul G. Krishnan. Beyond masked and unmasked: Discrete diffusion models via partial masking. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  22. [22]

    Next semantic scale prediction via hierarchical diffusion language models

    Cai Zhou, Chenyu Wang, Dinghuai Zhang, Shangyuan Tong, Yifei Wang, Stephen Bates, and Tommi Jaakkola. Next semantic scale prediction via hierarchical diffusion language models. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  23. [23]

    A computing procedure for quantification theory

    Martin Davis and Hilary Putnam. A computing procedure for quantification theory. Journal of the ACM, 7(3):201–215, 1960

  24. [24]

    A machine program for theorem-proving

    Martin Davis, George Logemann, and Donald Loveland. A machine program for theorem-proving. Communications of the ACM, 5(7):394–397, 1962

  25. [25]

    Conflict-driven clause learning SAT solvers

    João P. Marques-Silva, Inês Lynce, and Sharad Malik. Conflict-driven clause learning SAT solvers. In Armin Biere, Marijn Heule, Hans van Maaren, and Toby Walsh, editors, Handbook of Satisfiability, volume 185 of Frontiers in Artificial Intelligence and Applications, pages 131–153. IOS Press, 2009

  26. [26]

    A tutorial on Stålmarck’s proof procedure for propositional logic

    Mary Sheeran and Gunnar Stålmarck. A tutorial on Stålmarck’s proof procedure for propositional logic. In Ganesh Gopalakrishnan and Phillip Windley, editors, Formal Methods in Computer-Aided Design, pages 82–99, Berlin, Heidelberg, 1998. Springer Berlin Heidelberg

  27. [27]

    Two optimizations on the Stålmarck procedure

    Sergei Leonov and Liam Davis. Two optimizations on the Stålmarck procedure, 2025

  28. [28]

    Abstract conflict driven learning

    Vijay D’Silva, Leopold Haller, and Daniel Kroening. Abstract conflict driven learning. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 143–154. ACM, 2013

  29. [29]

    A generalization of Stålmarck’s method

    Aditya V. Thakur and Thomas W. Reps. A generalization of Stålmarck’s method. In Static Analysis – 19th International Symposium (SAS), volume 7460 of Lecture Notes in Computer Science, pages 334–351. Springer, 2012

  30. [30]

    RoFormer: Enhanced transformer with rotary position embedding

    Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024

  31. [31]

    On-policy distillation of language models: Learning from self-generated mistakes

    Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos, Matthieu Geist, and Olivier Bachem. On-policy distillation of language models: Learning from self-generated mistakes. In International Conference on Learning Representations (ICLR), 2024

  32. [32]

    Reinforcement Learning: An Introduction

    Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2nd edition, 2018

  33. [33]

    CodeBLEU: a method for automatic evaluation of code synthesis

    Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, and Shuai Ma. CodeBLEU: a method for automatic evaluation of code synthesis, 2020

  34. [34]

    Tdoku: A fast Sudoku solver and generator

    Tom Dillon. Tdoku: A fast Sudoku solver and generator. Software, GitHub repository https://github.com/t-dillon/tdoku, 2025. Accessed: 2026-04-27

  35. [35]

    Recurrent relational networks

    Rasmus Berg Palm, Ulrich Paquet, and Ole Winther. Recurrent relational networks. In Advances in Neural Information Processing Systems (NeurIPS), volume 31, 2018

  36. [36]

    Can convolutional neural networks crack Sudoku puzzles?

    Kyubyong Park. Can convolutional neural networks crack Sudoku puzzles? Software, GitHub repository https://github.com/Kyubyong/sudoku, 2018. Accessed: 2026-05-07

  37. [37]

    Introducing Claude Opus 4.6

    Anthropic. Introducing Claude Opus 4.6. Anthropic News, February 2026. Accessed: 2026-04-29

  38. [38]

    DeepSeek-V4 preview release: Ushering in the era of 1M context length

    DeepSeek-AI. DeepSeek-V4 preview release: Ushering in the era of 1M context length. DeepSeek Official Blog, April 2026. Accessed: 2026-04-29

  39. [39]

    GPT-5.4 Thinking system card

    OpenAI. GPT-5.4 Thinking system card. Technical report, OpenAI, March 2026

  40. [40]

    Haniel Barbosa, Clark W. Barrett, Martin Brain, Gereon Kremer, Hanna Lachnitt, Makai Mann, Abdalrhman Mohamed, Mudathir Mohamed, Aina Niemetz, Andres Nötzli, Alex Ozdemir, Mathias Preiner, Andrew Reynolds, Ying Sheng, Cesare Tinelli, and Yoni Zohar. cvc5: A versatile and industrial-strength SMT solver. In Dana Fisman and Grigore Rosu, editors,Tools and Al...

  41. [41]

    On the Measure of Intelligence

    François Chollet. On the measure of intelligence. arXiv preprint arXiv:1911.01547, 2019

  42. [42]

    ARC-AGI-2: A new challenge for frontier AI reasoning systems

    Francois Chollet, Mike Knoop, Gregory Kamradt, Bryan Landers, and Henry Pinkard. ARC-AGI-2: A new challenge for frontier AI reasoning systems, 2025

  43. [43]

    ARC-AGI-3: A new challenge for frontier agentic intelligence

    ARC Prize Foundation. ARC-AGI-3: A new challenge for frontier agentic intelligence, 2026

  44. [44]

    Internal anchor (not an external reference): an appendix passage on the sequential cost of parallel solution chains, beginning “We index the K chains within a slot in ascending order 0, 1, . . . , K−1 and admit a fresh puzzle to a slot whenever the current one terminates.”
    search would have paid, even though inference is run with the parallel slot/chain batch described above. We index the K chains within a slot in ascending order 0,1, . . . , K−1 and admit a fresh puzzle to a slot whenever the current one terminates. For each puzzle, let chain w be the first to reach an accepted solution at round dw. The sequential cost we ...