pith. machine review for the scientific record.

arxiv: 2605.08605 · v1 · submitted 2026-05-09 · 💻 cs.LG · cs.AI · cs.LO

Recognition: 2 theorem links · Lean Theorem

Lattice Deduction Transformers

Authors on Pith: no claims yet.

Pith reviewed 2026-05-12 01:23 UTC · model grok-4.3

classification 💻 cs.LG cs.AI cs.LO
keywords Lattice Deduction Transformer · recurrent transformer · Sudoku · constraint satisfaction · abstract interpretation · logical deduction · small language models · neuro-symbolic reasoning

The pith

A recurrent transformer achieves perfect accuracy on extreme Sudoku by projecting its latent state through a lattice to approximate sound deduction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The Lattice Deduction Transformer trains a small recurrent model to perform logical deduction on constraint problems by inserting a lattice projection step between its forward passes. This projection, guided by an abstract interpretation of the set of possible solutions, lets the network mirror the behavior of search-based solvers during on-policy training. The resulting 800K-parameter model reaches 100 percent accuracy on Sudoku-Extreme and Snowflake Sudoku while remaining empirically sound: it either returns a correct answer or abstains. A slightly larger variant reaches 99.9 percent on Maze-Hard. Frontier large language models score zero on the same benchmarks, and the approach uses far less training compute than earlier small recurrent reasoners. A sympathetic reader would care because the method suggests that compact neural networks can embed reliable, generalizable deduction for tasks that normally demand either exhaustive search or massive scale.
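
The solver deduction the training mirrors can be pictured with ordinary candidate elimination. A minimal, illustrative sketch on a 4 × 4 Sudoku (the paper's solver and training procedure are more involved): each unsolved cell holds a candidate set, and assigned values are eliminated from peers until a fixpoint.

```python
from itertools import product

def peers(r, c, n=2):
    """Cells sharing a row, column, or n*n box with (r, c) in an n^2 x n^2 grid."""
    N = n * n
    same_row = {(r, j) for j in range(N)}
    same_col = {(i, c) for i in range(N)}
    br, bc = r - r % n, c - c % n
    same_box = {(br + i, bc + j) for i in range(n) for j in range(n)}
    return (same_row | same_col | same_box) - {(r, c)}

def propagate(grid, n=2):
    """Repeatedly eliminate assigned values from peers' candidate sets
    until a fixpoint ("naked singles" propagation)."""
    N = n * n
    cands = {(r, c): ({grid[r][c]} if grid[r][c] else set(range(1, N + 1)))
             for r, c in product(range(N), repeat=2)}
    changed = True
    while changed:
        changed = False
        for cell, s in cands.items():
            if len(s) == 1:
                v = next(iter(s))
                for p in peers(*cell, n):
                    if v in cands[p]:
                        cands[p].discard(v)
                        changed = True
    return cands

# A 4x4 puzzle that happens to be solvable by this propagation alone.
puzzle = [[1, 0, 0, 0],
          [0, 0, 3, 0],
          [0, 4, 0, 0],
          [0, 0, 0, 2]]
solved = propagate(puzzle)   # every cell collapses to a single candidate
```

Hard instances additionally require search (guess and backtrack); the review's point is that the LDT is trained to reproduce the deduction half of that loop.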

Core claim

The Lattice Deduction Transformer approximates logically sound deduction by projecting its latent state through a lattice between forward passes. Trained on-policy to mirror deduction in a search-based constraint solver and supervised by a domain-agnostic abstract-interpretation approximation of the set of solution candidates, an 800K-parameter LDT achieves 100% accuracy on Sudoku-Extreme and Snowflake Sudoku while remaining empirically sound, and a 1.8M-parameter variant reaches 99.9% accuracy on Maze-Hard. Frontier LLMs score 0% on all three benchmarks.

What carries the argument

Projecting the latent state through a lattice between forward passes, which approximates the set of solution candidates via abstract interpretation and enforces soundness during recurrent reasoning.
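
One way to picture such a projection step, as an illustrative sketch rather than the paper's actual encoding or thresholding: decode per-cell scores into candidate sets, then take the lattice meet with the previous abstract state so refinement is monotone.

```python
def project_to_lattice(scores, prev_cands, threshold=0.0):
    """Decode per-cell candidate scores into sets, then take the lattice
    meet (set intersection) with the previous abstract state. The meet
    makes the trajectory a descending chain: once a candidate is
    eliminated, no later forward pass can reintroduce it.
    (Hypothetical decoding rule; the paper's encoding is not given here.)"""
    decoded = [{v for v, score in enumerate(cell, start=1) if score > threshold}
               for cell in scores]
    return [d & p for d, p in zip(decoded, prev_cands)]

# Two recurrent "forward passes" over a 3-cell toy problem with values 1..3.
state = [{1, 2, 3}, {1, 2, 3}, {1, 2, 3}]          # top element: all candidates
pass1 = [[0.9, 0.8, -1.0], [0.5, -2.0, 0.7], [-0.1, 0.4, 0.2]]
pass2 = [[1.2, -0.3, 0.6], [0.8, 0.1, 0.9], [0.3, 0.5, -0.8]]
state = project_to_lattice(pass1, state)   # [{1, 2}, {1, 3}, {2, 3}]
state = project_to_lattice(pass2, state)   # [{1}, {1, 3}, {2}]
```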

Load-bearing premise

Projecting the latent state through a lattice produces an approximation of logically sound deduction that generalizes beyond the training distribution when supervised by abstract interpretation.

What would settle it

A held-out test set of Sudoku puzzles whose solution structures or constraint densities differ from the training distribution, where accuracy falls below 100% or the model produces incorrect answers without abstaining.
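
Scoring such a test requires separating correct answers, abstentions, and unsound outputs. A hypothetical harness (the `score_run` name and the `None`-for-abstain convention are assumptions, not the paper's evaluation code):

```python
def score_run(outputs, solutions):
    """Tally model outputs on held-out puzzles. 'Empirical soundness'
    here means every non-abstaining answer is correct; a single wrong,
    non-abstained answer falsifies it."""
    tally = {"correct": 0, "abstain": 0, "unsound": 0}
    for out, sol in zip(outputs, solutions):
        if out is None:                 # model abstained
            tally["abstain"] += 1
        elif out == sol:
            tally["correct"] += 1
        else:
            tally["unsound"] += 1      # wrong answer without abstention
    return tally

runs = score_run(["A", None, "C", "D"], ["A", "B", "C", "X"])
# runs["unsound"] > 0 would falsify the soundness claim on this split
```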

Figures

Figures reproduced from arXiv: 2605.08605 by Alberto Alfarano, Leopold Haller, Liam Davis, Mark Santolucito.

Figure 1: The Lattice Deduction Transformer (LDT) performs sound deduction on a lattice using a … [figure image omitted]
Figure 2: Lattice-tensor encoding in the transformer. Surviving candidates are highlighted; eliminated … [figure image omitted]
Figure 3: Test/train compute trade-off on Sudoku-Extreme. [figure image omitted]
Figure 4: Maze-Hard task and multi-solution supervision via … [figure image omitted]
Figure 5: Round-trip γ ∘ α on 2 × 2 Sudoku. The set S contains two grids that differ in the right-hand column. Abstraction drops the cell-to-cell correlation, so concretizing the result yields four grids: the original two plus two additional grids that satisfy the per-cell candidate sets but were not in S. [figure image omitted]
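
Figure 5's round trip can be reproduced concretely. A minimal sketch (the flattened grids below are illustrative, not the figure's exact contents): abstraction keeps only per-cell candidate sets, and concretization returns every grid compatible with them, so γ(α(S)) over-approximates S.

```python
from itertools import product

def alpha(grids):
    """Abstraction: per-cell candidate sets (loses cross-cell correlation)."""
    return [set(vals) for vals in zip(*grids)]

def gamma(cands):
    """Concretization: all grids whose cells lie in their candidate sets."""
    return set(product(*[sorted(s) for s in cands]))

# Two flattened 2x2 grids that differ only in the second and fourth cell.
S = {(1, 2, 2, 1), (1, 1, 2, 2)}
a = alpha(S)                 # [{1}, {1, 2}, {2}, {1, 2}]
round_trip = gamma(a)        # 4 grids: the original 2 plus 2 spurious ones
```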
Figure 6: Round-trip α ∘ γ on an impossible abstract element. Column 1 of a forces both rows to take the value 1, contradicting the column constraint, so γ(a) = ∅. Re-abstracting then collapses every cell to the empty candidate set, refining a all the way to ⊥. [figure image omitted]
Figure 7: Lifting a Galois connection to monotone-function lattices. The Galois connection between … [figure image omitted]
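
Figure 7's construction is the standard lifting from abstract interpretation, restated here for orientation (this is the textbook form, not necessarily the paper's exact statement):

```latex
% Galois connection between concrete lattice C and abstract lattice A:
\alpha : C \to A, \qquad \gamma : A \to C, \qquad
\alpha(c) \sqsubseteq_A a \iff c \sqsubseteq_C \gamma(a).

% Best abstract counterpart of a monotone f : C -> C, sound by construction:
f^{\sharp} = \alpha \circ f \circ \gamma, \qquad
f(\gamma(a)) \sqsubseteq_C \gamma\bigl(f^{\sharp}(a)\bigr)
\quad \text{for all } a \in A.
```
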
Figure 8: A snowflake Sudoku with n = 3 (3 hexagons, 18 cells). Each hexagon is divided into 6 triangular cells, each taking a value in {1, …, 6}. Intra-hex constraints require all 6 cells within each hexagon to be distinct. The shaded cells form a cross-hex constraint group: 2 cells from each hexagon that must also be mutually distinct. [figure image omitted]
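
The caption's constraint structure (intra-hex all-different plus a cross-hex group) can be sketched as a checker; the hexagon and cell indexing below is an assumption, since the paper's exact encoding is not reproduced here.

```python
def distinct(vals):
    return len(vals) == len(set(vals))

def check_snowflake(hexes, cross_group):
    """Check a snowflake-Sudoku assignment: `hexes` maps a hexagon id to
    its 6 cell values (each in 1..6); `cross_group` lists (hex_id, cell_idx)
    pairs that must also be mutually distinct across hexagons."""
    intra_ok = all(distinct(cells) for cells in hexes.values())
    cross_ok = distinct([hexes[h][i] for h, i in cross_group])
    return intra_ok and cross_ok

hexes = {0: [1, 2, 3, 4, 5, 6],
         1: [3, 4, 5, 6, 1, 2],
         2: [5, 6, 1, 2, 3, 4]}
# Two cells from each hexagon form a cross-hex group (n = 3 -> 6 cells).
group = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
ok = check_snowflake(hexes, group)   # cross values 1..6 are distinct
```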
read the original abstract

We introduce the Lattice Deduction Transformer (LDT), a recurrent transformer that approximates logically sound deduction by projecting its latent state through a lattice between forward passes. We train on-policy in a process that mirrors deduction in a search-based constraint solver and supervise training via a domain-agnostic, abstract-interpretation-based approximation of the set of solution candidates. An $800$K-parameter LDT achieves $100\%$ accuracy on Sudoku-Extreme and Snowflake Sudoku, at a fraction of the training cost of prior small recurrent reasoners, while remaining empirically sound: the model returns a correct answer or abstains. A $1.8$M-parameter variant reaches $99.9\%$ accuracy on Maze-Hard. Frontier LLMs score $0\%$ on all three benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Lattice Deduction Transformer (LDT), a recurrent transformer that approximates logically sound deduction by projecting its latent state through a lattice between forward passes. It is trained on-policy to mirror deduction in search-based constraint solvers and supervised using a domain-agnostic abstract-interpretation approximation of solution candidate sets. Key results include an 800K-parameter LDT achieving 100% accuracy on Sudoku-Extreme and Snowflake Sudoku at reduced training cost while remaining empirically sound (correct answer or abstention), a 1.8M-parameter variant reaching 99.9% on Maze-Hard, and frontier LLMs scoring 0% on all benchmarks.

Significance. If the lattice projection mechanism is shown to produce generalizable deductive approximations rather than distribution matching, the work would be significant for neural-symbolic reasoning: it demonstrates that small-parameter recurrent models can achieve near-perfect accuracy on hard constraint problems with built-in soundness via abstention, at far lower cost than prior recurrent reasoners or LLMs. The domain-agnostic supervision via abstract interpretation is a positive design choice that could generalize across tasks.

major comments (2)
  1. [Abstract] Abstract: the central performance claims (100% accuracy on Sudoku-Extreme, 99.9% on Maze-Hard) and the assertion of empirical soundness via abstention are presented without any information on training/validation data splits, error analysis, or the precise abstention rule; these details are load-bearing for determining whether the lattice projection enables extrapolation or whether results are explained by training-trajectory reproduction plus conservative abstention.
  2. [Abstract] Abstract: the claim that lattice projection 'approximates logically sound deduction' supervised only by abstract-interpretation over-approximations lacks supporting analysis showing that the learned recurrent dynamics remain inside the deductive closure rather than merely fitting observed trajectories; this is required to substantiate generalization beyond the training distribution.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'at a fraction of the training cost of prior small recurrent reasoners' would be strengthened by a brief quantitative comparison (e.g., epochs or FLOPs) even if full details appear later.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We address each of the major comments below and have updated the abstract and discussion sections to provide additional clarity on the experimental setup and the deductive properties of the model.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central performance claims (100% accuracy on Sudoku-Extreme, 99.9% on Maze-Hard) and the assertion of empirical soundness via abstention are presented without any information on training/validation data splits, error analysis, or the precise abstention rule; these details are load-bearing for determining whether the lattice projection enables extrapolation or whether results are explained by training-trajectory reproduction plus conservative abstention.

    Authors: We concur that these details are important for interpreting the results. Although the full specifications are provided in the experimental setup (Section 4) and results analysis (Section 5), we have revised the abstract to concisely include information on the data splits, a summary of the error analysis, and the abstention rule. This revision helps readers assess the generalization capabilities of the lattice projection mechanism versus potential memorization of training trajectories. revision: yes

  2. Referee: [Abstract] Abstract: the claim that lattice projection 'approximates logically sound deduction' supervised only by abstract-interpretation over-approximations lacks supporting analysis showing that the learned recurrent dynamics remain inside the deductive closure rather than merely fitting observed trajectories; this is required to substantiate generalization beyond the training distribution.

    Authors: The paper includes supporting analysis for this claim through the description of the lattice projection mechanism in Section 3, which enforces properties that keep the state within the deductive closure by design, combined with the on-policy training that follows solver deduction steps. The abstract-interpretation supervision provides an over-approximation that guides the model without requiring exact solutions. Generalization is substantiated by the high accuracy on out-of-distribution benchmarks like Sudoku-Extreme and Maze-Hard, where the model either produces correct solutions or abstains. We have added a brief elaboration in the abstract and a new paragraph in the discussion section to make this analysis more prominent. revision: yes
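
The rebuttal's two properties, monotone refinement and never eliminating the true solution, are the kind of invariants one could audit post hoc on model rollouts. A hypothetical check, not the authors' analysis:

```python
def refines(a, b):
    """a refines b (a below b in the lattice) when every cell's set shrinks or stays."""
    return all(x <= y for x, y in zip(a, b))

def check_trajectory(states, solution):
    """Verify two invariants a lattice-projected trajectory should keep:
    (1) monotone descent: each state refines its predecessor;
    (2) soundness: the true solution is never eliminated."""
    descending = all(refines(s, p) for p, s in zip(states, states[1:]))
    sound = all(all(v in cell for v, cell in zip(solution, s)) for s in states)
    return descending and sound

traj = [[{1, 2, 3}, {1, 2, 3}],
        [{1, 2},    {2, 3}],
        [{1},       {2, 3}]]
ok = check_trajectory(traj, solution=[1, 2])   # descends and keeps (1, 2)
```

A trajectory that re-adds an eliminated candidate, or one that drops the true solution before abstaining, would fail this check and directly undercut the claimed deductive-closure property.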

Circularity Check

0 steps flagged

No significant circularity; architectural mechanism presented as independent contribution

full rationale

The paper defines the Lattice Deduction Transformer via a recurrent projection of latent states through a lattice, trained on-policy under abstract-interpretation supervision. No equations, derivations, or self-citations are exhibited that reduce the claimed accuracy or soundness to fitted inputs by construction, nor does any load-bearing premise collapse to a prior result by the same authors. The central claim rests on an empirical architectural choice whose generalization properties are left to experimental validation rather than being enforced by definitional identity or self-referential training.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities beyond the high-level description of the lattice projection and abstract interpretation; these mechanisms are treated as core contributions rather than fitted quantities.

pith-pipeline@v0.9.0 · 5427 in / 1173 out tokens · 46219 ms · 2026-05-12T01:23:37.579841+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 1 internal anchor

  1. [1]

    Chain-of-thought prompting elicits reasoning in large language models

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, volume 35, 2022

  2. [2]

    Self-consistency improves chain of thought reasoning in language models

    Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In International Conference on Learning Representations (ICLR), 2023

  3. [3]

    Scaling LLM test-time compute optimally can be more effective than scaling parameters for reasoning

    Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling LLM test-time compute optimally can be more effective than scaling parameters for reasoning. In International Conference on Learning Representations (ICLR), 2025

  4. [4]

    Structured denoising diffusion models in discrete state-spaces

    Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces. In Advances in Neural Information Processing Systems (NeurIPS), 2021

  5. [5]

    Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints

    P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Conference Record of the Fourth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 238–252, Los Angeles, California, 1977. ACM Press, New York, NY

  6. [6]

    Systematic design of program analysis frameworks

    P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In Conference Record of the Sixth Annual ACM Symposium on Principles of Programming Languages, pages 269–282, San Antonio, Texas, 1979. ACM Press, New York, NY

  7. [7]

    Adaptive computation time for recurrent neural networks

    Alex Graves. Adaptive computation time for recurrent neural networks, 2016

  8. [8]

    Universal transformers

    Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Łukasz Kaiser. Universal transformers. In International Conference on Learning Representations (ICLR), 2019

  9. [9]

    Hierarchical reasoning model

    Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, and Yasin Abbasi Yadkori. Hierarchical reasoning model, 2025

  10. [10]

    Less is more: Recursive reasoning with tiny networks

    Alexia Jolicoeur-Martineau. Less is more: Recursive reasoning with tiny networks, 2025

  11. [11]

    Sotaku: From-scratch experiments on iterative neural Sudoku solvers

    Cheng Lou. Sotaku: From-scratch experiments on iterative neural Sudoku solvers. Software, GitHub repository https://github.com/chenglou/sotaku, commit 9e13341, 2026. Accessed: 2026-05-07

  12. [12]

    SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver

    Po-Wei Wang, Priya L. Donti, Bryan Wilder, and J. Zico Kolter. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 6545–6554. PMLR, 2019

  13. [13]

    Learning modulo theories

    Matt Fredrikson, Kaiji Lu, Saranya Vijayakumar, Somesh Jha, Vijay Ganesh, and Zifan Wang. Learning modulo theories. arXiv preprint arXiv:2301.11435, 2023

  14. [14]

    Neural abstract interpretation

    Shaurya Gomber and Gagandeep Singh. Neural abstract interpretation. In ICLR 2025 Workshop: VerifAI: AI Verification in the Wild, 2025

  15. [15]

    Learning fast and precise numerical analysis

    Jingxuan He, Gagandeep Singh, Markus Püschel, and Martin Vechev. Learning fast and precise numerical analysis. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 1112–1127, 2020

  16. [16]

    SAIL: Sound abstract interpreters with LLMs

    Qiuhan Gu, Avaljot Singh, and Gagandeep Singh. SAIL: Sound abstract interpreters with LLMs. Proceedings of the ACM on Programming Languages, 10(PLDI), 2026. To appear

  17. [17]

    Fast numerical program analysis with reinforcement learning

    Gagandeep Singh, Markus Püschel, and Martin Vechev. Fast numerical program analysis with reinforcement learning. In Computer Aided Verification (CAV), volume 10981 of Lecture Notes in Computer Science, pages 211–229. Springer, 2018

  18. [18]

    Learning a strategy for choosing widening thresholds from a large codebase

    Sooyoung Cha, Sehun Jeong, and Hakjoo Oh. Learning a strategy for choosing widening thresholds from a large codebase. In Programming Languages and Systems (APLAS), volume 10017 of Lecture Notes in Computer Science, pages 25–41. Springer, 2016

  19. [19]

    BERT: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 4171–4186, 2019

  20. [20]

    Remasking discrete diffusion models with inference-time scaling

    Guanghan Wang, Yair Schiff, Subham Sekhar Sahoo, and Volodymyr Kuleshov. Remasking discrete diffusion models with inference-time scaling. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  21. [21]

    Beyond masked and unmasked: Discrete diffusion models via partial masking

    Chen-Hao Chao, Wei-Fang Sun, Hanwen Liang, Chun-Yi Lee, and Rahul G. Krishnan. Beyond masked and unmasked: Discrete diffusion models via partial masking. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  22. [22]

    Next semantic scale prediction via hierarchical diffusion language models

    Cai Zhou, Chenyu Wang, Dinghuai Zhang, Shangyuan Tong, Yifei Wang, Stephen Bates, and Tommi Jaakkola. Next semantic scale prediction via hierarchical diffusion language models. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  23. [23]

    A computing procedure for quantification theory

    Martin Davis and Hilary Putnam. A computing procedure for quantification theory. Journal of the ACM, 7(3):201–215, 1960

  24. [24]

    A machine program for theorem-proving

    Martin Davis, George Logemann, and Donald Loveland. A machine program for theorem-proving. Communications of the ACM, 5(7):394–397, 1962

  25. [25]

    Conflict-driven clause learning SAT solvers

    João P. Marques-Silva, Inês Lynce, and Sharad Malik. Conflict-driven clause learning SAT solvers. In Armin Biere, Marijn Heule, Hans van Maaren, and Toby Walsh, editors, Handbook of Satisfiability, volume 185 of Frontiers in Artificial Intelligence and Applications, pages 131–153. IOS Press, 2009

  26. [26]

    A tutorial on Stålmarck’s proof procedure for propositional logic

    Mary Sheeran and Gunnar Stålmarck. A tutorial on Stålmarck’s proof procedure for propositional logic. In Ganesh Gopalakrishnan and Phillip Windley, editors, Formal Methods in Computer-Aided Design, pages 82–99, Berlin, Heidelberg, 1998. Springer Berlin Heidelberg

  27. [27]

    Two optimizations on the Stålmarck procedure

    Sergei Leonov and Liam Davis. Two optimizations on the Stålmarck procedure, 2025

  28. [28]

    Abstract conflict driven learning

    Vijay D’Silva, Leopold Haller, and Daniel Kroening. Abstract conflict driven learning. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 143–154. ACM, 2013

  29. [29]

    A generalization of Stålmarck’s method

    Aditya V. Thakur and Thomas W. Reps. A generalization of Stålmarck’s method. In Static Analysis – 19th International Symposium (SAS), volume 7460 of Lecture Notes in Computer Science, pages 334–351. Springer, 2012

  30. [30]

    RoFormer: Enhanced transformer with rotary position embedding

    Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024

  31. [31]

    On-policy distillation of language models: Learning from self-generated mistakes

    Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos, Matthieu Geist, and Olivier Bachem. On-policy distillation of language models: Learning from self-generated mistakes. In International Conference on Learning Representations (ICLR), 2024

  32. [32]

    Reinforcement Learning: An Introduction

    Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2nd edition, 2018

  33. [33]

    CodeBLEU: a method for automatic evaluation of code synthesis

    Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, and Shuai Ma. CodeBLEU: a method for automatic evaluation of code synthesis, 2020

  34. [34]

    Tdoku: A fast Sudoku solver and generator

    Tom Dillon. Tdoku: A fast Sudoku solver and generator. Software, GitHub repository https://github.com/t-dillon/tdoku, 2025. Accessed: 2026-04-27

  35. [35]

    Recurrent relational networks

    Rasmus Berg Palm, Ulrich Paquet, and Ole Winther. Recurrent relational networks. In Advances in Neural Information Processing Systems (NeurIPS), volume 31, 2018

  36. [36]

    Can convolutional neural networks crack Sudoku puzzles?

    Kyubyong Park. Can convolutional neural networks crack Sudoku puzzles? Software, GitHub repository https://github.com/Kyubyong/sudoku, 2018. Accessed: 2026-05-07

  37. [37]

    Introducing Claude Opus 4.6

    Anthropic. Introducing Claude Opus 4.6. Anthropic News, February 2026. Accessed: 2026-04-29

  38. [38]

    DeepSeek-V4 preview release: Ushering in the era of 1M context length

    DeepSeek-AI. DeepSeek-V4 preview release: Ushering in the era of 1M context length. DeepSeek Official Blog, April 2026. Accessed: 2026-04-29

  39. [39]

    GPT-5.4 Thinking system card

    OpenAI. GPT-5.4 Thinking system card. Technical report, OpenAI, March 2026

  40. [40]

    Haniel Barbosa, Clark W. Barrett, Martin Brain, Gereon Kremer, Hanna Lachnitt, Makai Mann, Abdalrhman Mohamed, Mudathir Mohamed, Aina Niemetz, Andres Nötzli, Alex Ozdemir, Mathias Preiner, Andrew Reynolds, Ying Sheng, Cesare Tinelli, and Yoni Zohar. cvc5: A versatile and industrial-strength SMT solver. In Dana Fisman and Grigore Rosu, editors,Tools and Al...

  41. [41]

    On the Measure of Intelligence

    François Chollet. On the measure of intelligence. arXiv preprint arXiv:1911.01547, 2019

  42. [42]

    ARC-AGI-2: A new challenge for frontier AI reasoning systems

    Francois Chollet, Mike Knoop, Gregory Kamradt, Bryan Landers, and Henry Pinkard. ARC-AGI-2: A new challenge for frontier AI reasoning systems, 2025

  43. [43]

    ARC-AGI-3: A new challenge for frontier agentic intelligence

    ARC Prize Foundation. ARC-AGI-3: A new challenge for frontier agentic intelligence, 2026

  44. [44]

    Internal anchor (not an external reference): an appendix passage on the sequential cost of parallel solution chains, beginning “We index the K chains within a slot in ascending order 0, 1, . . . , K−1 and admit a fresh puzzle to a slot whenever the current one terminates.”
    search would have paid, even though inference is run with the parallel slot/chain batch described above. We index the K chains within a slot in ascending order 0,1, . . . , K−1 and admit a fresh puzzle to a slot whenever the current one terminates. For each puzzle, let chain w be the first to reach an accepted solution at round dw. The sequential cost we ...