pith. machine review for the scientific record.

arxiv: 2604.08816 · v1 · submitted 2026-04-09 · 💻 cs.LG

Recognition: unknown

Loom: A Scalable Analytical Neural Computer Architecture

Mehmet Kerem Turkcan


Pith reviewed 2026-05-10 16:56 UTC · model grok-4.3

classification 💻 cs.LG
keywords Loom architecture · analytical transformer weights · neural computer · instruction set emulation · program execution · fixed-cost computation · C program compilation · state tensor

The pith

A transformer with analytically derived weights implements a full 22-opcode computer that runs any compiled C program when looped on a state tensor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Loom as a computer architecture in which C programs are compiled to a 22-opcode instruction set and executed by repeatedly applying an 8-layer transformer model. The model's weights are calculated directly from the required operations rather than learned from data, and the entire program state, including memory and program counter, resides in a single fixed-size input tensor. Each forward pass through the model advances the program by one instruction, with computation cost remaining constant regardless of program length. Because the weights stay fixed and program-independent, the same model can execute any valid compiled program simply by loading it into the state tensor.
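The looped execution model described above can be sketched in a few lines. This is an editorial illustration, not the repository's actual API: `step_fn`, `PC_SLOT`, and the halt convention are hypothetical stand-ins for whatever the released code at github.com/mkturkcan/Loom actually exposes.

```python
import numpy as np

# Default state-tensor shape from the paper: d = 155, n = 1024.
D, N = 155, 1024
PC_SLOT = (0, 0)  # hypothetical location of the program counter in X

def run(step_fn, state, max_steps=100_000):
    """Apply a fixed step function (one forward pass = one instruction)
    to the state tensor until the program-counter slot reaches zero."""
    for _ in range(max_steps):
        if state[PC_SLOT] == 0:   # halt condition from the paper
            return state
        state = step_fn(state)    # advance the machine by one instruction
    raise RuntimeError("program did not halt within max_steps")
```

Because `step_fn` is fixed and program-independent, per-step cost depends only on `D` and `N`, which is exactly the fixed-cost property the paper claims.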

Core claim

The central claim is that the complete semantics of a 22-opcode instruction set can be realized exactly by the weight matrices of an 8-layer transformer, so that iterative application of the model to a state tensor X ∈ ℝ^{d×n} produces the same sequence of state updates that a conventional CPU would perform on the same program.
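Stated as an equation (an editorial paraphrase, not the paper's own notation): writing δ for the instruction set's single-step transition function, exactness means

```latex
\underbrace{\mathrm{Loom}(X_t)}_{\text{one forward pass}}
\;=\;
\underbrace{\delta(X_t)}_{\text{one ISA step}}
\qquad \text{for every reachable state } X_t \in \mathbb{R}^{d \times n},
```

with execution halting once the program-counter entry of X_t reaches zero.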

What carries the argument

The analytically derived weight matrices of the fixed 8-layer transformer, which encode the 22-opcode instruction set and are applied iteratively to advance the program counter and update the state tensor.

If this is right

  • Execution cost per instruction is constant and independent of program length or history.
  • The same fixed weights can run any program that fits in the state tensor, because programs live in the data rather than in the model.
  • Increasing the state-tensor dimensions d and n scales the architecture while leaving the weight matrices unchanged.
  • Compact configurations (smaller d and n) remain sufficient for non-trivial tasks such as a 9×9 Sudoku solver using only 284 instructions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the approach extends to larger instruction sets, transformers could serve as exact simulators for general-purpose computation without requiring learned control logic.
  • Fixed analytical weights open the possibility of hybrid systems in which a neural model performs both learned inference and deterministic algorithmic steps in the same forward pass.
  • Because cost is independent of program length, the architecture could be tested for long-running computations where conventional neural execution would become impractical.

Load-bearing premise

The analytically derived weights correctly realize every opcode's semantics for arbitrary programs without overflow, precision loss, or unhandled edge cases inside the fixed-size state tensor.

What would settle it

Compile a C program containing nested loops, function calls, and array operations to Loom's instruction set, run it through the model until the program counter reaches zero, and check whether the final state tensor matches the output of the same program on a standard C compiler.
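That acceptance test can be phrased as a small differential harness. Every name here (`loom_step`, `reference_exec`, the slot indices) is hypothetical: `loom_step` stands in for one forward pass of the fixed-weight model, and `reference_exec` for the outputs of the same program compiled and run by a standard C toolchain.

```python
def settles_it(loom_step, state, pc_index, output_slots, reference_exec,
               max_steps=1_000_000):
    """Run the looped model to halt (PC slot == 0), then compare every
    observable output slot against a conventional execution of the
    same program. Returns True iff they agree on all outputs."""
    for _ in range(max_steps):
        if state[pc_index] == 0:
            break
        state = loom_step(state)  # one instruction per application
    else:
        raise RuntimeError("Loom execution did not halt")
    expected = reference_exec()   # e.g. outputs of the gcc-compiled program
    return all(state[i] == v for i, v in zip(output_slots, expected))
```

Any disagreement on a nested-loop, function-call, or array-heavy program would falsify the exactness claim; agreement across a broad program suite would substantiate it.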

read the original abstract

We present Loom, a computer architecture that executes programs compiled from C inside a looped transformer whose weights are derived analytically. The architecture implements a 22-opcode instruction set in 8 transformer layers. Each forward pass executes one instruction; the model is applied iteratively until the program counter reaches zero. The full machine state resides in a single tensor $X \in \mathbb{R}^{d \times n}$ of fixed size, and every step has fixed cost for fixed $d$ and $n$, independent of program length or execution history. The default configuration uses $d = 155$ and $n = 1024$, yielding 4.7 million parameters and 928 instruction slots. A compact configuration at $d = 146$ and $n = 512$ suffices for a 9$\times$9 Sudoku solver (284 instructions). The weights are program-independent: programs live in the state tensor, and the same fixed-weight model executes any compiled program. We make Loom source code publicly available at https://github.com/mkturkcan/Loom.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents Loom, a computer architecture that executes programs compiled from C inside a looped transformer whose weights are derived analytically. It implements a 22-opcode instruction set across 8 transformer layers, with each forward pass executing one instruction on a fixed-size state tensor X ∈ ℝ^{d×n} (default d=155, n=1024). Execution iterates until the program counter reaches zero, with fixed per-step cost independent of program length; weights are program-independent and the same model runs any compiled program. A compact configuration (d=146, n=512) is shown for a 9×9 Sudoku solver using 284 instructions. Source code is released publicly.

Significance. If the analytical weight construction is correct and complete, the result would be significant: it offers a fixed-parameter, fixed-cost neural architecture that exactly emulates a symbolic ISA without training or approximation, with programs residing entirely in the state tensor. Public code availability supports reproducibility and allows direct verification of the claimed analytical derivations.

major comments (2)
  1. [Weight derivation and ISA implementation sections] The central claim that the 8-layer transformer with analytically derived weights exactly implements the full 22-opcode ISA (arithmetic, memory, control flow, PC updates) on the fixed-size state tensor X is load-bearing, yet the manuscript provides no explicit derivation, matrix constructions, attention patterns, or error analysis showing how each opcode is realized without precision loss, overflow, or unhandled cases (e.g., negative values, zero-division). This must be supplied, as any deviation would break exact semantics for arbitrary C programs despite the fixed-cost guarantee.
  2. [Sudoku example and experimental validation] The Sudoku solver demonstration (compact d=146, n=512 configuration, 284 instructions) is presented as validation, but lacks a full execution trace, comparison against a reference interpreter, or coverage of edge cases in the state tensor slots; this is insufficient to confirm correctness of the analytical construction for the complete ISA.
minor comments (2)
  1. [State tensor definition] Clarify the exact allocation of the d-dimensional state slots to registers, memory, flags, and program counter, including how overflow or out-of-bounds accesses are handled analytically.
  2. [Abstract and architecture overview] The abstract states 'the weights are program-independent' and 'programs live in the state tensor'; add a short table or diagram mapping the 22 opcodes to the 8 layers to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and for recognizing the potential significance of Loom if the analytical claims hold. We address each major comment below and will revise the manuscript to strengthen the presentation of the derivations and validation.

read point-by-point responses
  1. Referee: [Weight derivation and ISA implementation sections] The central claim that the 8-layer transformer with analytically derived weights exactly implements the full 22-opcode ISA (arithmetic, memory, control flow, PC updates) on the fixed-size state tensor X is load-bearing, yet the manuscript provides no explicit derivation, matrix constructions, attention patterns, or error analysis showing how each opcode is realized without precision loss, overflow, or unhandled cases (e.g., negative values, zero-division). This must be supplied, as any deviation would break exact semantics for arbitrary C programs despite the fixed-cost guarantee.

    Authors: We agree that explicit derivations are necessary to fully substantiate the exact ISA implementation. The manuscript describes the high-level analytical construction of the 8-layer transformer and the fixed state tensor X, with the complete weight matrices and opcode mappings implemented in the publicly released code. To address the concern directly, we will add a dedicated appendix in the revised manuscript containing the explicit matrix constructions, attention patterns, and per-opcode derivations for all 22 instructions, along with an error analysis addressing floating-point precision, potential overflow, and edge cases including negative values and division by zero. This will make the exact semantics verifiable from the paper itself. revision: yes

  2. Referee: [Sudoku example and experimental validation] The Sudoku solver demonstration (compact d=146, n=512 configuration, 284 instructions) is presented as validation, but lacks a full execution trace, comparison against a reference interpreter, or coverage of edge cases in the state tensor slots; this is insufficient to confirm correctness of the analytical construction for the complete ISA.

    Authors: We acknowledge that the Sudoku demonstration would benefit from more detailed validation to confirm the analytical construction. The example illustrates that a compact configuration suffices for a non-trivial program, and the released code permits direct execution and inspection. In the revision we will augment the experimental section with a full execution trace (for the Sudoku solver or a reduced test case), side-by-side comparisons against a reference interpreter for selected instructions, and explicit discussion of state-tensor slot allocation and edge-case handling. These additions will provide stronger empirical support without altering the core claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity; analytical derivation is self-contained

full rationale

The paper derives the 8-layer transformer weights analytically from the semantics of the 22-opcode instruction set, with programs residing in the fixed-size state tensor X and the same fixed weights executing any compiled C program. No equations or claims reduce by construction to their own inputs (e.g., no fitted parameters renamed as predictions, no self-definitional loops where the result defines the premise, and no load-bearing self-citations for uniqueness theorems or ansatzes). The central claim is an external assertion of exact emulation correctness for arbitrary programs, which stands or falls on verification outside the derivation itself rather than tautological reduction. This is the expected non-circular outcome for an analytical construction.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that a transformer can be analytically configured to simulate exact register-machine semantics; d and n are chosen design parameters rather than fitted values.

free parameters (2)
  • state dimension d
    Chosen as 155 (default) or 146 (compact) to support the instruction set and memory slots.
  • state width n
    Chosen as 1024 (default) or 512 (compact) to provide sufficient instruction slots.
axioms (1)
  • domain assumption: An 8-layer transformer with analytically set weights can implement the full semantics of a 22-opcode instruction set without training.
    Invoked as the basis for the looped execution model.
invented entities (1)
  • Fixed-size state tensor X ∈ ℝ^{d×n} · no independent evidence
    purpose: Holds the complete machine state (program, registers, memory, program counter) for iterative transformer processing.
    New representation introduced to decouple program from model weights.

pith-pipeline@v0.9.0 · 5477 in / 1440 out tokens · 65033 ms · 2026-05-10T16:56:45.944065+00:00 · methodology


Reference graph

Works this paper leans on

19 extracted references · 5 canonical work pages

  1. [1]

    Simulation of graph algorithms with looped transformers

    Artur Back de Luca and Kimon Fountoulakis. Simulation of graph algorithms with looped transformers. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 2319–2363. PMLR, 2024.

  2. [2]

    Learning to add, multiply, and execute algorithmic instructions exactly with neural networks

    Artur Back de Luca, George Giapitzakis, and Kimon Fountoulakis. Learning to add, multiply, and execute algorithmic instructions exactly with neural networks. In Advances in Neural Information Processing Systems, 2025. NeurIPS 2025 poster.

  3. [3]

    Looped transformers as programmable computers

    Angeliki Giannou, Shashank Rajput, Jy-Yong Sohn, Kangwook Lee, Jason D. Lee, and Dimitris Papailiopoulos. Looped transformers as programmable computers. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 11398–11442. PMLR, 2023.

  4. [4]

    Universal length generalization with Turing programs

    Kaiying Hou, David Brandfonbrener, Sham M. Kakade, Samy Jelassi, and Eran Malach. Universal length generalization with Turing programs. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 23873–23893. PMLR, 2025.

  5. [5]

    Neural algorithmic reasoning for hypergraphs with looped transformers

    Zekai Huang, Yingyu Liang, Zhenmei Shi, Zhao Song, and Zhen Zhuang. Neural algorithmic reasoning for hypergraphs with looped transformers. arXiv preprint arXiv:2501.10688, 2025.

  6. [6]

    Softmax transformers are Turing-complete

    Hongjian Jiang, Michael Hahn, Georg Zetzsche, and Anthony Widjaja Lin. Softmax transformers are Turing-complete. In International Conference on Learning Representations, 2026. ICLR 2026 oral.

  7. [7]

    Code simulation as a proxy for high-order tasks in large language models

    Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, X. Angelo Huang, Samuele Marro, Anthony Cohn, Nigel Shadbolt, and Michael Wooldridge. Code simulation as a proxy for high-order tasks in large language models. arXiv preprint arXiv:2502.03568, 2025.

  8. [8]

    Constant bit-size transformers are Turing complete

    Qian Li and Yuyi Wang. Constant bit-size transformers are Turing complete. In Advances in Neural Information Processing Systems, 2025. NeurIPS 2025 poster.

  9. [9]

    Efficient Turing machine simulation with transformers

    Qian Li and Yuyi Wang. Efficient Turing machine simulation with transformers. In International Conference on Learning Representations, 2026. ICLR 2026 poster.

  10. [10]

    Looped ReLU MLPs may be all you need as practical programmable computers

    Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, and Yufa Zhou. Looped ReLU MLPs may be all you need as practical programmable computers. In Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, volume 258 of Proceedings of Machine Learning Research, pages 2647–2655. PMLR, 2025.

  11. [11]

    Tracr: Compiled transformers as a laboratory for interpretability

    David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Thomas McGrath, and Vladimir Mikulik. Tracr: Compiled transformers as a laboratory for interpretability. In Advances in Neural Information Processing Systems, 2023. NeurIPS 2023.

  12. [12]

    Algorithmic language models with neurally compiled libraries

    Lucas Saldyt and Subbarao Kambhampati. Algorithmic language models with neurally compiled libraries. arXiv preprint arXiv:2407.04899, 2024.

  13. [13]

    Autoregressive large language models are computationally universal

    Dale Schuurmans, Hanjun Dai, and Francesco Zanini. Autoregressive large language models are computationally universal. arXiv preprint arXiv:2410.03170, 2024.

  14. [14]

    ALTA: Compiler-based analysis of transformers

    Peter Shaw, James Cohan, Jacob Eisenstein, Kenton Lee, Jonathan Berant, and Kristina N. Toutanova. ALTA: Compiler-based analysis of transformers. Transactions on Machine Learning Research, 2025.

  15. [15]

    Can LLMs be computers?

    Christos Tzamos. Can LLMs be computers? Percepta blog post, 2026. Published March 11, 2026.

  16. [16]

    Thinking like transformers

    Gail Weiss, Yoav Goldberg, and Eran Yahav. Thinking like transformers. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 11080–11090. PMLR, 2021.

  17. [17]

    On expressive power of looped transformers: Theoretical analysis and enhancement via timestep encoding

    Kevin Xu and Issei Sato. On expressive power of looped transformers: Theoretical analysis and enhancement via timestep encoding. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 69613–69646. PMLR, 2025.

  18. [18]

    Transformers are efficient compilers, provably

    Xiyu Zhai, Runlong Zhou, Liao Zhang, and Simon Shaolei Du. Transformers are efficient compilers, provably. In Conference on Language Modeling, 2025. COLM 2025.

  19. [19]

    Weights to code: Extracting interpretable algorithms from the discrete transformer

    Yifan Zhang, Wei Bi, Kechi Zhang, Dongming Jin, Jie Fu, and Zhi Jin. Weights to code: Extracting interpretable algorithms from the discrete transformer. arXiv preprint arXiv:2601.05770, 2026.