Sutra: Tensor-Op RNNs as a Compilation Target for Vector Symbolic Architectures

Emma Leonhart

arXiv: 2605.20919 · v2 · pith:Z5HIGJQX · submitted 2026-05-20 · cs.LG · cs.AI · cs.PL


Pith reviewed 2026-05-25 05:55 UTC · model grok-4.3

classification: cs.LG · cs.AI · cs.PL

keywords: vector symbolic architectures · tensor compilation · functional programming · Kleene logic · frozen embeddings · rotation binding · PyTorch autograd · bundle decoding


The pith

Sutra compiles functional programs using rotation binding and Kleene logic directly to PyTorch tensor graphs that decode bundles of up to k=8 items at 100% accuracy on each of four frozen embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a typed functional language can beta-reduce an entire program, including VSA primitives and control flow, into one fused tensor-op graph over a frozen embedding. This graph decodes bundled symbols with 100% accuracy at width k=8 on three text encoders and one protein model, a regime where the Hadamard product already fails. The same emitted graph admits full PyTorch autograd, so a fuzzy-rule classifier written in the source language trains from random initialization to 100% accuracy. Trained scalar parameters can be written back into the original source as literals, making the result both executable symbolic code and a reproducible neural network.
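The bundle-decoding claim can be made concrete with a minimal sketch. It assumes, hypothetically, that each of the k bundle slots is bound by its own random orthogonal map R_i, that the bundle is the sum of R_i applied to each filler vector, that unbinding applies the transpose R_i^T, and that decoding picks the nearest codebook vector by cosine. Random Gaussian vectors stand in for a frozen embedding, and a cheap orthogonal map (a random permutation followed by Givens rotations on coordinate pairs) stands in for the paper's Haar-orthogonal rotations. None of this is Sutra's actual lowering, and it says nothing about how Hadamard binding fares on real encoders.

```python
import math
import random


def random_rotation(n, rng):
    """Random orthogonal map on R^n (n even): a coordinate permutation followed by
    Givens rotations on the disjoint pairs (0, 1), (2, 3), ...
    A cheap stand-in for a Haar-random rotation, not Sutra's construction."""
    perm = list(range(n))
    rng.shuffle(perm)
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n // 2)]
    return perm, angles


def rotate(rot, x):
    """Apply R: permute the coordinates, then rotate each coordinate pair."""
    perm, angles = rot
    y = [x[p] for p in perm]
    for k, a in enumerate(angles):
        c, s = math.cos(a), math.sin(a)
        u, v = y[2 * k], y[2 * k + 1]
        y[2 * k], y[2 * k + 1] = c * u - s * v, s * u + c * v
    return y


def rotate_inverse(rot, y):
    """Apply R^T (equal to R^-1 because R is orthogonal): undo pair rotations, then the permutation."""
    perm, angles = rot
    z = list(y)
    for k, a in enumerate(angles):
        c, s = math.cos(a), math.sin(a)
        u, v = z[2 * k], z[2 * k + 1]
        z[2 * k], z[2 * k + 1] = c * u + s * v, -s * u + c * v
    x = [0.0] * len(z)
    for i, p in enumerate(perm):
        x[p] = z[i]
    return x


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def bundle_and_decode(n=512, vocab=50, k=8, seed=0):
    rng = random.Random(seed)
    # Hypothetical stand-in for a frozen embedding table: random Gaussian vectors.
    codebook = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(vocab)]
    slots = [random_rotation(n, rng) for _ in range(k)]  # one rotation per slot
    fillers = rng.sample(range(vocab), k)  # k distinct symbols to bundle
    # Bundle: superpose the rotated fillers, sum_i R_i v_{f_i}.
    bundle = [0.0] * n
    for rot, f in zip(slots, fillers):
        bundle = [b + v for b, v in zip(bundle, rotate(rot, codebook[f]))]
    # Unbind each slot with R_i^T, then clean up by nearest cosine neighbour.
    decoded = []
    for rot in slots:
        probe = rotate_inverse(rot, bundle)
        decoded.append(max(range(vocab), key=lambda j: cosine(probe, codebook[j])))
    return fillers, decoded


fillers, decoded = bundle_and_decode()
print(fillers == decoded)
```

Because every R_i is orthogonal, unbinding by the transpose recovers the slot's own term exactly; the other k-1 rotated terms behave like near-orthogonal noise, so the cleanup margin shrinks as k grows relative to the dimension n.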

Core claim

The central claim is that rotation binding, unbind, bundle, tail recursion, and Lagrange-interpolated polynomial approximations of Kleene three-valued connectives all lower to tensor operations while preserving enough information for exact decoding across arbitrary frozen embedding spaces; the identical compiled artifact therefore functions as a logic program on any substrate and as a differentiable neural network under autograd.
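Lowering tail recursion to a fixed-size tensor graph is commonly done by unrolling the loop to a step bound and masking each update with a halt gate. The sketch below illustrates that general idea on a tail-recursive sum. The gate, the unroll bound `max_steps`, and all names are assumptions for illustration (the page mentions a "soft-halt RNN cell" without giving details); this is not Sutra's lowering.

```python
def sum_to(n, acc=0):
    """Reference tail-recursive definition: sum_to(n) = n + (n - 1) + ... + 1."""
    return acc if n == 0 else sum_to(n - 1, acc + n)


def step(state):
    """One loop body: the arguments of the tail call."""
    n, acc = state
    return (n - 1.0, acc + n)


def halt_gate(state):
    """Hypothetical halt gate: 1.0 when n == 0 and 0.0 for integers n >= 1.
    It is a clamped linear ramp, so it is piecewise differentiable in n."""
    n, _ = state
    return min(1.0, max(0.0, 1.0 - n))


def unrolled_loop(state, max_steps):
    """Run a fixed number of steps; once the gate reaches 1 the state stays frozen."""
    for _ in range(max_steps):
        h = halt_gate(state)
        nxt = step(state)
        # Blend: keep the current state if halted, otherwise take the step.
        state = tuple(h * s + (1.0 - h) * t for s, t in zip(state, nxt))
    return state


print(sum_to(5), unrolled_loop((5.0, 0.0), max_steps=10))
```

The unroll bound has to cover the longest run the program can take; with too few steps the graph returns a partial state rather than the recursive result.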

What carries the argument

The Sutra compiler that beta-reduces the full program (primitives, control flow, string I/O) to a single fused tensor-op graph, with Kleene connectives realized as exact Lagrange polynomials on the {-1,0,+1} grid.
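The exactness claim for the connectives can be checked directly. The sketch takes the AND polynomial quoted later on this page, AND = ½(a+b+ab−a²−b²+a²b²), assumes the standard Kleene negation NOT a = −a, derives OR by De Morgan duality, and verifies that AND and OR equal min and max on all nine grid points. It also rebuilds AND from the tensor-product Lagrange basis on the nodes {-1, 0, +1}, which shows the quoted polynomial is exactly that interpolant of min.

```python
from itertools import product

GRID = (-1, 0, 1)  # Kleene truth values: false, unknown, true


def k_not(a):
    return -a  # standard Kleene negation on {-1, 0, +1} (assumed encoding)


def k_and(a, b):
    # AND polynomial as quoted on this page: 1/2 (a + b + ab - a^2 - b^2 + a^2 b^2)
    return 0.5 * (a + b + a * b - a * a - b * b + a * a * b * b)


def k_or(a, b):
    # De Morgan dual: a OR b = NOT (NOT a AND NOT b)
    return k_not(k_and(k_not(a), k_not(b)))


def lagrange_basis(x):
    """Lagrange basis polynomials on the nodes -1, 0, +1, in that order."""
    return (x * (x - 1) / 2, 1 - x * x, x * (x + 1) / 2)


def interpolate(table, a, b):
    """Tensor-product Lagrange interpolant of table[(i, j)] over GRID x GRID."""
    la, lb = lagrange_basis(a), lagrange_basis(b)
    return sum(table[(i, j)] * la[p] * lb[q]
               for p, i in enumerate(GRID) for q, j in enumerate(GRID))


# Exactness on the 3 x 3 truth grid: AND is min, OR is max.
for a, b in product(GRID, repeat=2):
    assert k_and(a, b) == min(a, b)
    assert k_or(a, b) == max(a, b)

# The quoted AND coincides with the Lagrange interpolant of min, even off the grid.
min_table = {(i, j): min(i, j) for i in GRID for j in GRID}
for a, b in [(0.3, -0.7), (0.5, 0.5), (-0.2, 0.9)]:
    assert abs(interpolate(min_table, a, b) - k_and(a, b)) < 1e-12
print("ok")
```

The interpolant is unique among polynomials of degree at most 2 in each variable, so agreeing with min on the nine grid points is enough to pin the AND polynomial down.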

If this is right

The identical source achieves 100% decoding on every tested embedding, whereas the textbook Hadamard product collapses to a few percent (2.5% on mxbai-embed-large, 7.5% on all-minilm).
Autograd through the emitted graph trains a five-class fuzzy-rule classifier from 18.7% (chance is 20%) to 100.0% accuracy.
A trained cosine-gain scalar can be written back into the .su source as a literal; recompiling reproduces the trained logits to within ~2e-7 per logit.
The same artifact serves simultaneously as a legible logic program and as a trainable neural network.
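Autograd works through these connectives because they are ordinary polynomials. A dependency-free sketch: the hand-derived partial derivatives of the quoted AND polynomial, checked against central finite differences. PyTorch autograd would return the same values; it is left out only to keep the example self-contained. Unlike min(a, b), whose gradient is ambiguous at a = b, the polynomial has a well-defined gradient everywhere.

```python
def k_and(a, b):
    # AND polynomial quoted on this page.
    return 0.5 * (a + b + a * b - a * a - b * b + a * a * b * b)


def grad_k_and(a, b):
    """Hand-derived partial derivatives (d/da, d/db) of k_and."""
    return (0.5 * (1 + b - 2 * a + 2 * a * b * b),
            0.5 * (1 + a - 2 * b + 2 * a * a * b))


def finite_difference(f, a, b, h=1e-6):
    """Central differences, the usual numerical check for analytic or autograd gradients."""
    return ((f(a + h, b) - f(a - h, b)) / (2 * h),
            (f(a, b + h) - f(a, b - h)) / (2 * h))


for a, b in [(0.3, -0.6), (0.9, 0.1), (-0.4, -0.8)]:
    exact = grad_k_and(a, b)
    approx = finite_difference(k_and, a, b)
    assert all(abs(e - n) < 1e-6 for e, n in zip(exact, approx))

# At the tie point a == b == 0 the polynomial splits the gradient evenly: (0.5, 0.5).
print(grad_k_and(0.0, 0.0))
```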

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be tested on image or audio encoders to check whether the same compiled program continues to decode correctly without modality-specific changes.
Because the source remains human-readable after training, the method offers a route to inspectable models that start as explicit symbolic rules.
The polynomial realization of Kleene logic may allow other three-valued or fuzzy logics to be compiled to tensor graphs in the same style.
Cross-modal transfer becomes possible: a program written once could be executed on any embedding substrate that supports the required tensor operations.

Load-bearing premise

Lowering rotation binding, unbind, bundle, and the Lagrange-interpolated Kleene connectives to tensor operations preserves enough information for exact decoding across arbitrary frozen embedding spaces without any post-hoc tuning.

What would settle it

Running the compiled k=8 graph on mxbai-embed-large or ESM-2 and obtaining bundle decoding accuracy materially below 100%.

Figures

Figures reproduced from arXiv: 2605.20919 by Emma Leonhart.

**Figure 2.** The K = 3 rule pipeline. The Sutra source above is the literal program the compiler beta-reduces to the tensor-op graph shown; own binds to p1 when computing rule1 (with o0, o1 = p2, p3), and to p2 and p3 for rule2 and rule3, respectively. Solid boxes are PyTorch tensor ops; dashed boxes are learnable prototypes. The AND in the leftmost branch combines cos(x, p1) with the AND-of-NOTs over the other classes; rule2 and …
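A minimal sketch of the scoring logic the Figure 2 caption describes: each rule ANDs the cosine to its own prototype with the AND of the NOTs of the cosines to the other prototypes, and the predicted class is the highest-scoring rule. It assumes NOT a = −a, the AND polynomial quoted later on this page, and hand-picked rather than learned prototypes; in the paper the prototypes are learnable and the graph is emitted as PyTorch ops.

```python
import math
from functools import reduce


def k_and(a, b):
    # AND polynomial quoted on this page; NOT is assumed to be negation.
    return 0.5 * (a + b + a * b - a * a - b * b + a * a * b * b)


def k_not(a):
    return -a


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def rule_scores(x, prototypes):
    """rule_i = AND(cos(x, p_i), AND over j != i of NOT cos(x, p_j))."""
    cos = [cosine(x, p) for p in prototypes]
    scores = []
    for i, own in enumerate(cos):
        others = [k_not(c) for j, c in enumerate(cos) if j != i]
        none_of_others = reduce(k_and, others)  # for K = 3: AND(NOT o0, NOT o1)
        scores.append(k_and(own, none_of_others))
    return scores


def classify(x, prototypes):
    scores = rule_scores(x, prototypes)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical fixed prototypes for K = 3 (learned in the paper's setting).
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
inputs = ([0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9])
print([classify(x, prototypes) for x in inputs])
```

Off the {-1, 0, +1} grid the polynomial no longer equals min, so rule scores for real-valued cosines are graded rather than crisp; in this toy setting the argmax still picks the prototype closest to x.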

**Figure 3.** Five-stage compilation pipeline (§4). Boxes are intermediate artifacts; italic labels are the …

Original abstract

Sutra is a typed, purely functional programming language whose compiled forward pass is a PyTorch neural network. The compiler beta-reduces the whole program -- primitives, control flow, string I/O -- to one fused tensor-op graph over a frozen embedding substrate. Rotation binding, unbind, bundle, polynomial Kleene three-valued logic, and tail-recursive loops all lower to tensor operations; the Kleene connectives are Lagrange-interpolated polynomials exact on the {-1, 0, +1} truth grid. Validation is one fact tested two ways. (1) The same program runs on four frozen embeddings spanning two modalities -- three text encoders (nomic-embed-text, all-minilm, mxbai-embed-large) and one protein language model (ESM-2) -- and decodes bundles at 100% accuracy through width k=8 on every substrate, where the textbook Hadamard product has already collapsed (2.5% on mxbai-embed-large, 7.5% on all-minilm). (2) PyTorch autograd flows through the actually compiled graph: a fuzzy-rule classifier written in .su trains from random init (18.7 +/- 9.5%; chance = 20%, five classes) to 100.0 +/- 0.0% (three seeds) by backpropagating through the emitted graph, the symbolic source unmodified. A weighted variant additionally trains a scalar cosine gain and writes it back into the .su source as a numeric literal; recompiling reproduces the trained behaviour to ~2e-7 per logit, so the trained model is itself legible, recompilable code. The same artifact is therefore both a logic program and a trainable neural network.
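The write-back step can be sketched as a text round trip. The source line `gain = …`, the gain value, and the helper names are hypothetical stand-ins (the page does not show Sutra's literal syntax). Python's `repr` of a float is the shortest string that parses back to the same float, so a literal written with `repr` loses nothing; a fixed-precision literal introduces a small, bounded error instead.

```python
import re


def write_back(source_template, gain, digits=None):
    """Render the trained gain as a numeric literal in (hypothetical) source text.
    digits=None uses repr, which round-trips the float exactly."""
    literal = repr(gain) if digits is None else f"{gain:.{digits}g}"
    return source_template.format(gain=literal)


def read_back(source):
    """Stand-in for recompilation: parse the literal back out of the source."""
    match = re.search(r"gain = ([-+0-9.eE]+)", source)
    return float(match.group(1))


def logits(gain, cosines):
    return [gain * c for c in cosines]


TEMPLATE = "gain = {gain}\n"  # hypothetical; not actual Sutra syntax
trained_gain = 7.123456789012345  # hypothetical trained value
cosines = [0.91, -0.12, 0.33, 0.05, -0.48]

for digits in (None, 8):
    recovered = read_back(write_back(TEMPLATE, trained_gain, digits))
    drift = max(abs(x - y) for x, y in zip(logits(trained_gain, cosines),
                                            logits(recovered, cosines)))
    print(digits, drift)
```

The page does not say where the paper's ~2e-7 per-logit residual comes from; float32 arithmetic in the emitted graph (machine epsilon ≈ 1.2e-7) is one plausible source, but that is an assumption.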

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

Sutra compiles a VSA language to a single differentiable tensor graph that runs across unrelated frozen embeddings with 100% bundle recovery where Hadamard fails, but the lowering details and faithfulness checks are still missing from the abstract.


The paper's core contribution is a compiler that beta-reduces an entire typed functional program (primitives, control flow, string I/O, tail recursion) into one fused PyTorch tensor graph. Rotation binding, unbind, bundle, and Lagrange-interpolated Kleene three-valued logic all get explicit lowering rules to tensor operations. The same source then executes on four frozen embeddings (three text, one protein) and recovers bundles at 100% accuracy for width k=8, while the textbook Hadamard product drops to 2.5–7.5%. Autograd also flows through the emitted graph, training a fuzzy-rule classifier from chance level to 100% and, in one variant, writing a learned scalar back into the source as a literal so the trained model remains recompilable code. That combination of cross-substrate robustness and end-to-end differentiability is what is actually new here.

The Lagrange interpolation for the logic connectives is a clean way to keep everything exact on the {-1,0,+1} grid while staying differentiable. The fact that trained weights can be emitted as source literals is a practical plus for interpretability.

The soft spots are exactly where the abstract is thin. It states the accuracy numbers but gives no implementation of the lowering, no error analysis of the polynomial approximation, no ablation on the rotation matrices, and no check that the tensor graph is information-preserving for arbitrary embeddings rather than only the four tested ones. The stress-test concerns about dependence on the embedding's singular values, or about test bundles that happen to be easily separable, therefore remain real open questions until the methods section is read. No baselines against other VSA toolkits or other differentiable logic encodings appear either.

This is for people working at the symbolic-neural boundary who want a concrete compilation path rather than another theoretical framing. A reader who needs working code that stays legible after training will find the artifact useful. It deserves a serious referee because the empirical claims are specific and the PyTorch integration makes them testable, even if the current write-up leaves the soundness of the lowering to be verified.

Referee Report

3 major / 2 minor

Summary. The paper introduces Sutra, a typed purely functional language whose compiler beta-reduces entire programs (including rotation binding, unbind, bundle, tail recursion, and Lagrange-interpolated polynomial Kleene three-valued logic) to a single fused PyTorch tensor graph over a frozen embedding substrate. It reports two main results: (1) the same program achieves 100% bundle decoding accuracy at width k=8 on four frozen embeddings spanning text (nomic-embed-text, all-minilm, mxbai-embed-large) and protein (ESM-2) modalities, where the Hadamard product baseline collapses to 2.5-7.5%; (2) a fuzzy-rule classifier written in Sutra trains from random initialization (18.7 +/- 9.5%) to 100.0 +/- 0.0% accuracy via PyTorch autograd on the emitted graph, with a weighted variant that writes a trained cosine-gain scalar back into the source as a numeric literal, reproducing the trained behavior to ~2e-7 per logit upon recompilation.

Significance. If the central claims hold, the work supplies a concrete compilation path that turns VSA logic programs into differentiable tensor graphs while preserving the ability to decode bundles exactly and to recompile trained parameters back into legible source; the cross-modality result on four unrelated frozen embeddings and the end-to-end training through the lowered graph are the primary contributions.

major comments (3)

[Abstract] The headline claim that the tensor lowering of rotation binding/unbind/bundle plus Lagrange-interpolated Kleene connectives yields information-preserving graphs sufficient for 100% unbinding recovery at k=8 on arbitrary frozen embeddings is load-bearing for both results, yet the abstract supplies neither the explicit tensor definition of the rotation operator nor any argument that its invertibility is independent of the singular-value distribution of the embedding matrix.
[Abstract] The reported 100% decoding and 100% training accuracies are presented without error analysis, implementation of the lowering, or verification that the test bundles are not constructed from symbols that happen to be linearly separable in the chosen spaces; this leaves the cross-substrate generality untested.
[Abstract] The autograd training result is offered as downstream evidence, but without showing that the emitted graph actually implements the symbolic rotation and polynomial operations (rather than an approximation that happens to train well), the claim that the same artifact is both a logic program and a trainable neural network remains unsubstantiated.

minor comments (2)

The abstract states concrete accuracy figures (100.0 +/- 0.0%, 18.7 +/- 9.5%) but does not report the embedding dimension, the precise definition of bundle width k, or the number of symbols used in the decoding tests.
No comparison is given to other standard VSA binding operations (circular convolution, XOR, etc.) beyond the Hadamard product.
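For readers weighing this minor comment, a minimal sketch of one of the named alternatives: circular-convolution binding from holographic reduced representations (Plate), with circular correlation as the approximate unbind. It uses random Gaussian vectors with variance 1/n as stand-ins for symbols and is not drawn from the paper. Unlike an orthogonal rotation, correlation recovers the filler only approximately, so the unbound vector still needs a nearest-neighbour cleanup step.

```python
import math
import random


def circular_convolution(a, b):
    """HRR binding: c_i = sum_j a_j * b_{(i - j) mod n}."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def circular_correlation(a, c):
    """Approximate HRR unbinding: r_i = sum_j a_j * c_{(i + j) mod n}."""
    n = len(a)
    return [sum(a[j] * c[(i + j) % n] for j in range(n)) for i in range(n)]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


n = 256
rng = random.Random(0)
# Random symbols with i.i.d. N(0, 1/n) entries, as in Plate's HRR analysis.
role, filler, distractor = [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
                            for _ in range(3)]
bound = circular_convolution(role, filler)
recovered = circular_correlation(role, bound)
print(cosine(recovered, filler), cosine(recovered, distractor))
```

The recovered vector is the filler plus crosstalk of comparable energy, so its cosine to the true filler sits well above chance but clearly below 1.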

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment point by point below, proposing targeted revisions to the abstract where they strengthen the presentation without altering the manuscript's core claims or results.


Referee: [Abstract] The headline claim that the tensor lowering of rotation binding/unbind/bundle plus Lagrange-interpolated Kleene connectives yields information-preserving graphs sufficient for 100% unbinding recovery at k=8 on arbitrary frozen embeddings is load-bearing for both results, yet the abstract supplies neither the explicit tensor definition of the rotation operator nor any argument that its invertibility is independent of the singular-value distribution of the embedding matrix.

Authors: The abstract reports empirical results on four specific frozen embeddings (three text, one protein) rather than claiming generality to arbitrary embeddings. The explicit tensor definitions for rotation binding, unbind, and bundle appear in Section 3.2; the Lagrange interpolation for Kleene logic is in Section 3.3. The manuscript does not supply a general proof that invertibility holds independently of any embedding's singular-value distribution, as the contribution is the empirical demonstration of 100% decoding on the four tested substrates where Hadamard fails. We will revise the abstract to add a cross-reference to Section 3 and to emphasize the empirical scope. revision: partial
Referee: [Abstract] The reported 100% decoding and 100% training accuracies are presented without error analysis, implementation of the lowering, or verification that the test bundles are not constructed from symbols that happen to be linearly separable in the chosen spaces; this leaves the cross-substrate generality untested.

Authors: The results section already includes error analysis (100.0 +/- 0.0% over three seeds for both decoding and training). The lowering implementation, including beta-reduction of rotation, bundle, and polynomial connectives to fused tensor ops, is described in Sections 4 and 5. Test bundles are formed from randomly selected symbol combinations drawn from the embedding vocabulary; the fact that the identical bundles yield only 2.5-7.5% accuracy under Hadamard on the same substrates indicates the result is not an artifact of linear separability. We will add a short clause to the abstract noting the reported error bars and the random construction of the test bundles. revision: partial
Referee: [Abstract] The autograd training result is offered as downstream evidence, but without showing that the emitted graph actually implements the symbolic rotation and polynomial operations (rather than an approximation that happens to train well), the claim that the same artifact is both a logic program and a trainable neural network remains unsubstantiated.

Authors: The fidelity of the emitted graph to the symbolic operations is verified by the weighted-variant experiment: after training a cosine-gain scalar via autograd, the scalar is written back into the .su source as a numeric literal; recompilation then reproduces the trained logits to ~2e-7 per output. This check confirms that the tensor graph implements the rotation and polynomial connectives with sufficient precision for the original symbolic behavior to be recovered exactly. We will revise the abstract to reference this recompilation verification as evidence that the artifact remains both a logic program and a trainable network. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical validation of lowering is independent of inputs


The paper presents a compiler that beta-reduces VSA primitives (rotation binding, unbind, bundle, Lagrange-interpolated Kleene connectives) to fused tensor graphs over frozen embeddings. The central claims are validated by direct execution: identical programs achieve 100% bundle decoding at k=8 across four unrelated frozen encoders (nomic, all-minilm, mxbai, ESM-2) where Hadamard fails, and by standard PyTorch autograd training a fuzzy-rule classifier from random initialization to 100% accuracy on the emitted graph. No equation or result is shown to equal its own fitted inputs by construction; the weighted scalar variant simply records a learned value back into source code, which is a post-hoc legibility step rather than a definitional loop. No self-citations appear as load-bearing premises for the lowering or the cross-substrate result. The derivation chain is therefore a standard compiler lowering plus external empirical test, self-contained against the stated benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; the central claims rest on the unexamined assumption that tensor operations can faithfully realize the listed VSA primitives and that the chosen polynomial interpolation is exact for the intended logic without further constraints.

axioms (1)

Domain assumption: frozen embedding spaces from different modalities form suitable substrates for exact VSA decoding after the described lowering.
Invoked by the claim that the same program succeeds on nomic-embed-text, all-minilm, mxbai-embed-large, and ESM-2.

pith-pipeline@v0.9.0 · 5852 in / 1428 out tokens · 41370 ms · 2026-05-25T05:55:31.517784+00:00 · methodology



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Passage: "Rotation binding... Haar-orthogonal rotations... Lagrange-interpolated polynomials exact on the {-1,0,+1} truth grid... soft-halt RNN cell"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Passage: "same program runs on four frozen embeddings... decodes bundles at 100% accuracy through width k=8"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat.induction · unclear
Passage: "Kleene connectives... AND = ½(a+b+ab−a²−b²+a²b²)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
