
arxiv: 2605.20919 · v2 · pith:Z5HIGJQXnew · submitted 2026-05-20 · 💻 cs.LG · cs.AI · cs.PL

Sutra: Tensor-Op RNNs as a Compilation Target for Vector Symbolic Architectures

Pith reviewed 2026-05-25 05:55 UTC · model grok-4.3

classification 💻 cs.LG cs.AI cs.PL
keywords vector symbolic architectures · tensor compilation · functional programming · Kleene logic · frozen embeddings · rotation binding · PyTorch autograd · bundle decoding

The pith

Sutra compiles functional programs using rotation binding and Kleene logic directly to PyTorch tensor graphs that decode bundles at 100% accuracy on each of four frozen embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a typed functional language can beta-reduce an entire program, including VSA primitives and control flow, into one fused tensor-op graph over a frozen embedding. This graph performs exact decoding of bundled symbols at width k=8 on three text encoders and one protein model, where the Hadamard product already fails. The same emitted graph admits full PyTorch autograd, so a fuzzy-rule classifier written in the source language trains from random initialization to 100% accuracy. Trained scalar parameters can be written back into the original source as literals, making the result both executable symbolic code and a reproducible neural network.

Core claim

The central claim is that rotation binding, unbind, bundle, tail recursion, and Lagrange-interpolated polynomial approximations of Kleene three-valued connectives all lower to tensor operations while preserving enough information for exact decoding across arbitrary frozen embedding spaces; the identical compiled artifact therefore functions as a logic program on any substrate and as a differentiable neural network under autograd.
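As a minimal sketch of what bind, unbind, and bundle can look like as tensor-style operations, the following pure-Python example models rotation binding as a block-diagonal planar rotation with one role-specific angle per coordinate pair. This page does not reproduce the paper's own definition (the rebuttal places it in Section 3.2), so the encoding of a role as a list of angles, the random Gaussian codebook standing in for a frozen embedding, and the names `rotate`, `decode`, `DIM`, and `VOCAB` are all assumptions made for illustration.

```python
import math
import random

DIM = 1024    # even, so coordinates pair up into 2-D planes (assumed size)
K = 8         # bundle width, matching the k=8 setting discussed above
VOCAB = 32    # codebook size (assumption for this sketch)

rng = random.Random(0)


def random_vector():
    """Stand-in for one frozen embedding vector (random, not a real encoder)."""
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def random_role():
    """Hypothetical role encoding: one rotation angle per coordinate plane."""
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(DIM // 2)]


def rotate(angles, v, sign=1.0):
    """Block-diagonal rotation: plane p is turned by sign * angles[p]."""
    out = [0.0] * len(v)
    for p, theta in enumerate(angles):
        c, s = math.cos(theta), math.sin(sign * theta)
        a, b = v[2 * p], v[2 * p + 1]
        out[2 * p] = c * a - s * b
        out[2 * p + 1] = s * a + c * b
    return out


def bind(role, filler):
    return rotate(role, filler, 1.0)


def unbind(role, bound):
    # The inverse of a rotation is the rotation by the negated angles.
    return rotate(role, bound, -1.0)


def bundle(vectors):
    """Superpose bound vectors by elementwise addition."""
    return [sum(column) for column in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def decode(role, memory, codebook):
    """Unbind with the role, then return the index of the nearest codebook entry."""
    probe = unbind(role, memory)
    return max(range(len(codebook)), key=lambda i: cosine(probe, codebook[i]))


codebook = [random_vector() for _ in range(VOCAB)]
roles = [random_role() for _ in range(K)]
fillers = rng.sample(range(VOCAB), K)   # which symbol each role holds
memory = bundle([bind(r, codebook[f]) for r, f in zip(roles, fillers)])
decoded = [decode(r, memory, codebook) for r in roles]
accuracy = sum(d == f for d, f in zip(decoded, fillers)) / K
```

Because a rotation is orthogonal, `unbind` inverts `bind` up to floating-point rounding; what limits decoding is the cross-talk from the other k-1 bound terms in the bundle. Random Gaussian vectors are a far friendlier substrate than real encoder outputs, so this sketch does not bear on the accuracies the paper reports.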

What carries the argument

The Sutra compiler that beta-reduces the full program (primitives, control flow, string I/O) to a single fused tensor-op graph, with Kleene connectives realized as exact Lagrange polynomials on the {-1,0,+1} grid.
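To make "exact Lagrange polynomials on the {-1,0,+1} grid" concrete, here is a small sketch that builds a bivariate interpolant from the three one-dimensional Lagrange basis polynomials on those nodes. It assumes the strong Kleene reading in which -1 is false, 0 is unknown, +1 is true, AND is `min`, OR is `max`, and NOT is negation; the paper's actual polynomials are not shown on this page, and the helper names `basis` and `interpolate` are hypothetical.

```python
from itertools import product

GRID = (-1, 0, 1)   # false, unknown, true (assumed encoding)


def basis(node, x):
    """1-D Lagrange basis polynomial: 1 at `node`, 0 at the other two nodes."""
    if node == -1:
        return x * (x - 1) / 2
    if node == 0:
        return 1 - x * x
    return x * (x + 1) / 2


def interpolate(table):
    """Return the degree-(2,2) polynomial that equals `table` on the 3x3 grid."""
    def poly(x, y):
        return sum(
            table(a, b) * basis(a, x) * basis(b, y)
            for a, b in product(GRID, GRID)
        )
    return poly


kleene_and = interpolate(min)   # strong Kleene conjunction
kleene_or = interpolate(max)    # strong Kleene disjunction


def kleene_not(x):
    """Negation; -x is already the exact interpolant of NOT on the grid."""
    return -x
```

On the nine grid points the polynomials reproduce the truth tables exactly; between grid points they are smooth and differentiable, which is the property that lets gradients pass through them, but they are not guaranteed to equal `min` or `max` there.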

If this is right

  • The identical source achieves 100% decoding on every tested embedding where the textbook Hadamard product collapses to a few percent.
  • Autograd through the emitted graph trains a five-class fuzzy-rule classifier from 18.7% to 100.0% accuracy.
  • A trained cosine-gain scalar can be written back into the .su source and reproduces the trained logits to ~2e-7 upon recompilation.
  • The same artifact serves simultaneously as a legible logic program and as a trainable neural network.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on image or audio encoders to check whether the same compiled program continues to decode correctly without modality-specific changes.
  • Because the source remains human-readable after training, the method offers a route to inspectable models that start as explicit symbolic rules.
  • The polynomial realization of Kleene logic may allow other three-valued or fuzzy logics to be compiled to tensor graphs in the same style.
  • Cross-modal transfer becomes possible: a program written once could be executed on any embedding substrate that supports the required tensor operations.

Load-bearing premise

Lowering rotation binding, unbind, bundle, and the Lagrange-interpolated Kleene connectives to tensor operations preserves enough information for exact decoding across arbitrary frozen embedding spaces without any post-hoc tuning.

What would settle it

Running the compiled k=8 graph on mxbai-embed-large or ESM-2 and obtaining bundle decoding accuracy materially below 100%.

Figures

Figures reproduced from arXiv: 2605.20919 by Emma Leonhart.

Figure 1: Per-tick dataflow of the soft-halt RNN cell. Once …
Figure 2: The K = 3 rule pipeline. The Sutra source above is the literal program the compiler beta-reduces to the tensor-op graph shown; own binds to p1 when computing rule1 (o0, o1 = p2, p3), and to p2, p3 for rule2, rule3 respectively. Solid boxes are PyTorch tensor ops; dashed boxes are learnable prototypes. The AND in the leftmost branch combines cos(x, p1) with the AND-of-NOTs over the other classes; rule2 and …
Figure 3: Five-stage compilation pipeline (§4). Boxes are intermediate artifacts; italic labels are the …
Original abstract

Sutra is a typed, purely functional programming language whose compiled forward pass is a PyTorch neural network. The compiler beta-reduces the whole program -- primitives, control flow, string I/O -- to one fused tensor-op graph over a frozen embedding substrate. Rotation binding, unbind, bundle, polynomial Kleene three-valued logic, and tail-recursive loops all lower to tensor operations; the Kleene connectives are Lagrange-interpolated polynomials exact on the {-1, 0, +1} truth grid. Validation is one fact tested two ways. (1) The same program runs on four frozen embeddings spanning two modalities -- three text encoders (nomic-embed-text, all-minilm, mxbai-embed-large) and one protein language model (ESM-2) -- and decodes bundles at 100% accuracy through width k=8 on every substrate, where the textbook Hadamard product has already collapsed (2.5% on mxbai-embed-large, 7.5% on all-minilm). (2) PyTorch autograd flows through the actually compiled graph: a fuzzy-rule classifier written in .su trains from random init (18.7 +/- 9.5%; chance = 20%, five classes) to 100.0 +/- 0.0% (three seeds) by backpropagating through the emitted graph, the symbolic source unmodified. A weighted variant additionally trains a scalar cosine gain and writes it back into the .su source as a numeric literal; recompiling reproduces the trained behaviour to ~2e-7 per logit, so the trained model is itself legible, recompilable code. The same artifact is therefore both a logic program and a trainable neural network.
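The fuzzy-rule classifier mentioned in the abstract is described in the Figure 2 caption as, per class, an AND of the cosine to that class's own prototype with an AND-of-NOTs over the other classes. A minimal forward-pass sketch of that shape follows. It is not the compiler's emitted graph: the closed-form `k_and` is the degree-(2,2) interpolant of `min` on the {-1,0,+1} grid worked out for this example, the prototypes are fixed toy vectors rather than learned ones, and `rule_scores` and `predict` are hypothetical names.

```python
import math


def k_not(x):
    """Kleene NOT as negation (exact on the {-1, 0, +1} grid)."""
    return -x


def k_and(x, y):
    """Polynomial AND: equals min(x, y) at every point of the {-1, 0, +1} grid."""
    return (x + y + x * y - x * x - y * y + x * x * y * y) / 2


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rule_scores(x, prototypes):
    """One score per class: AND(cos(x, own), AND of NOT cos(x, other))."""
    sims = [cosine(x, p) for p in prototypes]
    scores = []
    for own, own_sim in enumerate(sims):
        others = [k_not(s) for i, s in enumerate(sims) if i != own]
        none_of_others = others[0]
        for term in others[1:]:
            none_of_others = k_and(none_of_others, term)
        scores.append(k_and(own_sim, none_of_others))
    return scores


def predict(x, prototypes):
    """Index of the rule with the highest score."""
    scores = rule_scores(x, prototypes)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy K = 3 prototypes (assumption: in the paper these are learnable).
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Every operation here is a polynomial or a cosine, so the same structure written with tensors would be differentiable with respect to the prototypes, which is the property the abstract relies on when it trains the classifier by backpropagation.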

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Sutra, a typed purely functional language whose compiler beta-reduces entire programs (including rotation binding, unbind, bundle, tail recursion, and Lagrange-interpolated polynomial Kleene three-valued logic) to a single fused PyTorch tensor graph over a frozen embedding substrate. It reports two main results: (1) the same program achieves 100% bundle decoding accuracy at width k=8 on four frozen embeddings spanning text (nomic-embed-text, all-minilm, mxbai-embed-large) and protein (ESM-2) modalities, where the Hadamard product baseline collapses to 2.5-7.5%; (2) a fuzzy-rule classifier written in Sutra trains from random initialization (18.7 +/- 9.5%) to 100.0 +/- 0.0% accuracy via PyTorch autograd on the emitted graph, with a weighted variant that writes a trained cosine-gain scalar back into the source as a numeric literal, reproducing the trained behavior to ~2e-7 per logit upon recompilation.

Significance. If the central claims hold, the work supplies a concrete compilation path that turns VSA logic programs into differentiable tensor graphs while preserving the ability to decode bundles exactly and to recompile trained parameters back into legible source; the cross-modality result on four unrelated frozen embeddings and the end-to-end training through the lowered graph are the primary contributions.

major comments (3)
  1. [Abstract] The headline claim that the tensor lowering of rotation binding/unbind/bundle plus Lagrange-interpolated Kleene connectives yields information-preserving graphs sufficient for 100% unbinding recovery at k=8 on arbitrary frozen embeddings is load-bearing for both results, yet the abstract supplies neither the explicit tensor definition of the rotation operator nor any argument that its invertibility is independent of the singular-value distribution of the embedding matrix.
  2. [Abstract] The reported 100% decoding and 100% training accuracies are presented without error analysis, implementation of the lowering, or verification that the test bundles are not constructed from symbols that happen to be linearly separable in the chosen spaces; this leaves the cross-substrate generality untested.
  3. [Abstract] The autograd training result is offered as downstream evidence, but without showing that the emitted graph actually implements the symbolic rotation and polynomial operations (rather than an approximation that happens to train well), the claim that the same artifact is both a logic program and a trainable neural network remains unsubstantiated.
minor comments (2)
  1. The abstract states concrete accuracy figures (100.0 +/- 0.0%, 18.7 +/- 9.5%) but does not report the embedding dimension, the precise definition of bundle width k, or the number of symbols used in the decoding tests.
  2. No comparison is given to other standard VSA binding operations (circular convolution, XOR, etc.) beyond the Hadamard product.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment point by point below, proposing targeted revisions to the abstract where they strengthen the presentation without altering the manuscript's core claims or results.

  1. Referee: [Abstract] The headline claim that the tensor lowering of rotation binding/unbind/bundle plus Lagrange-interpolated Kleene connectives yields information-preserving graphs sufficient for 100% unbinding recovery at k=8 on arbitrary frozen embeddings is load-bearing for both results, yet the abstract supplies neither the explicit tensor definition of the rotation operator nor any argument that its invertibility is independent of the singular-value distribution of the embedding matrix.

    Authors: The abstract reports empirical results on four specific frozen embeddings (three text, one protein) rather than claiming generality to arbitrary embeddings. The explicit tensor definitions for rotation binding, unbind, and bundle appear in Section 3.2; the Lagrange interpolation for Kleene logic is in Section 3.3. The manuscript does not supply a general proof that invertibility holds independently of any embedding's singular-value distribution, as the contribution is the empirical demonstration of 100% decoding on the four tested substrates where Hadamard fails. We will revise the abstract to add a cross-reference to Section 3 and to emphasize the empirical scope. revision: partial

  2. Referee: [Abstract] The reported 100% decoding and 100% training accuracies are presented without error analysis, implementation of the lowering, or verification that the test bundles are not constructed from symbols that happen to be linearly separable in the chosen spaces; this leaves the cross-substrate generality untested.

    Authors: The results section already includes error analysis (100.0 +/- 0.0% over three seeds for both decoding and training). The lowering implementation, including beta-reduction of rotation, bundle, and polynomial connectives to fused tensor ops, is described in Sections 4 and 5. Test bundles are formed from randomly selected symbol combinations drawn from the embedding vocabulary; the fact that the identical bundles yield only 2.5-7.5% accuracy under Hadamard on the same substrates indicates the result is not an artifact of linear separability. We will add a short clause to the abstract noting the reported error bars and the random construction of the test bundles. revision: partial

  3. Referee: [Abstract] The autograd training result is offered as downstream evidence, but without showing that the emitted graph actually implements the symbolic rotation and polynomial operations (rather than an approximation that happens to train well), the claim that the same artifact is both a logic program and a trainable neural network remains unsubstantiated.

    Authors: The fidelity of the emitted graph to the symbolic operations is verified by the weighted-variant experiment: after training a cosine-gain scalar via autograd, the scalar is written back into the .su source as a numeric literal; recompilation then reproduces the trained logits to ~2e-7 per output. This check confirms that the tensor graph implements the rotation and polynomial connectives with sufficient precision for the original symbolic behavior to be recovered exactly. We will revise the abstract to reference this recompilation verification as evidence that the artifact remains both a logic program and a trainable network. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical validation of lowering is independent of inputs


The paper presents a compiler that beta-reduces VSA primitives (rotation binding, unbind, bundle, Lagrange-interpolated Kleene connectives) to fused tensor graphs over frozen embeddings. The central claims are validated by direct execution: identical programs achieve 100% bundle decoding at k=8 across four unrelated frozen encoders (nomic, all-minilm, mxbai, ESM-2) where Hadamard fails, and by standard PyTorch autograd training a fuzzy-rule classifier from random initialization to 100% accuracy on the emitted graph. No equation or result is shown to equal its own fitted inputs by construction; the weighted scalar variant simply records a learned value back into source code, which is a post-hoc legibility step rather than a definitional loop. No self-citations appear as load-bearing premises for the lowering or the cross-substrate result. The derivation chain is therefore a standard compiler lowering plus external empirical test, self-contained against the stated benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; the central claims rest on the unexamined assumption that tensor operations can faithfully realize the listed VSA primitives and that the chosen polynomial interpolation is exact for the intended logic without further constraints.

axioms (1)
  • domain assumption: Frozen embedding spaces from different modalities form suitable substrates for exact VSA decoding after the described lowering.
    Invoked by the claim that the same program succeeds on nomic-embed-text, all-minilm, mxbai-embed-large, and ESM-2.

pith-pipeline@v0.9.0 · 5852 in / 1428 out tokens · 41370 ms · 2026-05-25T05:55:31.517784+00:00 · methodology


