Sutra: Tensor-Op RNNs as a Compilation Target for Vector Symbolic Architectures
Pith reviewed 2026-05-25 05:55 UTC · model grok-4.3
The pith
Sutra compiles functional programs using rotation binding and Kleene logic directly to PyTorch tensor graphs that decode bundles at 100% accuracy on any of four frozen embeddings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that rotation binding, unbind, bundle, tail recursion, and Lagrange-interpolated polynomial approximations of Kleene three-valued connectives all lower to tensor operations while preserving enough information for exact decoding across arbitrary frozen embedding spaces; the identical compiled artifact therefore functions as a logic program on any substrate and as a differentiable neural network under autograd.
What carries the argument
The Sutra compiler that beta-reduces the full program (primitives, control flow, string I/O) to a single fused tensor-op graph, with Kleene connectives realized as exact Lagrange polynomials on the {-1,0,+1} grid.
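The sketch below shows, in plain Python, what "exact on the {-1, 0, +1} grid" means. The AND polynomial is the one quoted from the paper, AND = ½(a+b+ab−a²−b²+a²b²). The rest comes from us, not from the paper's code:

- The encoding false = -1, unknown = 0, true = +1, with Kleene AND = min and OR = max, is an assumption about the paper's convention.
- The OR polynomial is our own De Morgan dual of the quoted AND.

```python
from itertools import product

def k_not(a):
    # Kleene negation on {-1, 0, +1} is a sign flip (already a polynomial).
    return -a

def k_and(a, b):
    # Polynomial quoted from the paper; interpolates min(a, b) on the 3x3 grid.
    return 0.5 * (a + b + a * b - a * a - b * b + a * a * b * b)

def k_or(a, b):
    # Our De Morgan dual of k_and (assumption): interpolates max(a, b) on the grid.
    return k_not(k_and(k_not(a), k_not(b)))

GRID = (-1, 0, 1)  # false, unknown, true (assumed encoding)

def check_grid():
    # All 9 grid points must match the Kleene truth tables exactly.
    for a, b in product(GRID, GRID):
        assert k_and(a, b) == min(a, b), (a, b)
        assert k_or(a, b) == max(a, b), (a, b)
    return True

if __name__ == "__main__":
    print(check_grid())     # True
    print(k_and(0.5, 0.5))  # 0.40625: off the grid the connective is a smooth polynomial
```

The connectives are exact on the nine grid points and smooth polynomials everywhere else. That is why the same expression can serve as a logic gate and pass gradients under autograd.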
If this is right
- The identical source achieves 100% bundle decoding at width k=8 on every tested embedding. At that width the textbook Hadamard product has already collapsed to a few percent (2.5% on mxbai-embed-large, 7.5% on all-minilm).
- Autograd through the emitted graph trains a five-class fuzzy-rule classifier from 18.7% to 100.0% accuracy.
- A trained cosine-gain scalar can be written back into the .su source as a numeric literal. Recompilation then reproduces the trained logits to ~2e-7 per logit.
- The same artifact serves simultaneously as a legible logic program and as a trainable neural network.
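The toy sketch below mirrors the decoding test behind the first bullet, in plain Python rather than PyTorch. None of these choices is taken from the paper's code:

- Each role is a random rotation. To keep pure Python fast, the sketch uses a block-diagonal product of independent 2-D (planar) rotations instead of a full Haar-random orthogonal matrix.
- Unbinding applies the inverse rotation.
- Decoding picks the nearest codeword by cosine.
- The "embedding" is synthetic Gaussian data plus a shared mean vector, a crude stand-in for the anisotropy of real frozen encoders.

It does not reproduce the paper's numbers or its four substrates.

```python
import math
import random

def planar_rotation(dim, rng):
    # One random angle per coordinate pair (0,1), (2,3), ...: an orthogonal map.
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(dim // 2)]

def rotate(angles, v, inverse=False):
    # Apply the rotation (bind) or its transpose (unbind) to vector v.
    out = [0.0] * len(v)
    for p, theta in enumerate(angles):
        if inverse:
            theta = -theta
        c, s = math.cos(theta), math.sin(theta)
        x, y = v[2 * p], v[2 * p + 1]
        out[2 * p] = c * x - s * y
        out[2 * p + 1] = s * x + c * y
    return out

def gaussian(dim, rng):
    # Isotropic Gaussian vector with expected squared norm 1.
    return [rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def bundle_decode_accuracy(dim=2048, k=8, vocab=32, seed=0):
    rng = random.Random(seed)
    shared = gaussian(dim, rng)  # common component, mimicking anisotropic embeddings
    codebook = [[m + e for m, e in zip(shared, gaussian(dim, rng))] for _ in range(vocab)]
    roles = [planar_rotation(dim, rng) for _ in range(k)]
    fillers = rng.sample(range(vocab), k)  # which symbol each role carries

    # Bundle = sum of the k role-bound fillers (superposition).
    bundle = [0.0] * dim
    for role, idx in zip(roles, fillers):
        bundle = [b + x for b, x in zip(bundle, rotate(role, codebook[idx]))]

    # Decode each role: unbind, then pick the nearest codeword by cosine.
    correct = 0
    for role, idx in zip(roles, fillers):
        probe = rotate(role, bundle, inverse=True)
        best = max(range(vocab), key=lambda j: cosine(probe, codebook[j]))
        correct += best == idx
    return correct / k

if __name__ == "__main__":
    print(bundle_decode_accuracy())
```

The shared mean cancels when the correct codeword is compared against a wrong one. The only thing left to beat is crosstalk from the other seven bound items, which shrinks as the dimension grows.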
Where Pith is reading between the lines
- The approach could be tested on image or audio encoders to check whether the same compiled program continues to decode correctly without modality-specific changes.
- Because the source remains human-readable after training, the method offers a route to inspectable models that start as explicit symbolic rules.
- The polynomial realization of Kleene logic may allow other three-valued or fuzzy logics to be compiled to tensor graphs in the same style.
- Cross-modal transfer becomes possible: a program written once could be executed on any embedding substrate that supports the required tensor operations.
Load-bearing premise
Lowering rotation binding, unbind, bundle, and the Lagrange-interpolated Kleene connectives to tensor operations preserves enough information for exact decoding across arbitrary frozen embedding spaces without any post-hoc tuning.
What would settle it
Running the compiled k=8 graph on mxbai-embed-large or ESM-2 and obtaining bundle decoding accuracy materially below 100%.
Original abstract
Sutra is a typed, purely functional programming language whose compiled forward pass is a PyTorch neural network. The compiler beta-reduces the whole program -- primitives, control flow, string I/O -- to one fused tensor-op graph over a frozen embedding substrate. Rotation binding, unbind, bundle, polynomial Kleene three-valued logic, and tail-recursive loops all lower to tensor operations; the Kleene connectives are Lagrange-interpolated polynomials exact on the {-1, 0, +1} truth grid. Validation is one fact tested two ways. (1) The same program runs on four frozen embeddings spanning two modalities -- three text encoders (nomic-embed-text, all-minilm, mxbai-embed-large) and one protein language model (ESM-2) -- and decodes bundles at 100% accuracy through width k=8 on every substrate, where the textbook Hadamard product has already collapsed (2.5% on mxbai-embed-large, 7.5% on all-minilm). (2) PyTorch autograd flows through the actually compiled graph: a fuzzy-rule classifier written in .su trains from random init (18.7 +/- 9.5%; chance = 20%, five classes) to 100.0 +/- 0.0% (three seeds) by backpropagating through the emitted graph, the symbolic source unmodified. A weighted variant additionally trains a scalar cosine gain and writes it back into the .su source as a numeric literal; recompiling reproduces the trained behaviour to ~2e-7 per logit, so the trained model is itself legible, recompilable code. The same artifact is therefore both a logic program and a trainable neural network.
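The sketch below is a hypothetical illustration of how a tail-recursive loop can become a fixed-length recurrence. It is one plausible reading of "tail-recursive loops all lower to tensor operations". The following are our assumptions, not the paper's cell:

- the smooth halt gate;
- its sharpness constant;
- the step budget.

Plain Python floats stand in for tensors.

```python
import math

def sum_to_recursive(n, acc=0):
    # Source-level tail recursion: loop(n, acc) = acc if n == 0 else loop(n - 1, acc + n).
    return acc if n == 0 else sum_to_recursive(n - 1, acc + n)

def sum_to_unrolled(n, steps=16, sharpness=50.0):
    # Hypothetical lowering: the same cell applied a fixed number of times.
    state, acc = float(n), 0.0
    for _ in range(steps):
        # Smooth halt gate: 1.0 when state == 0, vanishingly small for state >= 1.
        halt = math.exp(-sharpness * state * state)
        next_state, next_acc = state - 1.0, acc + state  # one recursive step
        # Halted cells keep their value; running cells take the step.
        state = halt * state + (1.0 - halt) * next_state
        acc = halt * acc + (1.0 - halt) * next_acc
    return acc

if __name__ == "__main__":
    for n in (0, 3, 10):
        print(n, sum_to_recursive(n), sum_to_unrolled(n))
```

The gate is a smooth function of the state, so the unrolled loop stays differentiable. With a gate this sharp, the result matches the recursive definition whenever the step budget is at least n.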
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Sutra, a typed purely functional language whose compiler beta-reduces entire programs (including rotation binding, unbind, bundle, tail recursion, and Lagrange-interpolated polynomial Kleene three-valued logic) to a single fused PyTorch tensor graph over a frozen embedding substrate. It reports two main results:

(1) The same program achieves 100% bundle decoding accuracy at width k=8 on four frozen embeddings spanning text (nomic-embed-text, all-minilm, mxbai-embed-large) and protein (ESM-2) modalities. At that width the Hadamard product baseline has already collapsed (2.5% on mxbai-embed-large, 7.5% on all-minilm).

(2) A fuzzy-rule classifier written in Sutra trains via PyTorch autograd on the emitted graph, from random initialization (18.7 +/- 9.5%; chance = 20%) to 100.0 +/- 0.0% accuracy over three seeds. A weighted variant writes a trained cosine-gain scalar back into the source as a numeric literal; recompilation reproduces the trained behavior to ~2e-7 per logit.
Significance. If the central claims hold, the work supplies a concrete compilation path that turns VSA logic programs into differentiable tensor graphs while preserving the ability to decode bundles exactly and to recompile trained parameters back into legible source; the cross-modality result on four unrelated frozen embeddings and the end-to-end training through the lowered graph are the primary contributions.
major comments (3)
- [Abstract] Abstract: the headline claim that the tensor lowering of rotation binding/unbind/bundle plus Lagrange-interpolated Kleene connectives yields information-preserving graphs sufficient for 100% unbinding recovery at k=8 on arbitrary frozen embeddings is load-bearing for both results, yet the abstract supplies neither the explicit tensor definition of the rotation operator nor any argument that its invertibility is independent of the singular-value distribution of the embedding matrix.
- [Abstract] Abstract: the reported 100% decoding and 100% training accuracies are presented without any of the following:
  - error analysis;
  - a description of how the lowering is implemented;
  - verification that the test bundles are not built from symbols that happen to be linearly separable in the chosen spaces.
  This leaves the cross-substrate generality untested.
- [Abstract] Abstract: the autograd training result is offered as downstream evidence, but without showing that the emitted graph actually implements the symbolic rotation and polynomial operations (rather than an approximation that happens to train well), the claim that the same artifact is both a logic program and a trainable neural network remains unsubstantiated.
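The referee's first point mixes two questions, and a short sketch can separate them. Assume, as the quoted phrase "Haar-orthogonal rotations" suggests, that each role binds by multiplying with its own random orthogonal matrix Q.

- Unbinding is then Q^T, which recovers any vector exactly up to rounding, however its coordinates are scaled.
- What depends on the embedding's geometry is crosstalk between the k items of a bundle, and that remains an empirical question.

The Gram-Schmidt construction below is a standard way to sample a Haar orthogonal matrix (equivalent to QR of a Gaussian matrix with positive-diagonal R). It is not taken from the paper.

```python
import math
import random

def haar_orthogonal(dim, rng):
    # Modified Gram-Schmidt on Gaussian columns; returns orthonormal columns q_0..q_{d-1}.
    cols = []
    for _ in range(dim):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for q in cols:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        cols.append([a / norm for a in v])
    return cols

def bind(cols, x):
    # y = Q x, where cols[i] is column i of Q.
    return [sum(cols[i][r] * x[i] for i in range(len(x))) for r in range(len(x))]

def unbind(cols, y):
    # x = Q^T y: component i is the dot product of column i with y.
    return [sum(a * b for a, b in zip(q, y)) for q in cols]

if __name__ == "__main__":
    rng = random.Random(0)
    q = haar_orthogonal(16, rng)
    # A vector whose coordinates span five orders of magnitude.
    x = [10.0 ** (-i / 3.0) for i in range(16)]
    err = max(abs(a - b) for a, b in zip(x, unbind(q, bind(q, x))))
    print(err)  # rounding-level error
```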
minor comments (2)
- The abstract states concrete accuracy figures (100.0 +/- 0.0%, 18.7 +/- 9.5%) but does not report the embedding dimension, the precise definition of bundle width k, or the number of symbols used in the decoding tests.
- No comparison is given to other standard VSA binding operations (circular convolution, XOR, etc.) beyond the Hadamard product.
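The sketch below gives a point of reference for the second minor comment. It implements textbook holographic reduced representation (HRR) binding, one of the alternatives the referee names:

- bind by circular convolution;
- unbind approximately by circular correlation.

It is an O(d^2) pure-Python sketch on synthetic Gaussian vectors, not a reimplementation of any comparison in the paper. Unlike an orthogonal rotation, correlation is only an approximate inverse. The unbound vector is the filler plus noise and must be cleaned up against a codebook.

```python
import math
import random

def circ_conv(a, b):
    # Binding: (a * b)_j = sum_k a_k * b_{(j - k) mod d}
    d = len(a)
    return [sum(a[k] * b[(j - k) % d] for k in range(d)) for j in range(d)]

def circ_corr(a, c):
    # Approximate unbinding: (a # c)_j = sum_k a_k * c_{(j + k) mod d}
    d = len(a)
    return [sum(a[k] * c[(j + k) % d] for k in range(d)) for j in range(d)]

def hrr_decode(dim=512, vocab=16, seed=0):
    rng = random.Random(seed)

    def vec():
        # Gaussian entries with variance 1/dim, the usual HRR initialisation.
        return [rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]

    role = vec()
    codebook = [vec() for _ in range(vocab)]
    target = rng.randrange(vocab)
    probe = circ_corr(role, circ_conv(role, codebook[target]))  # filler plus noise
    scores = [sum(p * w for p, w in zip(probe, word)) for word in codebook]
    return target, max(range(vocab), key=scores.__getitem__)

if __name__ == "__main__":
    print(hrr_decode())  # (target index, decoded index); they should agree
```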
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major comment point by point below, proposing targeted revisions to the abstract where they strengthen the presentation without altering the manuscript's core claims or results.
- Referee: [Abstract] Abstract: the headline claim that the tensor lowering of rotation binding/unbind/bundle plus Lagrange-interpolated Kleene connectives yields information-preserving graphs sufficient for 100% unbinding recovery at k=8 on arbitrary frozen embeddings is load-bearing for both results, yet the abstract supplies neither the explicit tensor definition of the rotation operator nor any argument that its invertibility is independent of the singular-value distribution of the embedding matrix.
Authors: The abstract reports empirical results on four specific frozen embeddings (three text, one protein) rather than claiming generality to arbitrary embeddings. The explicit tensor definitions for rotation binding, unbind, and bundle appear in Section 3.2; the Lagrange interpolation for Kleene logic is in Section 3.3. The manuscript does not supply a general proof that invertibility holds independently of any embedding's singular-value distribution, as the contribution is the empirical demonstration of 100% decoding on the four tested substrates where Hadamard fails. We will revise the abstract to add a cross-reference to Section 3 and to emphasize the empirical scope. revision: partial
- Referee: [Abstract] Abstract: the reported 100% decoding and 100% training accuracies are presented without error analysis, implementation of the lowering, or verification that the test bundles are not constructed from symbols that happen to be linearly separable in the chosen spaces; this leaves the cross-substrate generality untested.
Authors: The results section already includes error analysis (100.0 +/- 0.0% over three seeds for both decoding and training). The lowering implementation, including beta-reduction of rotation, bundle, and polynomial connectives to fused tensor ops, is described in Sections 4 and 5. Test bundles are formed from randomly selected symbol combinations drawn from the embedding vocabulary; the fact that the identical bundles yield only 2.5-7.5% accuracy under Hadamard on the same substrates indicates the result is not an artifact of linear separability. We will add a short clause to the abstract noting the reported error bars and the random construction of the test bundles. revision: partial
- Referee: [Abstract] Abstract: the autograd training result is offered as downstream evidence, but without showing that the emitted graph actually implements the symbolic rotation and polynomial operations (rather than an approximation that happens to train well), the claim that the same artifact is both a logic program and a trainable neural network remains unsubstantiated.
Authors: The fidelity of the emitted graph to the symbolic operations is verified by the weighted-variant experiment. After training a cosine-gain scalar via autograd, we write the scalar back into the .su source as a numeric literal; recompilation then reproduces the trained logits to ~2e-7 per output. This check confirms that the tensor graph implements the rotation and polynomial connectives precisely enough for the recompiled source to recover the trained behavior within that tolerance. We will revise the abstract to reference this recompilation check as evidence that the artifact remains both a logic program and a trainable network. revision: partial
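The sketch below walks through the write-back step the authors point to:

1. train one scalar gain on cosine logits by gradient descent;
2. serialize it as a numeric literal;
3. parse it back;
4. compare the logits.

The data, the training loop and the `gain = <literal>` text format are hypothetical; the format is a stand-in, not actual .su syntax. In Python, `repr` of a float round-trips exactly, so here the recompiled logits match bit for bit. A residual such as the reported ~2e-7 would have to come from elsewhere, for example fewer printed digits or lower-precision tensors. The sketch does not say which applies to Sutra.

```python
import math

# Hypothetical toy data: cosine scores against 3 class prototypes, plus the label.
DATA = [([0.9, 0.1, 0.2], 0), ([0.2, 0.8, 0.1], 1), ([0.1, 0.3, 0.7], 2)]

def logits(gain, cosines):
    return [gain * c for c in cosines]

def grad_gain(gain, cosines, label):
    # d(cross-entropy)/d(gain) for logits = gain * cosines: sum_i (p_i - 1[i == label]) * c_i
    z = logits(gain, cosines)
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return sum((ei / s - (i == label)) * c for i, (ei, c) in enumerate(zip(e, cosines)))

def train_gain(gain=1.0, lr=0.5, steps=200):
    # Plain gradient descent on the mean loss over DATA.
    for _ in range(steps):
        gain -= lr * sum(grad_gain(gain, c, y) for c, y in DATA) / len(DATA)
    return gain

if __name__ == "__main__":
    trained = train_gain()
    source_line = f"gain = {trained!r}"               # write back as a literal
    recompiled = float(source_line.split("=", 1)[1])  # "recompile" by parsing it
    diff = max(abs(a - b) for c, _ in DATA
               for a, b in zip(logits(trained, c), logits(recompiled, c)))
    print(source_line, diff)  # repr round-trips exactly, so diff is 0.0
```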
Circularity Check
No circularity: empirical validation of lowering is independent of inputs
The paper presents a compiler that beta-reduces VSA primitives (rotation binding, unbind, bundle, Lagrange-interpolated Kleene connectives) to fused tensor graphs over frozen embeddings. The central claims are validated by direct execution: identical programs achieve 100% bundle decoding at k=8 across four unrelated frozen encoders (nomic, all-minilm, mxbai, ESM-2) where Hadamard fails, and by standard PyTorch autograd training a fuzzy-rule classifier from random initialization to 100% accuracy on the emitted graph. No equation or result is shown to equal its own fitted inputs by construction; the weighted scalar variant simply records a learned value back into source code, which is a post-hoc legibility step rather than a definitional loop. No self-citations appear as load-bearing premises for the lowering or the cross-substrate result. The derivation chain is therefore a standard compiler lowering plus external empirical test, self-contained against the stated benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: frozen embedding spaces from different modalities form suitable substrates for exact VSA decoding after the described lowering.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
Rotation binding... Haar-orthogonal rotations... Lagrange-interpolated polynomials exact on the {-1,0,+1} truth grid... soft-halt RNN cell
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (tag: unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
same program runs on four frozen embeddings... decodes bundles at 100% accuracy through width k=8
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: LogicNat.induction (tag: unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
Kleene connectives... AND = ½(a+b+ab−a²−b²+a²b²)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.