Transformers are Inherently Succinct
Pith reviewed 2026-05-21 21:08 UTC · model grok-4.3
The pith
Fixed-precision transformers describe languages exponentially more compactly than LTL or RNNs and doubly exponentially more compactly than finite automata.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove that fixed-precision transformers are remarkably succinct: they can be exponentially more succinct than both linear temporal logic (LTL) and recurrent neural networks, and, by extension, state-space models, and doubly exponentially more succinct than finite automata. In other words, there exist families of languages describable by polynomial-size transformers whose smallest equivalent LTL formula or recurrent neural network is exponentially large, and whose smallest equivalent automaton is doubly exponentially large. We also establish matching upper bounds, showing that any fixed-precision transformer can be converted to an LTL formula with at most an exponential blow-up.
What carries the argument
Exponential succinctness separation between polynomial-size fixed-precision transformers and exponentially larger equivalent LTL formulas or RNNs, realized by specific separating language families together with a size-bounded translation that encodes transformer computation steps into LTL.
Load-bearing premise
The lower-bound separations rely on the existence of particular language families whose minimal descriptions grow exponentially or doubly exponentially outside transformers, while the upper-bound translation assumes an encoding of transformer steps into LTL that stays within exponential size.
What would settle it
An explicit family of languages for which the smallest fixed-precision transformer is super-polynomial in the size of the smallest equivalent LTL formula would falsify the claimed exponential succinctness gap.
read the original abstract
We study succinctness as a measure of the expressive power of transformers. Succinctness -- how compactly a formalism can describe a language relative to other formalisms -- is a classical notion in logic and automata theory. We prove that fixed-precision transformers are remarkably succinct: they can be exponentially more succinct than both linear temporal logic (LTL) and recurrent neural networks, and, by extension, state-space models, and doubly exponentially more succinct than finite automata. In other words, there exist families of languages describable by polynomial-size transformers whose smallest equivalent LTL formula or recurrent neural network is exponentially large, and whose smallest equivalent automaton is doubly exponentially large. We also establish matching upper bounds, showing that any fixed-precision transformer can be converted to an LTL formula with at most an exponential blow-up -- improving a prior doubly exponential translation. As a consequence of this succinctness, we show that basic verification problems for transformers, such as emptiness and equivalence, are provably intractable: specifically, EXPSPACE-complete.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proves that fixed-precision transformers are exponentially more succinct than LTL and RNNs (and by extension state-space models) and doubly exponentially more succinct than finite automata. It exhibits families of languages recognizable by polynomial-size fixed-precision transformers whose minimal LTL or RNN representations are exponentially larger and whose minimal automata are doubly exponentially larger. It also gives a matching exponential upper bound by translating any fixed-precision transformer into an LTL formula, improving a prior doubly-exponential result, and derives that emptiness and equivalence for such transformers are EXPSPACE-complete.
Significance. If the constructions and translations hold, the result supplies matching succinctness bounds that clarify the expressive power of fixed-precision transformers relative to classical formalisms and yields concrete complexity consequences for verification. The explicit separating language families and the improved upper-bound translation are strengths that could influence subsequent work on neural model expressivity.
major comments (2)
- [§4] §4 (lower-bound constructions): the families L_n are asserted to require 2^Ω(n)-size LTL formulas while admitting only poly(n)-size fixed-precision transformers. The proof that the LTL lower bound remains exponential once the particular attention and feed-forward steps of the transformer encoding are taken into account is not fully detailed; if a more compact LTL encoding exists that exploits the fixed-precision arithmetic, the exponential separation collapses. An independent LTL lower-bound argument for these languages (separate from the transformer construction) is needed.
- [§5.1] §5.1, translation to LTL: the claimed exponential blow-up assumes a specific encoding of each transformer layer into LTL that preserves the size bound. The handling of the fixed bit-precision in the feed-forward networks and the multi-head attention mechanism could introduce an extra polynomial or exponential factor not bounded in the current argument; the size analysis must be tightened to confirm the exponential (rather than doubly exponential) bound.
minor comments (2)
- [Introduction] The extension to state-space models is mentioned only briefly; a short paragraph clarifying the reduction or why the same succinctness bounds apply would improve readability.
- [§2] Notation for the fixed-precision parameter (bit width) is introduced without an explicit global definition; adding a single sentence in §2 would eliminate ambiguity.
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable comments on our paper. We address each of the major comments below, providing clarifications and indicating the revisions we will make to strengthen the manuscript.
read point-by-point responses
-
Referee: [§4] §4 (lower-bound constructions): the families L_n are asserted to require 2^Ω(n)-size LTL formulas while admitting only poly(n)-size fixed-precision transformers. The proof that the LTL lower bound remains exponential once the particular attention and feed-forward steps of the transformer encoding are taken into account is not fully detailed; if a more compact LTL encoding exists that exploits the fixed-precision arithmetic, the exponential separation collapses. An independent LTL lower-bound argument for these languages (separate from the transformer construction) is needed.
Authors: We appreciate this observation. The lower bound for LTL is derived from classical results on the succinctness of LTL for languages that require exponential size to express certain counting or periodicity properties, independent of any particular model. The specific languages L_n are chosen such that any LTL formula must be exponentially large, as shown by reduction from known hard instances or using the fact that LTL cannot succinctly express certain regular languages without exponential blowup. The transformer construction does not introduce a way to bypass this because the fixed-precision arithmetic is limited and does not allow encoding the language more compactly in LTL. We will add a dedicated subsection in §4 providing an independent proof of the LTL lower bound for L_n, using standard techniques from automata theory and logic, to make this separation explicit and self-contained. revision: yes
-
Referee: [§5.1] §5.1, translation to LTL: the claimed exponential blow-up assumes a specific encoding of each transformer layer into LTL that preserves the size bound. The handling of the fixed bit-precision in the feed-forward networks and the multi-head attention mechanism could introduce an extra polynomial or exponential factor not bounded in the current argument; the size analysis must be tightened to confirm the exponential (rather than doubly exponential) bound.
Authors: Thank you for highlighting the need for tighter analysis. In the translation, each layer is encoded by simulating the attention and feed-forward computations using LTL formulas that track the fixed-precision values bit by bit. Since the precision is fixed (constant number of bits), the encoding of each numerical operation introduces only a constant factor per bit, leading to an overall polynomial overhead in the transformer size, which preserves the exponential blow-up from the number of layers or heads. We will revise the size analysis in §5.1 to explicitly bound the contribution of the bit-precision encoding, showing that it does not exceed a polynomial factor and confirming the exponential upper bound. revision: yes
Circularity Check
No circularity: explicit constructions and independent reductions
full rationale
The paper derives its succinctness results via explicit lower-bound language families separating fixed-precision transformers from LTL/RNNs/automata (with known exponential LTL lower bounds from automata theory) and a direct upper-bound translation encoding transformer computation into LTL with exponential blow-up. No step reduces a claimed prediction or theorem to a fitted parameter, self-definition, or load-bearing self-citation chain; the prior doubly-exponential translation is improved by a new explicit encoding rather than assumed. The derivation is self-contained against external benchmarks in logic and automata.
Axiom & Free-Parameter Ledger
axioms (3)
- standard math Standard syntax and semantics of Linear Temporal Logic (LTL).
- domain assumption Recurrent neural networks and state-space models as language recognizers with size measured by number of states or parameters.
- standard math Finite automata as acceptors whose size is the number of states.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We prove that fixed-precision transformers are remarkably succinct: they can be exponentially more succinct than both linear temporal logic (LTL) and recurrent neural networks... doubly exponentially more succinct than finite automata... non-emptiness problem for UHATs and B-RASP programs is EXPSPACE-complete.
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We use Unique-Hard Attention Transformers (UHAT)... B-RASP programs... tiling problem reduction... 2^n-tiling problem is EXPSPACE-complete.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.