Transformers are Inherently Succinct

Anthony W. Lin; Pascal Bergstr\"a{\ss}er; Ryan Cotterell

arxiv: 2510.19315 · v3 · pith:2MPMVH6Vnew · submitted 2025-10-22 · 💻 cs.FL · cs.LG· cs.LO

Transformers are Inherently Succinct

Pascal Bergstr\"a{\ss}er , Ryan Cotterell , Anthony W. Lin This is my paper

Pith reviewed 2026-05-21 21:08 UTC · model grok-4.3

classification 💻 cs.FL cs.LGcs.LO

keywords transformerssuccinctnessLTLrecurrent neural networksfinite automataverificationexpressivenessEXPSPACE

0 comments

The pith

Fixed-precision transformers describe languages exponentially more compactly than LTL or RNNs and doubly exponentially more compactly than finite automata.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that fixed-precision transformers can represent certain families of languages with only polynomial size while any equivalent LTL formula or recurrent neural network requires exponential size and any equivalent finite automaton requires double-exponential size. This establishes a clear separation in how compactly each formalism can capture the same set of sequences. The authors also give a constructive upper bound that converts any fixed-precision transformer into an LTL formula using at most an exponential increase in size, which tightens an earlier doubly exponential translation. Because of these succinctness gaps, the paper concludes that fundamental verification questions for transformers, such as emptiness and equivalence, are EXPSPACE-complete.

Core claim

We prove that fixed-precision transformers are remarkably succinct: they can be exponentially more succinct than both linear temporal logic (LTL) and recurrent neural networks, and, by extension, state-space models, and doubly exponentially more succinct than finite automata. In other words, there exist families of languages describable by polynomial-size transformers whose smallest equivalent LTL formula or recurrent neural network is exponentially large, and whose smallest equivalent automaton is doubly exponentially large. We also establish matching upper bounds, showing that any fixed-precision transformer can be converted to an LTL formula with at most an exponential blow-up.

What carries the argument

Exponential succinctness separation between polynomial-size fixed-precision transformers and exponentially larger equivalent LTL formulas or RNNs, realized by specific separating language families together with a size-bounded translation that encodes transformer computation steps into LTL.

Load-bearing premise

The lower-bound separations rely on the existence of particular language families whose minimal descriptions grow exponentially or doubly exponentially outside transformers, while the upper-bound translation assumes an encoding of transformer steps into LTL that stays within exponential size.

What would settle it

An explicit family of languages for which the smallest fixed-precision transformer is super-polynomial in the size of the smallest equivalent LTL formula would falsify the claimed exponential succinctness gap.

read the original abstract

We study succinctness as a measure of the expressive power of transformers. Succinctness -- how compactly a formalism can describe a language relative to other formalisms -- is a classical notion in logic and automata theory. We prove that fixed-precision transformers are remarkably succinct: they can be exponentially more succinct than both linear temporal logic (LTL) and recurrent neural networks, and, by extension, state-space models, and doubly exponentially more succinct than finite automata. In other words, there exist families of languages describable by polynomial-size transformers whose smallest equivalent LTL formula or recurrent neural network is exponentially large, and whose smallest equivalent automaton is doubly exponentially large. We also establish matching upper bounds, showing that any fixed-precision transformer can be converted to an LTL formula with at most an exponential blow-up -- improving a prior doubly exponential translation. As a consequence of this succinctness, we show that basic verification problems for transformers, such as emptiness and equivalence, are provably intractable: specifically, EXPSPACE-complete.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Fixed-precision transformers are exponentially more succinct than LTL and RNNs with matching bounds and EXPSPACE-complete verification.

read the letter

The key point is that the paper proves fixed-precision transformers can be exponentially more succinct than LTL and RNNs, and doubly exponentially more succinct than finite automata, with matching upper bounds that improve on prior translations, leading to EXPSPACE-completeness for verification problems like emptiness and equivalence. The new elements are the specific exponential succinctness separations for transformers versus those other models and the tightened upper bound on the LTL translation. The paper does well in providing concrete language families for the lower bounds and showing how the succinctness implies hardness results for verification. This gives a precise comparison of descriptional power that has direct implications for complexity in formal methods for ML. On the soft side, the lower bounds hinge on particular families of languages whose minimal LTL size is exponential; if those families admit smaller LTL formulas when the transformer's computation is considered, the separation might not be as strong. The definition of the fixed-precision model is crucial here, and any ambiguity in how precision is fixed could limit the generality. The stress test concern about the lower-bound families is worth checking against the actual constructions. Overall, this paper is for researchers in automata theory and the formal analysis of neural networks. A reader looking for new succinctness results or complexity bounds in this area would get something out of it. The thinking appears clear and engaged with existing work on expressivity and succinctness. I would recommend putting it through peer review rather than desk rejecting it, as the results are specific and could be verified by experts in the subfield.

Referee Report

2 major / 2 minor

Summary. The paper proves that fixed-precision transformers are exponentially more succinct than LTL and RNNs (and by extension state-space models) and doubly exponentially more succinct than finite automata. It exhibits families of languages recognizable by polynomial-size fixed-precision transformers whose minimal LTL or RNN representations are exponentially larger and whose minimal automata are doubly exponentially larger. It also gives a matching exponential upper bound by translating any fixed-precision transformer into an LTL formula, improving a prior doubly-exponential result, and derives that emptiness and equivalence for such transformers are EXPSPACE-complete.

Significance. If the constructions and translations hold, the result supplies matching succinctness bounds that clarify the expressive power of fixed-precision transformers relative to classical formalisms and yields concrete complexity consequences for verification. The explicit separating language families and the improved upper-bound translation are strengths that could influence subsequent work on neural model expressivity.

major comments (2)

[§4] §4 (lower-bound constructions): the families L_n are asserted to require 2^Ω(n)-size LTL formulas while admitting only poly(n)-size fixed-precision transformers. The proof that the LTL lower bound remains exponential once the particular attention and feed-forward steps of the transformer encoding are taken into account is not fully detailed; if a more compact LTL encoding exists that exploits the fixed-precision arithmetic, the exponential separation collapses. An independent LTL lower-bound argument for these languages (separate from the transformer construction) is needed.
[§5.1] §5.1, translation to LTL: the claimed exponential blow-up assumes a specific encoding of each transformer layer into LTL that preserves the size bound. The handling of the fixed bit-precision in the feed-forward networks and the multi-head attention mechanism could introduce an extra polynomial or exponential factor not bounded in the current argument; the size analysis must be tightened to confirm the exponential (rather than doubly exponential) bound.

minor comments (2)

[Introduction] The extension to state-space models is mentioned only briefly; a short paragraph clarifying the reduction or why the same succinctness bounds apply would improve readability.
[§2] Notation for the fixed-precision parameter (bit width) is introduced without an explicit global definition; adding a single sentence in §2 would eliminate ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable comments on our paper. We address each of the major comments below, providing clarifications and indicating the revisions we will make to strengthen the manuscript.

read point-by-point responses

Referee: [§4] §4 (lower-bound constructions): the families L_n are asserted to require 2^Ω(n)-size LTL formulas while admitting only poly(n)-size fixed-precision transformers. The proof that the LTL lower bound remains exponential once the particular attention and feed-forward steps of the transformer encoding are taken into account is not fully detailed; if a more compact LTL encoding exists that exploits the fixed-precision arithmetic, the exponential separation collapses. An independent LTL lower-bound argument for these languages (separate from the transformer construction) is needed.

Authors: We appreciate this observation. The lower bound for LTL is derived from classical results on the succinctness of LTL for languages that require exponential size to express certain counting or periodicity properties, independent of any particular model. The specific languages L_n are chosen such that any LTL formula must be exponentially large, as shown by reduction from known hard instances or using the fact that LTL cannot succinctly express certain regular languages without exponential blowup. The transformer construction does not introduce a way to bypass this because the fixed-precision arithmetic is limited and does not allow encoding the language more compactly in LTL. We will add a dedicated subsection in §4 providing an independent proof of the LTL lower bound for L_n, using standard techniques from automata theory and logic, to make this separation explicit and self-contained. revision: yes
Referee: [§5.1] §5.1, translation to LTL: the claimed exponential blow-up assumes a specific encoding of each transformer layer into LTL that preserves the size bound. The handling of the fixed bit-precision in the feed-forward networks and the multi-head attention mechanism could introduce an extra polynomial or exponential factor not bounded in the current argument; the size analysis must be tightened to confirm the exponential (rather than doubly exponential) bound.

Authors: Thank you for highlighting the need for tighter analysis. In the translation, each layer is encoded by simulating the attention and feed-forward computations using LTL formulas that track the fixed-precision values bit by bit. Since the precision is fixed (constant number of bits), the encoding of each numerical operation introduces only a constant factor per bit, leading to an overall polynomial overhead in the transformer size, which preserves the exponential blow-up from the number of layers or heads. We will revise the size analysis in §5.1 to explicitly bound the contribution of the bit-precision encoding, showing that it does not exceed a polynomial factor and confirming the exponential upper bound. revision: yes

Circularity Check

0 steps flagged

No circularity: explicit constructions and independent reductions

full rationale

The paper derives its succinctness results via explicit lower-bound language families separating fixed-precision transformers from LTL/RNNs/automata (with known exponential LTL lower bounds from automata theory) and a direct upper-bound translation encoding transformer computation into LTL with exponential blow-up. No step reduces a claimed prediction or theorem to a fitted parameter, self-definition, or load-bearing self-citation chain; the prior doubly-exponential translation is improved by a new explicit encoding rather than assumed. The derivation is self-contained against external benchmarks in logic and automata.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The paper relies on standard definitions from automata theory and temporal logic without introducing new free parameters, ad-hoc axioms, or invented entities; all comparisons are derived from explicit size-bounded translations and reductions.

axioms (3)

standard math Standard syntax and semantics of Linear Temporal Logic (LTL).
Invoked when establishing the exponential succinctness gap and the upper-bound translation.
domain assumption Recurrent neural networks and state-space models as language recognizers with size measured by number of states or parameters.
Used for the exponential succinctness comparison.
standard math Finite automata as acceptors whose size is the number of states.
Basis for the doubly exponential succinctness lower bound.

pith-pipeline@v0.9.0 · 5711 in / 1528 out tokens · 86334 ms · 2026-05-21T21:08:52.111075+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We prove that fixed-precision transformers are remarkably succinct: they can be exponentially more succinct than both linear temporal logic (LTL) and recurrent neural networks... doubly exponentially more succinct than finite automata... non-emptiness problem for UHATs and B-RASP programs is EXPSPACE-complete.
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We use Unique-Hard Attention Transformers (UHAT)... B-RASP programs... tiling problem reduction... 2^n-tiling problem is EXPSPACE-complete.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.