CoT transformers simulate any Word RAM algorithm with poly-logarithmic overhead in three architectures, improving on quadratic TM overhead.
Unique Hard Attention: A Tale of Two Sides , url =
4 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 4years
2026 4representative citing papers
Sliding-window transformers without positional encodings are Turing complete because the sliding window breaks permutation symmetry and suffices to simulate Post machines via a constant-size histogram state.
Low-precision softmax transformers with chain-of-thought simulate Turing machines at logarithmic depth and width; summarized CoT improves this to logarithmic space scaling.
LOTUS uses a looped padded Transformer with parallel cross-entropy supervision on gold CoT tokens to match explicit CoT performance at 3B parameters while reducing thought-phase latency 2.5x-6.9x.
citing papers explorer
-
Efficiently Representing Algorithms With Chain-of-Thought Transformers
CoT transformers simulate any Word RAM algorithm with poly-logarithmic overhead in three architectures, improving on quadratic TM overhead.