On the representational capacity of neural language models with chain-of-thought reasoning
4 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (4)
Citing papers
- Efficiently Representing Algorithms With Chain-of-Thought Transformers
CoT transformers simulate any Word RAM algorithm with poly-logarithmic overhead in three architectures, improving on quadratic TM overhead.
- An Algebraic View of the Expressivity of Recurrent Language Models
A unified algebraic account reduces RNN expressivity to syntactic monoid division in wreath products and shows diagonal state-space models realize every even-modulus counter under unsigned-integer quantization but none under floating-point recurrences.
- The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought
Low-precision softmax transformers with chain-of-thought simulate Turing machines at logarithmic depth and width; summarized CoT improves this to logarithmic space scaling.
- Bridging the Gap Between Latent and Explicit Reasoning with Looped Transformers
LOTUS uses a looped padded Transformer with parallel cross-entropy supervision on gold CoT tokens to match explicit CoT performance at 3B parameters while reducing thought-phase latency 2.5x-6.9x.