Unique Hard Attention: A Tale of Two Sides

Selim Jerad, Anej Svete, Jiaoda Li, Ryan Cotterell · 2025 · DOI 10.18653/v1/2025.acl-short.76

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Efficiently Representing Algorithms With Chain-of-Thought Transformers

cs.LG · 2026-06-18 · conditional · novelty 8.0

CoT transformers simulate any Word RAM algorithm with poly-logarithmic overhead in three architectures, improving on quadratic TM overhead.

Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete

cs.LG · 2026-06-01 · unverdicted · novelty 8.0

Sliding-window transformers without positional encodings are Turing complete because the sliding window breaks permutation symmetry and suffices to simulate Post machines via a constant-size histogram state.

The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Low-precision softmax transformers with chain-of-thought simulate Turing machines at logarithmic depth and width; summarized CoT improves this to logarithmic space scaling.

Bridging the Gap Between Latent and Explicit Reasoning with Looped Transformers

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

LOTUS uses a looped padded Transformer with parallel cross-entropy supervision on gold CoT tokens to match explicit CoT performance at 3B parameters while reducing thought-phase latency 2.5x-6.9x.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete
cs.LG · 2026-06-01 · unverdicted · none · ref 18

The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought
cs.LG · 2026-05-18 · unverdicted · none · ref 7

Bridging the Gap Between Latent and Explicit Reasoning with Looped Transformers
cs.LG · 2026-06-30 · unverdicted · none · ref 38
