Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

· 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

WAND: Windowed Attention and Knowledge Distillation for Efficient Autoregressive Text-to-Speech Models

cs.CL · 2026-03-17 · unverdicted · novelty 5.0

WAND adapts AR-TTS models to constant complexity via windowed attention and distillation, cutting KV cache memory by up to 66.2% while preserving quality and achieving length-invariant latency.

citing papers explorer

Showing 1 of 1 citing paper.

WAND: Windowed Attention and Knowledge Distillation for Efficient Autoregressive Text-to-Speech Models

cs.CL · 2026-03-17 · unverdicted · none · ref 22

WAND adapts AR-TTS models to constant complexity via windowed attention and distillation, cutting KV cache memory by up to 66.2% while preserving quality and achieving length-invariant latency.
