OVQ-attention delivers linear-time, constant-memory sequence mixing via sparse, Gaussian-mixture-based memory updates, matching self-attention performance on tasks with sequence lengths up to 64k while using far less memory.
Online Vector Quantized Attention
Cited by 1 Pith paper. Field: cs.LG. Year: 2026. Verdict: UNVERDICTED.