Latent-Condensed Attention condenses context in MLA's latent space via query-aware semantic pooling and positional anchor selection, delivering up to 2.5x prefilling speedup and 90% KV cache reduction at 128K length with a length-independent error bound.
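The summary names two mechanisms, query-aware semantic pooling and positional anchor selection, but gives no detail. The sketch below is a rough illustration of what condensing context in a latent space could look like. It is not the paper's algorithm, and it does not reproduce the error bound or the reported speedups.

It rests on these assumptions, none of which come from the source:

* Each token is stored as a single latent vector, as in MLA's compressed KV cache.
* The query has already been mapped into that latent space.
* The "anchors" are the first `n_sink` and last `n_local` positions.
* The middle span is split into fixed chunks of `chunk_size` tokens. Each chunk is pooled into one vector using a softmax over query-latent scores.

All function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_aware_pool(chunk: Sequence[Vector], query: Vector) -> Vector:
    """Collapse a chunk of latent vectors into one weighted average.

    Hypothetical scoring rule: weights are softmax(q . c / sqrt(d)), so
    latents more aligned with the query contribute more to the result.
    """
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, c) * scale for c in chunk])
    dim = len(chunk[0])
    return [sum(w * c[i] for w, c in zip(weights, chunk)) for i in range(dim)]


def condense_latents(latents, query, chunk_size=16, n_sink=4, n_local=8):
    """Hypothetical condensation: keep positional anchors, pool the rest.

    Assumed anchors: the first n_sink and last n_local latents are kept
    verbatim; the middle span is pooled chunk by chunk. Order is preserved.
    """
    T = len(latents)
    if T <= n_sink + n_local:
        return [list(c) for c in latents]
    sinks = [list(c) for c in latents[:n_sink]]
    local = [list(c) for c in latents[T - n_local:]]
    middle = latents[n_sink:T - n_local]
    pooled = [
        query_aware_pool(middle[i:i + chunk_size], query)
        for i in range(0, len(middle), chunk_size)
    ]
    return sinks + pooled + local


# Usage: 128 tokens with toy 8-dimensional latent vectors.
rng = random.Random(0)
d = 8
latents = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(128)]
query = [rng.gauss(0.0, 1.0) for _ in range(d)]
condensed = condense_latents(latents, query)
reduction = 1 - len(condensed) / len(latents)
print(len(condensed), reduction)  # prints: 20 0.84375
```

In this toy setting, 128 latents shrink to 20: 4 anchors at the start, 8 pooled chunks, and 8 recent anchors. That is roughly an 84% reduction. The ratio depends entirely on the assumed `chunk_size`, `n_sink`, and `n_local`. It should not be read as the paper's configuration.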
In Proceedings of the International Workshop on Machine Learning and Programming Languages, pages 10–19
1 Pith paper cites this work (field: cs.CL; year: 2026; verdict: unverdicted).
Representative citing paper:
Latent-Condensed Transformer for Efficient Long Context Modeling