LazyAttention kernelizes deferred positional encoding to enable zero-copy, position-agnostic KV cache reuse, delivering 1.37× lower TTFT and 1.40× higher throughput than Block-Attention under skewed document distributions while preserving output quality.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
The uniform popularity vector uniquely minimizes the LRU hit rate H_C(p) on the interior simplex, with strict radial increase proven via an explicit positive pair-square formula for the derivative.
Citing papers explorer
- LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding
LazyAttention kernelizes deferred positional encoding to enable zero-copy, position-agnostic KV cache reuse, delivering 1.37× lower TTFT and 1.40× higher throughput than Block-Attention under skewed document distributions while preserving output quality.
- Radial Extremality for LRU Caching and the Fill–Holst Conjecture
The uniform popularity vector uniquely minimizes the LRU hit rate H_C(p) on the interior simplex, with strict radial increase proven via an explicit positive pair-square formula for the derivative.