arXiv preprint arXiv:2402.09398

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference · arXiv 2402.09398

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Tensor Cache: Eviction-conditioned Associative Memory for Transformers

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Tensor Cache augments sliding-window attention with an eviction-fed outer-product associative memory and a training correction to improve long-context performance under bounded memory.

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

cs.CL · 2024-06-04 · conditional · novelty 6.0

PyramidKV dynamically compresses the KV cache across layers following pyramidal information funneling, matching full-cache performance at 12% retention and outperforming alternatives at 0.7% retention with accuracy gains of up to 20.5.

citing papers explorer

Showing 2 of 2 citing papers.

Tensor Cache: Eviction-conditioned Associative Memory for Transformers
cs.LG · 2026-05-21 · unverdicted · none · ref 29
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
cs.CL · 2024-06-04 · conditional · none · ref 7
