Tensor Cache augments sliding-window attention with an eviction-fed outer-product associative memory and a training correction to improve long-context performance under bounded memory.
arXiv preprint arXiv:2402.09398 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
representative citing papers
PyramidKV dynamically compresses KV cache across layers following pyramidal information funneling, matching full performance at 12% retention and outperforming alternatives at 0.7% retention with up to 20.5 accuracy gains.
citing papers explorer
-
Tensor Cache: Eviction-conditioned Associative Memory for Transformers
Tensor Cache augments sliding-window attention with an eviction-fed outer-product associative memory and a training correction to improve long-context performance under bounded memory.
-
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
PyramidKV dynamically compresses KV cache across layers following pyramidal information funneling, matching full performance at 12% retention and outperforming alternatives at 0.7% retention with up to 20.5 accuracy gains.