A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression

Devoto, Alessio, Zhao, Yu, Scardapane, Simone, Minervini, Pasquale · 2024 · DOI 10.18653/v1/2024.emnlp-main.1027

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

EntmaxKV: Support-Aware Decoding for Entmax Attention

cs.LG · 2026-05-20 · conditional · novelty 8.0

EntmaxKV enables exact sparse KV-cache decoding for entmax attention via support-aware page selection and a Gaussian threshold estimator, matching full attention quality at a fraction of the cache size with up to 5.43x speedup.

QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving

cs.AI · 2026-06-04 · unverdicted · novelty 7.0

QCFuse achieves full-prefill quality in RAG with 1.7x average prefill speedup over full prefill and 1.5x over ProphetKV via compressed query-aware cache fusion.

Value-Aware Stochastic KV Cache Eviction for Reasoning Models

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

VaSE improves KV cache eviction accuracy for reasoning models by over 4% versus prior eviction methods at 4x compression through value-magnitude protection and stochastic diversity.

TGV-KV: Text-Grounded KV Eviction for Vision-Language Models

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

TGV-KV uses text-vision budgeting, weighted ranking, and prioritised retention to evict KV cache in VLMs while retaining 99.2% accuracy at 5% budget on VizWiz-VQA.

