ZipCache: Accurate and efficient KV cache quantization with salient token identification. Advances in Neural Information Processing Systems, 37:68287–68307

Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it


representative citing papers

Don't Waste Bits! Adaptive KV-Cache Quantization for Lightweight On-Device LLMs

cs.CV · 2026-04-06 · unverdicted · novelty 5.0

A data-driven adaptive policy for KV-cache bit-width selection based on token importance features reduces decoding latency by ~18% and improves accuracy over static quantization while staying near FP16 levels on SmolLM models.

citing papers explorer

Showing 1 of 1 citing paper.

Don't Waste Bits! Adaptive KV-Cache Quantization for Lightweight On-Device LLMs cs.CV · 2026-04-06 · unverdicted · none · ref 11
