KVCapsule compresses KV cache in VLMs by 60% to deliver up to 2x higher tokens-per-second and 2.4x memory reduction with negligible accuracy loss.
Qaq: Quality adaptive quantization for llm kv cache
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
KVCapsule: Efficient Sequential KV Cache Compression for Vision-Language Models with Asymmetric Redundancy
KVCapsule compresses KV cache in VLMs by 60% to deliver up to 2x higher tokens-per-second and 2.4x memory reduction with negligible accuracy loss.