HybridKV reduces KV cache memory by up to 7.9x and speeds decoding by 1.52x in MLLMs with almost no performance loss by classifying heads into static and dynamic types and compressing them differently.
Harsh Jhamtani and Taylor Berg-Kirkpatrick
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference