Aircache: Activating inter-modal relevancy kv cache compression for efficient large vision-language model inference

Kai Huang, Hao Zou, Bochen Wang, Ye Xi, Zhen Xie, Hao Wang · 2025 · arXiv 2503.23956

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

Visual token pruning in MLLMs fails on complex reasoning due to Relevant Visual Information Shift during decoding, but the DSTP framework fixes it training-free across models.

KVCapsule: Efficient Sequential KV Cache Compression for Vision-Language Models with Asymmetric Redundancy

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

KVCapsule compresses KV cache in VLMs by 60% to deliver up to 2x higher tokens-per-second and 2.4x memory reduction with negligible accuracy loss.

citing papers explorer

Showing 2 of 2 citing papers.

Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding cs.CV · 2026-04-14 · unverdicted · none · ref 14
Visual token pruning in MLLMs fails on complex reasoning due to Relevant Visual Information Shift during decoding, but the DSTP framework fixes it training-free across models.
KVCapsule: Efficient Sequential KV Cache Compression for Vision-Language Models with Asymmetric Redundancy cs.CV · 2026-05-14 · unverdicted · none · ref 13
KVCapsule compresses KV cache in VLMs by 60% to deliver up to 2x higher tokens-per-second and 2.4x memory reduction with negligible accuracy loss.

Aircache: Activating inter-modal relevancy kv cache compression for efficient large vision-language model inference

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer