KVCapsule compresses KV cache in VLMs by 60% to deliver up to 2x higher tokens-per-second and 2.4x memory reduction with negligible accuracy loss.
Topv: Compatible token pruning with inference time optimization for fast and low-memory multimodal vision language model
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
OmniRefine introduces alignment-aware chunk refinement via similarity and dynamic programming followed by modality-cooperative token compression, achieving near-baseline accuracy at 44% token retention on WorldSense.
citing papers explorer
-
KVCapsule: Efficient Sequential KV Cache Compression for Vision-Language Models with Asymmetric Redundancy
KVCapsule compresses KV cache in VLMs by 60% to deliver up to 2x higher tokens-per-second and 2.4x memory reduction with negligible accuracy loss.
-
OmniRefine: Alignment-Aware Cooperative Compression for Efficient Omnimodal Large Language Models
OmniRefine introduces alignment-aware chunk refinement via similarity and dynamic programming followed by modality-cooperative token compression, achieving near-baseline accuracy at 44% token retention on WorldSense.