MI-Pruner: Crossmodal Mutual Information-guided Token Pruner for Efficient MLLMs

2026 · cs.CV · arXiv 2604.03072

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

For multimodal large language models (MLLMs), visual information is relatively sparse compared with text. As a result, visual token pruning has emerged as a route to efficient inference. Current approaches typically measure token importance from the attention scores in the visual encoder or in the LLM decoder, then keep the visual tokens with high attention scores and prune the rest. In this paper, we pursue a different and more surgical approach. Instead of relying on mechanism-specific signals, we directly compute Mutual Information (MI) between the visual and textual features themselves, prior to their interaction. This allows us to explicitly measure crossmodal dependency at the feature level. Our MI-Pruner is simple, efficient and non-intrusive, requiring no access to internal attention maps and no architectural modifications. Experimental results demonstrate that our approach outperforms previous attention-based pruning methods while adding minimal latency.
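The abstract does not say which MI estimator the authors use, so the following is only a minimal sketch of the general idea: score each visual token by an MI estimate against the text features, computed before the two modalities interact, and keep the highest-scoring tokens. Everything specific here is an assumption made for illustration: the function names `histogram_mi` and `prune_visual_tokens`, the mean pooling of text tokens, the equal-width histogram (plug-in) estimator, and the bin count. It should not be read as the paper's method.

```python
import math
from collections import Counter


def _bin_indices(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant vector: every value lands in the same bin
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def histogram_mi(x, y, n_bins=4):
    """Plug-in MI estimate (in nats) between two equal-length vectors.

    ASSUMPTION: the coordinate pairs (x[k], y[k]) are treated as samples
    of two scalar random variables. This is an illustrative estimator,
    not the one used in the paper.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    bx, by = _bin_indices(x, n_bins), _bin_indices(y, n_bins)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i, j) * log( p(i, j) / (p(i) * p(j)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def prune_visual_tokens(visual_tokens, text_tokens, keep, n_bins=4):
    """Return (kept_indices, scores) for the `keep` highest-MI visual tokens.

    ASSUMPTION: text tokens are mean-pooled into a single reference vector.
    Kept indices are returned in their original order.
    """
    dim = len(text_tokens[0])
    pooled = [sum(t[k] for t in text_tokens) / len(text_tokens) for k in range(dim)]
    scores = [histogram_mi(v, pooled, n_bins) for v in visual_tokens]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep]), scores


# Toy features (hypothetical values): 3 visual tokens, 1 text token, dim 8.
text = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]]
visual = [
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],  # tracks the text feature
    [1.0] * 8,                                  # constant, carries no information
    [7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0],  # inversely related to the text
]
kept, scores = prune_visual_tokens(visual, text, keep=2)
```

In this toy input the constant token scores zero and is pruned, while the inversely related token is kept: MI measures statistical dependence rather than similarity, which is one way it differs from a dot-product or attention-style score. Note also that the scoring needs only the two feature sets, not attention maps.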

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

EchoPrune: Interpreting Redundancy as Temporal Echoes for Efficient VideoLLMs

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

EchoPrune prunes video tokens via query relevance and temporal reconstruction error to let VideoLLMs handle up to 20x more frames under fixed budget with reported gains in accuracy and speed.

citing papers explorer

Showing 1 of 1 citing paper.

EchoPrune: Interpreting Redundancy as Temporal Echoes for Efficient VideoLLMs

cs.CV · 2026-05-11 · unverdicted · none · ref 20 · internal anchor

EchoPrune prunes video tokens via query relevance and temporal reconstruction error to let VideoLLMs handle up to 20x more frames under fixed budget with reported gains in accuracy and speed.