Stop looking for important tokens in multimodal language models: Duplication matters more

Wen, Z · 2025 · arXiv 2502.11494

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

representative citing papers

LearnPruner: Rethinking Attention-based Token Pruning in Vision Language Models

cs.CV · 2026-04-27 · unverdicted · novelty 7.0

LearnPruner prunes vision tokens to 5.5% of the original count while retaining about 95% of VLM performance and delivering 3.2 times faster inference by fixing attention sink in encoders and using unbiased middle-layer attention in LLMs.

Focus-then-Context: Subject-Centric Progressive Visual Token Reduction for Vision-Language Models

cs.CV · 2026-05-20 · conditional · novelty 6.0

SPpruner reduces visual tokens in VLMs via focus identification followed by context-aware scanning, retaining 22.2% tokens for 2.53x speedup on Qwen2.5-VL with negligible accuracy loss.

Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

A unified learnable KV eviction policy with cross-layer calibration reduces memory and matches or exceeds full-cache performance on long-context tasks by retaining useful tokens and limiting attention dilution.

ST-Prune: Training-Free Spatio-Temporal Token Pruning for Vision-Language Models in Autonomous Driving

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

ST-Prune is a training-free spatio-temporal token pruning framework for VLMs in autonomous driving that achieves near-lossless results at 90% token reduction by exploiting motion volatility, temporal recency, and multi-view geometry.

HAWK: Head Importance-Aware Visual Token Pruning in Multimodal Models

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

HAWK is a training-free method that prunes over 80% of visual tokens in MLLMs while retaining 96% accuracy by using head importance weights and text-guided attention to select task-relevant tokens.

RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference

cs.CV · 2026-05-01 · unverdicted · novelty 5.0 · 3 refs

RTPrune introduces a reading-twice inspired two-stage pruning technique for DeepSeek-OCR that retains 84.25% tokens while delivering 99.47% accuracy and 1.23x faster prefill on OmniDocBench.

EvoComp: Learning Visual Token Compression for Multimodal Large Language Models via Semantic-Guided Evolutionary Labeling

cs.CV · 2026-04-18 · unverdicted · novelty 5.0

EvoComp compresses visual tokens in MLLMs by 3x while retaining 99.3% accuracy via an evolutionary labeling strategy that searches for low-loss, semantically diverse token subsets.

Token Reduction via Local and Global Contexts Optimization for Efficient Video Large Language Models

cs.CV · 2026-03-02 · unverdicted · novelty 5.0

AOT reduces visual tokens in VLLMs via intra-frame and inter-frame anchors with local-global optimal transport, delivering competitive benchmark performance and efficiency gains in a training-free way.

citing papers explorer

Showing 8 of 8 citing papers.

LearnPruner: Rethinking Attention-based Token Pruning in Vision Language Models cs.CV · 2026-04-27 · unverdicted · none · ref 18
LearnPruner prunes vision tokens to 5.5% of the original count while retaining about 95% of VLM performance and delivering 3.2 times faster inference by fixing attention sink in encoders and using unbiased middle-layer attention in LLMs.
Focus-then-Context: Subject-Centric Progressive Visual Token Reduction for Vision-Language Models cs.CV · 2026-05-20 · conditional · none · ref 20
SPpruner reduces visual tokens in VLMs via focus identification followed by context-aware scanning, retaining 22.2% tokens for 2.53x speedup on Qwen2.5-VL with negligible accuracy loss.
Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction cs.LG · 2026-05-10 · unverdicted · none · ref 29
A unified learnable KV eviction policy with cross-layer calibration reduces memory and matches or exceeds full-cache performance on long-context tasks by retaining useful tokens and limiting attention dilution.
ST-Prune: Training-Free Spatio-Temporal Token Pruning for Vision-Language Models in Autonomous Driving cs.CV · 2026-04-21 · unverdicted · none · ref 17
ST-Prune is a training-free spatio-temporal token pruning framework for VLMs in autonomous driving that achieves near-lossless results at 90% token reduction by exploiting motion volatility, temporal recency, and multi-view geometry.
HAWK: Head Importance-Aware Visual Token Pruning in Multimodal Models cs.CV · 2026-04-09 · unverdicted · none · ref 36
HAWK is a training-free method that prunes over 80% of visual tokens in MLLMs while retaining 96% accuracy by using head importance weights and text-guided attention to select task-relevant tokens.
RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference cs.CV · 2026-05-01 · unverdicted · none · ref 16 · 3 links
RTPrune introduces a reading-twice inspired two-stage pruning technique for DeepSeek-OCR that retains 84.25% tokens while delivering 99.47% accuracy and 1.23x faster prefill on OmniDocBench.
EvoComp: Learning Visual Token Compression for Multimodal Large Language Models via Semantic-Guided Evolutionary Labeling cs.CV · 2026-04-18 · unverdicted · none · ref 51
EvoComp compresses visual tokens in MLLMs by 3x while retaining 99.3% accuracy via an evolutionary labeling strategy that searches for low-loss, semantically diverse token subsets.
Token Reduction via Local and Global Contexts Optimization for Efficient Video Large Language Models cs.CV · 2026-03-02 · unverdicted · none · ref 60
AOT reduces visual tokens in VLLMs via intra-frame and inter-frame anchors with local-global optimal transport, delivering competitive benchmark performance and efficiency gains in a training-free way.

Stop looking for important tokens in multimodal language models: Duplication matters more

fields

years

verdicts

representative citing papers

citing papers explorer