EvoPrune: Early-stage visual token pruning for efficient MLLMs

Yuhao Chen, Bin Shan, Xin Ye, Cheng Chen · 2026 · arXiv 2603.03681

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

cs.CV · 2026-06-10 · conditional · novelty 7.0

Reroute turns irreversible visual-token pruning into recoverable routing that reuses existing attention scores, improving grounding performance under aggressive reduction on LLaVA-1.5 and Qwen while preserving TFLOPs and KV-cache budgets.

Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs

cs.CV · 2026-05-10 · unverdicted · novelty 5.0

Fre-Res compresses video tokens by preserving spatial anchors and representing temporal dynamics with low-frequency residual tokens derived from 1D-DCT on inter-frame residuals, plus a Spatial-Guided Absorber to reinject the information.

VLMaxxing through FrameMogging Training-Free Anti-Recomputation for Video Vision-Language Models

cs.CV · 2026-05-05 · unverdicted · novelty 5.0

Training-free adaptive reuse of stable visual state in video VLMs reduces follow-up latency by 15-36x on Qwen2.5-VL while preserving correctness on VideoMME, with smaller first-query speedups via pruning.

citing papers explorer

Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

cs.CV · 2026-06-10 · conditional · none · ref 15

Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs

cs.CV · 2026-05-10 · unverdicted · none · ref 3

VLMaxxing through FrameMogging Training-Free Anti-Recomputation for Video Vision-Language Models

cs.CV · 2026-05-05 · unverdicted · none · ref 7
