Training-free adaptive reuse of stable visual state in video VLMs reduces follow-up latency by 15-36x on Qwen2.5-VL while preserving correctness on VideoMME, with smaller first-query speedups via pruning.
EvoPrune: Early-Stage Visual Token Prun- ing for Efficient MLLMs
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
VLMaxxing through FrameMogging Training-Free Anti-Recomputation for Video Vision-Language Models
Training-free adaptive reuse of stable visual state in video VLMs reduces follow-up latency by 15-36x on Qwen2.5-VL while preserving correctness on VideoMME, with smaller first-query speedups via pruning.