What kind of visual tokens do we need? Training-free visual token pruning for multi-modal large language models from the perspective of graph

Yutao Jiang, Qiong Wu, Wenhao Lin, Wei Yu, Yiyi Zhou · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

ForestPrune: High-ratio Visual Token Compression for Video Multimodal Large Language Models via Spatial-Temporal Forest Modeling

cs.CV · 2026-03-24 · unverdicted · novelty 6.0 · ref 20

ForestPrune prunes 90% of visual tokens in video MLLMs like LLaVA-OneVision while retaining 95.8% accuracy by modeling tokens as spatial-temporal forests and scoring importance via tree depth and node roles.

