STS is a two-stage pruning framework that decouples structural diversity via repulsion sampling from semantic filtering via cross-attention to reduce redundancy in visual tokens for VLMs.
Title resolution pending
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- When Attention Collapses: Stage-Aware Visual Token Pruning from Structure to Semantics