LiteLVLM is a training-free, text-guided token pruning strategy that reverses CLIP similarity rankings to retain referent tokens and recover context, enabling efficient pixel grounding while keeping 90% of the original performance.
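The summary describes the pruning rule only at a high level. The following is a minimal sketch under explicit assumptions, not the authors' implementation. We assume that "reversing CLIP similarity rankings" means sorting visual tokens by ascending text-to-patch cosine similarity and keeping the lowest-scoring ones as referent candidates. We model "recovering context" as uniformly sampling extra tokens from the remainder in spatial order. The names `cosine` and `reversed_clip_prune` and the parameters `num_referent` and `num_context` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def reversed_clip_prune(
    patch_embs: Sequence[Sequence[float]],
    text_emb: Sequence[float],
    num_referent: int,
    num_context: int,
) -> List[int]:
    """Return sorted indices of the visual tokens to keep (hypothetical sketch).

    Assumption 1: referent tokens are those with the LOWEST CLIP text-patch
    similarity, i.e. the usual top-k ranking is reversed.
    Assumption 2: context is recovered by picking evenly spaced tokens
    from the ones not already kept, in their original spatial order.
    """
    n = len(patch_embs)
    sims = [cosine(p, text_emb) for p in patch_embs]

    # Reversed ranking: ascending similarity instead of descending.
    order = sorted(range(n), key=lambda i: sims[i])
    kept = set(order[:num_referent])

    # Context recovery: uniform stride over the remaining tokens.
    rest = [i for i in range(n) if i not in kept]
    m = min(num_context, len(rest))
    if m > 0:
        step = len(rest) / m  # step >= 1, so sampled positions are distinct
        kept.update(rest[int(k * step)] for k in range(m))

    return sorted(kept)


if __name__ == "__main__":
    # Toy 2-D "embeddings" for four visual tokens and one text query.
    patches = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.7, 0.7]]
    text = [1.0, 0.0]
    print(reversed_clip_prune(patches, text, num_referent=1, num_context=2))
```

In this toy run, token 2 has the lowest similarity and is kept as the referent. Tokens 0 and 1 are then added as evenly spaced context, so the script prints `[0, 1, 2]`. A real pipeline would score the CLIP patch embeddings of the vision encoder and choose the token budget to match the desired pruning ratio.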
As shown in Table 7, LiteLVLM maintains its performance, with only a 0.2% drop, while pruning 65.9% of the total visual tokens (192 tokens).
Cited by 1 Pith paper (field: cs.CV; year: 2026; verdict: CONDITIONAL).

Representative citing paper:
CLIP Tricks You: Training-free Token Pruning for Efficient Pixel Grounding in Large Vision-Language Models