As shown in Table 7, LiteLVLM maintains its performance with only a 0.2% drop while pruning 65.9% of the total visual tokens (192 tokens).

to ground instances specified via referring expressions across video frames · 2017

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

CLIP Tricks You: Training-free Token Pruning for Efficient Pixel Grounding in Large VIsion-Language Models

cs.CV · 2026-05-13 · conditional · novelty 6.0

LiteLVLM is a training-free text-guided token pruning strategy that reverses CLIP similarity rankings to retain referent tokens and recover context for efficient pixel grounding while keeping 90% performance.
