As shown in Table 7, our method consistently maintains competitive performance across multiple video understanding benchmarks while retaining only 50% of the visual token budget.

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

cs.CV · 2026-04-13 · unverdicted · novelty 6.0

DeSAP uses decoupled cross-modal similarity plus visual saliency to prune visual tokens in LVLMs, retaining 11.1% tokens for 10x FLOPs reduction and 98.1% performance on LLaVA-1.5-7B.

citing papers explorer


Decoupled Similarity for Task-Aware Token Pruning in Large Vision-Language Models cs.CV · 2026-04-13 · unverdicted · none · ref 51

