Sink-Token-aware Pruning (SToP) suppresses semantically uninformative sink tokens during visual token pruning in Video LLMs, boosting fine-grained performance even at 90% pruning rates across hallucination, reasoning, and MCQA benchmarks.
arXiv preprint arXiv:2507.16018 (2025)
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 3polarities
background 3representative citing papers
The first survey on Attention Sink in Transformers structures the literature around fundamental utilization, mechanistic interpretation, and strategic mitigation.
Attention sinks in LVLM create a global-vs-local trade-off that a layer-wise gating module can balance to improve multimodal benchmark performance.
citing papers explorer
-
Sink-Token-Aware Pruning for Fine-Grained Video Understanding in Efficient Video LLMs
Sink-Token-aware Pruning (SToP) suppresses semantically uninformative sink tokens during visual token pruning in Video LLMs, boosting fine-grained performance even at 90% pruning rates across hallucination, reasoning, and MCQA benchmarks.
-
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
The first survey on Attention Sink in Transformers structures the literature around fundamental utilization, mechanistic interpretation, and strategic mitigation.
-
When Sinks Help or Hurt: Unified Framework for Attention Sink in Large Vision-Language Models
Attention sinks in LVLM create a global-vs-local trade-off that a layer-wise gating module can balance to improve multimodal benchmark performance.