LVSpec introduces the first training-free loosely speculative decoding framework for Video-LLMs that identifies sparse visual-relevant tokens for strict verification while tolerating position shifts for semantic fillers, delivering 2.7-2.9x speedup with over 99.8% performance retention.
In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025
1 Pith paper cites this work.
Fields: cs.CL (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video LLMs