AdaSpark delivers up to 57% FLOP reduction in Video-LLMs for long videos through adaptive cube- and token-level sparsity without apparent loss in performance on hour-scale benchmarks.
Q-frame: Query-aware frame selection and multi-resolution adaptation for video-llms
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3years
2026 3verdicts
UNVERDICTED 3representative citing papers
KeyVT improves zero-shot 3D question answering by hierarchically selecting semantically and geometrically relevant views and using optimal transport to extract representative tokens from them.
PEEK distills caption-conditioned frame relevance into a lightweight visual model, outperforming adaptive baselines on ActivityNet Captions and MSR-VTT especially at 1-2 frame budgets while adding only 5.2% overhead.
citing papers explorer
-
AdaSpark: Adaptive Sparsity for Efficient Long-Video Understanding
AdaSpark delivers up to 57% FLOP reduction in Video-LLMs for long videos through adaptive cube- and token-level sparsity without apparent loss in performance on hour-scale benchmarks.
-
Zero-Shot 3D Question Answering via Hierarchical View-to-Token Transportation
KeyVT improves zero-shot 3D question answering by hierarchically selecting semantically and geometrically relevant views and using optimal transport to extract representative tokens from them.
-
PEEK: Picking Essential frames via Efficient Knowledge distillation
PEEK distills caption-conditioned frame relevance into a lightweight visual model, outperforming adaptive baselines on ActivityNet Captions and MSR-VTT especially at 1-2 frame budgets while adding only 5.2% overhead.