Improved baselines with visual instruction tuning
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
A skeleton-based zero-shot VAD method distills LLM knowledge for action typicality during training and performs test-time context uniqueness analysis to derive scene-adaptive normality boundaries, claiming SOTA results on four datasets with over 100 unseen scenes.
citing papers explorer
- Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models
  Q-Zoom achieves up to 4.39x inference speedup in high-resolution MLLM scenarios via query-aware gating and region localization, matching or exceeding baseline accuracy on document and high-res benchmarks.
- Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection
  A skeleton-based zero-shot VAD method distills LLM knowledge for action typicality during training and performs test-time context uniqueness analysis to derive scene-adaptive normality boundaries, claiming SOTA results on four datasets with over 100 unseen scenes.