Introduces VG-GUIBench benchmark and TASKER keyframe extraction algorithm that improves performance on VideoQA and video-guided agentic tasks.
Agentic keyframe search for video question answering
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
HiCrew improves long-form video question answering on EgoSchema and NExT-QA via a hybrid tree for temporal topology, question-aware captioning, and adaptive multi-agent planning, with gains in temporal and causal reasoning.
A survey of test-time scaling for multimodal foundation models that introduces a three-way taxonomy of sampling, feedback, and search approaches along with applications and benchmarks.
citing papers explorer
-
Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction
Introduces VG-GUIBench benchmark and TASKER keyframe extraction algorithm that improves performance on VideoQA and video-guided agentic tasks.
-
Test-Time Scaling in Multimodal Foundation Models: A Comprehensive Survey of Generation and Reasoning
A survey of test-time scaling for multimodal foundation models that introduces a three-way taxonomy of sampling, feedback, and search approaches along with applications and benchmarks.