VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering
  Introduces CaST-Bench, a dataset of 2,066 causal questions on 1,015 videos with annotated causal chains and metrics to evaluate VLMs on spatio-temporal causal reasoning.
- Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning
  VideoDR is a new benchmark for open-web video deep research that tests multimodal models on cross-frame visual anchor extraction, interactive retrieval, and multi-hop reasoning over joint video-web evidence.
- Chain-of-Glimpse: Search-Guided Progressive Object-Grounded Reasoning for Video Understanding
  Chain-of-Glimpse is a reinforcement-learning-based framework that iteratively grounds visual evidence regions to enable multi-step object-aware reasoning in videos.