Mvbench: A comprehensive multi-modal video understand- ing benchmark

Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, et al · 2024

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

Introduces CaST-Bench, a dataset of 2,066 causal questions on 1,015 videos with annotated causal chains and metrics to evaluate VLMs on spatio-temporal causal reasoning.

MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models

cs.CV · 2025-11-23 · unverdicted · novelty 5.0

MASS adds spatiotemporal motion signals and 3D grounding to VLMs and releases MASS-Bench, yielding physics-reasoning performance within 2% of Gemini-2.5-Flash after reinforcement fine-tuning.

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

cs.CV · 2026-05-19 · 2 refs

citing papers explorer

Showing 3 of 3 citing papers.

CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering cs.CV · 2026-05-22 · unverdicted · none · ref 22
Introduces CaST-Bench, a dataset of 2,066 causal questions on 1,015 videos with annotated causal chains and metrics to evaluate VLMs on spatio-temporal causal reasoning.
MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models cs.CV · 2025-11-23 · unverdicted · none · ref 28
MASS adds spatiotemporal motion signals and 3D grounding to VLMs and releases MASS-Bench, yielding physics-reasoning performance within 2% of Gemini-2.5-Flash after reinforcement fine-tuning.
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding cs.CV · 2026-05-19 · unreviewed · ref 11 · 2 links

Mvbench: A comprehensive multi-modal video understand- ing benchmark

fields

years

verdicts

representative citing papers

citing papers explorer