VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Shicheng Li, Lei Li, Shuhuai Ren, Yuanxin Liu, Yi Liu, Rundong Gao, Xu Sun, Lu Hou · 2023 · arXiv 2311.17404

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

cs.CV · 2025-01-23 · unverdicted · novelty 7.0

Video-MMMU benchmark shows large multimodal models exhibit steep performance drops on higher cognitive tasks when learning from professional videos and lag significantly behind humans in knowledge acquisition.

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

cs.CV · 2024-12-18 · unverdicted · novelty 6.0

MLLMs achieve competitive but subhuman performance on the new VSI-Bench for visual-spatial intelligence from videos, with spatial reasoning as the main bottleneck and explicit cognitive map generation improving distance estimation.

TempCompass: Do Video LLMs Really Understand Videos?

cs.CV · 2024-03-01 · unverdicted · novelty 6.0

TempCompass benchmark reveals that state-of-the-art Video LLMs have poor ability to perceive temporal aspects such as speed, direction, and ordering in videos.

citing papers explorer

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

cs.CV · 2025-01-23 · unverdicted · none · ref 18

Video-MMMU benchmark shows large multimodal models exhibit steep performance drops on higher cognitive tasks when learning from professional videos and lag significantly behind humans in knowledge acquisition.

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

cs.CV · 2024-12-18 · unverdicted · none · ref 44

MLLMs achieve competitive but subhuman performance on the new VSI-Bench for visual-spatial intelligence from videos, with spatial reasoning as the main bottleneck and explicit cognitive map generation improving distance estimation.

TempCompass: Do Video LLMs Really Understand Videos?

cs.CV · 2024-03-01 · unverdicted · none · ref 98

TempCompass benchmark reveals that state-of-the-art Video LLMs have poor ability to perceive temporal aspects such as speed, direction, and ordering in videos.
