
MVBench: A comprehensive multi-modal video understanding benchmark

4 Pith papers cite this work. Polarity classification is still being indexed.


fields: cs.CV (4)
years: 2026 (1), 2025 (3)
verdicts: UNVERDICTED (4)


representative citing papers

SpatialMosaic: A Multiview VLM Dataset for Partial Visibility

cs.CV · 2025-12-29 · unverdicted · novelty 7.0

SpatialMosaic introduces a 2M-pair multi-view QA dataset and 1M-pair benchmark for MLLMs on spatial reasoning under partial visibility, plus a hybrid baseline that integrates 3D reconstruction models as geometry encoders.

VISTA: Video Interaction Spatio-Temporal Analysis Benchmark

cs.CV · 2026-05-02 · unverdicted · novelty 6.0

VISTA is a new ~12K-pair benchmark and taxonomy for open-set multi-entity spatio-temporal understanding in VLMs that decomposes videos into entities, actions, and relational dynamics for multi-axis diagnostics.

citing papers explorer


  • VISTA: Video Interaction Spatio-Temporal Analysis Benchmark cs.CV · 2026-05-02 · unverdicted · none · ref 35