HAVEN provides a hierarchically aligned multimodal dataset and evaluation suite for video summarization, temporal reasoning, grounding, and saliency in MLLMs.
Mingfei Han, Linjie Yang, Xiaojun Chang, Lina Yao, and Heng Wang
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
HAVEN: Hierarchically Aligned Multimodal Benchmark for Unified Video Understanding