HAVEN provides a hierarchically aligned multimodal dataset and evaluation suite for video summarization, temporal reasoning, grounding, and saliency in MLLMs.
Evaluating and improving factuality in multimodal abstractive summarization
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
HAVEN: Hierarchically Aligned Multimodal Benchmark for Unified Video Understanding