RESOLVE provides a controlled multi-resolution LiDAR and camera benchmark for evaluating 3D detection and tracking under point sparsity variations in roadside cooperative perception.
3 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (3)
Years: 2026 (3)
Citing papers
- HAVEN: Hierarchically Aligned Multimodal Benchmark for Unified Video Understanding
  HAVEN provides a hierarchically aligned multimodal dataset and evaluation suite for video summarization, temporal reasoning, grounding, and saliency in MLLMs.
- LVSum: A Benchmark for Timestamp-Aware Long Video Summarization
  LVSum is a new benchmark for timestamp-aware long video summarization that exposes systematic temporal gaps in existing multimodal large language models.