Creates the first benchmark dataset integrating papers, slides, videos, and presentations for evaluating AI models on fine-grained multimodal correspondences in science.
Mavils, a benchmark dataset for video-to-slide alignment, assessing baseline accuracy with a multimodal alignment algorithm leveraging speech, ocr, and visual features
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1roles
background 1polarities
background 1representative citing papers
citing papers explorer
-
Unifying Scientific Communication: Fine-Grained Correspondence Across Scientific Media
Creates the first benchmark dataset integrating papers, slides, videos, and presentations for evaluating AI models on fine-grained multimodal correspondences in science.