HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CV (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- Unifying Scientific Communication: Fine-Grained Correspondence Across Scientific Media
Creates the first benchmark dataset integrating papers, slides, videos, and presentations for evaluating AI models on fine-grained multimodal correspondences in science.