Conceptual 12m: Pushing web-scale image-text pre-training to recognize long-tail visual concepts
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
representative citing papers
citing papers explorer
- MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
  MVBench is a benchmark of 20 temporal video understanding tasks built by transforming static tasks into dynamic ones, with VideoChat2 outperforming prior MLLMs by over 15%.
- Sparse Autoencoders are Topic Models
  Sparse autoencoders are derived as MAP estimators for a continuous topic model, yielding a reusable topic modeling framework that produces coherent topics on text and image datasets.