Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts

Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut · 2021

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

cs.CV · 2023-11-28 · accept · novelty 6.0

MVBench is a benchmark of 20 temporal video understanding tasks built by transforming static tasks into dynamic ones, with VideoChat2 outperforming prior MLLMs by over 15%.

Sparse Autoencoders are Topic Models

cs.CV · 2025-11-20 · unverdicted · novelty 5.0

Sparse autoencoders are derived as MAP estimators for a continuous topic model, yielding a reusable topic modeling framework that produces coherent topics on text and image datasets.

citing papers explorer

Showing 2 of 2 citing papers.

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
cs.CV · 2023-11-28 · accept · none · ref 6
Sparse Autoencoders are Topic Models
cs.CV · 2025-11-20 · unverdicted · none · ref 18
