LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.
Redcaps: Web-curated image-text data created by the people, for the people
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3representative citing papers
InternVid supplies 7M videos and LLM captions to train ViCLIP, which reaches leading zero-shot action recognition and competitive retrieval performance.
The NTIRE 2026 challenge provides a dataset of over 294,000 real and AI-generated images with 36 transformations to benchmark robust detection models.
citing papers explorer
-
LAION-5B: An open large-scale dataset for training next generation image-text models
LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.
-
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
InternVid supplies 7M videos and LLM captions to train ViCLIP, which reaches leading zero-shot action recognition and competitive retrieval performance.
-
NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild
The NTIRE 2026 challenge provides a dataset of over 294,000 real and AI-generated images with 36 transformations to benchmark robust detection models.