YFCC100M: The new data in multimedia research. Communications of the ACM, 59(2):64–73
3 Pith papers cite this work.
fields
cs.CV 3
representative citing papers
UpstreamQA disentangles video reasoning by using LRMs for explicit upstream object identification and scene context before downstream LMM VideoQA, improving performance and interpretability on OpenEQA and NExTQA in some cases.
A geometry-consistent pre-training paradigm using masked inlier reconstruction and a dual-stream encoder produces more robust and generalizable correspondence pruning for camera pose estimation and related 3D tasks.
citing papers explorer
- LAION-5B: An open large-scale dataset for training next generation image-text models
  LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.
- UpstreamQA: A Modular Framework for Explicit Reasoning on Video Question Answering Tasks
  UpstreamQA disentangles video reasoning by using LRMs for explicit upstream object identification and scene context before downstream LMM VideoQA, improving performance and interpretability on OpenEQA and NExTQA in some cases.
- Scalable and Generalizable Correspondence Pruning via Geometry-Consistent Pre-training
  A geometry-consistent pre-training paradigm using masked inlier reconstruction and a dual-stream encoder produces more robust and generalizable correspondence pruning for camera pose estimation and related 3D tasks.