Laion-5b: An open large-scale dataset for training next generation image-text models, 2022

Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert · 2022

4 Pith papers cite this work.

representative citing papers

Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models

cs.CV · 2026-05-18 · unverdicted · novelty 8.0

CiF is a large new civil infrastructure segmentation dataset on which both zero-shot foundation models and domain-supervised models plateau at roughly 25% mAP, establishing infrastructure inspection as an open challenge for current visual AI.

ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models

cs.CL · 2024-02-18 · unverdicted · novelty 6.0

ALLaVA creates 1.3M GPT4V-synthesized samples that enable 4B VLMs to achieve competitive results on 17 benchmarks and to match 7B/13B models on some tasks.

Tool-MCoT: Tool Augmented Multimodal Chain-of-Thought for Content Safety Moderation

cs.CL · 2026-03-15 · unverdicted · novelty 5.0

A small language model fine-tuned on tool-augmented chain-of-thought data generated by a larger LLM learns to selectively call tools, delivering better content moderation accuracy at lower inference cost.

ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

cs.LG · 2026-05-19

citing papers explorer

Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models cs.CV · 2026-05-18 · unverdicted · none · ref 43
ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models cs.CL · 2024-02-18 · unverdicted · none · ref 122
Tool-MCoT: Tool Augmented Multimodal Chain-of-Thought for Content Safety Moderation cs.CL · 2026-03-15 · unverdicted · none · ref 20
ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison cs.LG · 2026-05-19 · unreviewed · ref 22