Reproducible scaling laws for contrastive language-image learning

Reproducible scaling laws for contrastive language-image learning · 2022 · arXiv 2212.07143

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Rethinking Model Selection in VLM Through the Lens of Gromov-Wasserstein Distance

cs.CV · 2026-05-02 · unverdicted · novelty 7.0

Gromov-Wasserstein distance between modalities provides a stronger, inference-only predictor of final VLM performance than conventional encoder metrics, backed by theory linking it to cross-modal learnability and verified across 60+ training runs.

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs

cs.CV · 2023-03-02 · conditional · novelty 7.0

BiomedCLIP, pretrained on the new 15-million-pair PMC-15M dataset, achieves state-of-the-art performance on diverse biomedical vision-language tasks and even outperforms radiology-specific models on chest X-ray pneumonia detection.

Demystifying CLIP Data

cs.CV · 2023-09-28 · accept · novelty 6.0

MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.

citing papers explorer

Showing 3 of 3 citing papers.

Rethinking Model Selection in VLM Through the Lens of Gromov-Wasserstein Distance

cs.CV · 2026-05-02 · unverdicted · none · ref 8

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs

cs.CV · 2023-03-02 · conditional · none · ref 62

Demystifying CLIP Data

cs.CV · 2023-09-28 · accept · none · ref 35
