Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts

Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut · 2021

3 Pith papers cite this work.

representative citing papers

No One Knows the State of the Art in Geospatial Foundation Models

cs.CV · 2026-05-12 · accept · novelty 6.0

An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.

λ-Orthogonality Regularization for Compatible Representation Learning

cs.LG · 2025-09-20 · conditional · novelty 6.0

λ-Orthogonality regularization enables distribution-specific adaptation of representations via affine transformations while retaining original learned structures.

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

cs.CV · 2023-05-17 · conditional · novelty 6.0

PMC-VQA dataset and MedVInT model achieve better generative performance on medical VQA benchmarks by visual instruction tuning on a newly constructed large-scale dataset.

citing papers explorer

Showing 3 of 3 citing papers.

No One Knows the State of the Art in Geospatial Foundation Models · cs.CV · 2026-05-12 · accept · none · ref 12
λ-Orthogonality Regularization for Compatible Representation Learning · cs.LG · 2025-09-20 · conditional · none · ref 65
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering · cs.CV · 2023-05-17 · conditional · none · ref 9
