Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever · 2021

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models

cs.LG · 2025-07-01 · unverdicted · novelty 5.0

JAM aligns frozen vision and language models via joint autoencoders and multimodal Spread Loss, reliably inducing cross-modal alignment across layer depths, objectives, and model scales.

citing papers explorer

Showing 1 of 1 citing paper.

Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models

cs.LG · 2025-07-01 · unverdicted · none · ref 15

JAM aligns frozen vision and language models via joint autoencoders and multimodal Spread Loss, reliably inducing cross-modal alignment across layer depths, objectives, and model scales.
