Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever · 2021

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Indexing Multimodal Language Models for Large-scale Image Retrieval

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

Multimodal LLMs act as training-free similarity estimators for instance-level image retrieval by converting next-token probabilities from image-pair prompts into scores, combined with efficient indexing for scalability.

TrajTok: Learning Trajectory Tokens enables better Video Understanding

cs.CV · 2026-02-26 · unverdicted · novelty 7.0

TrajTok learns adaptive trajectory tokens for videos through a unified end-to-end segmenter, improving understanding performance and efficiency over patch-based or external-pipeline tokenizers.

citing papers explorer

Indexing Multimodal Language Models for Large-scale Image Retrieval · cs.CV · 2026-04-14 · unverdicted · none · ref 42
TrajTok: Learning Trajectory Tokens enables better Video Understanding · cs.CV · 2026-02-26 · unverdicted · none · ref 55
