Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. · 2021

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SAM 3: Segment Anything with Concepts

cs.CV · 2025-11-20 · unverdicted · novelty 7.0

SAM 3 introduces promptable concept segmentation that doubles accuracy of prior systems on images and videos while improving standard SAM segmentation performance.

Concepts in Motion: Temporal Concept Bottleneck Model for Interpretable Video Classification

cs.CV · 2025-09-25 · unverdicted · novelty 7.0

MoTIF adds temporal self-attention and automatic VLM-based concept discovery to concept bottleneck models for interpretable video classification, showing gains over prior global CBMs on benchmarks.

Generative Frontiers: Why Evaluation Matters for Diffusion Language Models

cs.LG · 2026-04-03 · conditional · novelty 6.0

Generative perplexity and entropy are shown to be the two additive components of KL divergence to a reference distribution, motivating generative frontiers as a principled evaluation method for diffusion language models.

SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP

cs.CV · 2025-09-30 · unverdicted · novelty 6.0

SeMoBridge projects images into the text modality via a semantic bridge to reduce CLIP's intra-modal misalignment and improve few-shot performance.

Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering

cs.CV · 2025-08-31 · unverdicted · novelty 6.0

PMSR progressively constructs structured reasoning trajectories with dual-scope queries and compositional reasoning to improve knowledge acquisition and answer accuracy in knowledge-intensive VQA.

citing papers explorer

SAM 3: Segment Anything with Concepts
cs.CV · 2025-11-20 · unverdicted · none · ref 110

Concepts in Motion: Temporal Concept Bottleneck Model for Interpretable Video Classification
cs.CV · 2025-09-25 · unverdicted · none · ref 31

Generative Frontiers: Why Evaluation Matters for Diffusion Language Models
cs.LG · 2026-04-03 · conditional · none · ref 8

SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP
cs.CV · 2025-09-30 · unverdicted · none · ref 16

Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering
cs.CV · 2025-08-31 · unverdicted · none · ref 35
