Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. · 2021

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Concept-wise Attention for Fine-grained Concept Bottleneck Models

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

CoAt-CBM improves fine-grained concept alignment in CBMs by using adaptive visual queries per concept and a contrastive loss that respects relative concept importance instead of independent BCE.

MoonSeg3R: Monocular Online Zero-Shot Segment Anything in 3D with Reconstructive Foundation Priors

cs.CV · 2025-12-17 · unverdicted · novelty 7.0

MoonSeg3R is the first method for online monocular 3D instance segmentation, achieving performance competitive with RGB-D systems by using CUT3R priors for geometric consistency and temporal query memory.

Unify Robot Actions in Camera Frame

cs.RO · 2025-11-21 · conditional · novelty 6.0

CalibAll estimates camera extrinsics on existing datasets to convert robot actions into a unified camera-frame representation, enabling stronger cross-embodiment pretraining.

citing papers explorer

Concept-wise Attention for Fine-grained Concept Bottleneck Models

cs.CV · 2026-04-17 · unverdicted · none · ref 24

CoAt-CBM improves fine-grained concept alignment in CBMs by using adaptive visual queries per concept and a contrastive loss that respects relative concept importance instead of independent BCE.

MoonSeg3R: Monocular Online Zero-Shot Segment Anything in 3D with Reconstructive Foundation Priors

cs.CV · 2025-12-17 · unverdicted · none · ref 34

MoonSeg3R is the first method for online monocular 3D instance segmentation, achieving performance competitive with RGB-D systems by using CUT3R priors for geometric consistency and temporal query memory.

Unify Robot Actions in Camera Frame

cs.RO · 2025-11-21 · conditional · none · ref 32

CalibAll estimates camera extrinsics on existing datasets to convert robot actions into a unified camera-frame representation, enabling stronger cross-embodiment pretraining.
