Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al · 2021

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

PixDLM: A Dual-Path Multimodal Language Model for UAV Reasoning Segmentation

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

The work introduces the UAV Reasoning Segmentation task, the DRSeg benchmark dataset, and PixDLM as a baseline dual-path multimodal language model for reasoning-based segmentation in aerial imagery.

REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.

LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

LangFlash introduces a feed-forward model for 3D language Gaussian splatting from sparse unposed images, claiming superior novel view synthesis and semantic consistency via enriched training data and sparse semantic encoding.

SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

SemLT3D introduces semantic-guided expert distillation with a language MoE module and CLIP projection to enrich features for long-tailed classes in camera-only 3D detection.

FREE-Switch: Frequency-based Dynamic LoRA Switch for Style Transfer

cs.CV · 2026-04-11 · unverdicted · novelty 5.0

FREE-Switch dynamically switches LoRA adapters using frequency importance per diffusion step and adds semantic alignment to reduce content drift when merging specialized image generators.

MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs

cs.CV · 2025-11-18

citing papers explorer

Showing 6 of 6 citing papers.

PixDLM: A Dual-Path Multimodal Language Model for UAV Reasoning Segmentation
cs.CV · 2026-04-17 · unverdicted · none · ref 33

REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement
cs.CV · 2026-04-30 · unverdicted · none · ref 36

LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images
cs.CV · 2026-05-22 · unverdicted · none · ref 34

SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection
cs.CV · 2026-04-20 · unverdicted · none · ref 42

FREE-Switch: Frequency-based Dynamic LoRA Switch for Style Transfer
cs.CV · 2026-04-11 · unverdicted · none · ref 22

MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
cs.CV · 2025-11-18 · unreviewed · ref 41
