Self-supervised learning from images with a joint-embedding predictive architecture

Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas · 2023

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

OmniGCD: Abstracting Generalized Category Discovery for Modality Agnosticism

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

OmniGCD trains a Transformer once on synthetic data to enable zero-shot generalized category discovery across 16 datasets in four modalities without any dataset-specific fine-tuning.

POMA-3D: The Point Map Way to 3D Scene Understanding

cs.CV · 2025-11-20 · unverdicted · novelty 7.0

POMA-3D learns self-supervised 3D scene representations from point maps and improves performance on geometric 3D tasks including navigation and scene retrieval.

UNIV: Unified Foundation Model for Infrared and Visible Modalities

cs.CV · 2025-09-19 · unverdicted · novelty 6.0

UNIV introduces Patch Cross-modal Contrastive Learning (PCCL) to build a unified semantic feature space for infrared and visible modalities, supported by the new MVIP dataset of 98,992 aligned pairs, with reported gains on infrared segmentation and detection tasks.

citing papers explorer

Showing 3 of 3 citing papers.

OmniGCD: Abstracting Generalized Category Discovery for Modality Agnosticism · cs.CV · 2026-04-16 · unverdicted · none · ref 3
POMA-3D: The Point Map Way to 3D Scene Understanding · cs.CV · 2025-11-20 · unverdicted · none · ref 2
UNIV: Unified Foundation Model for Infrared and Visible Modalities · cs.CV · 2025-09-19 · unverdicted · none · ref 2
