Emerging properties in self-supervised vision transformers

Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin · 2021

8 Pith papers cite this work.

representative citing papers

VideoCoF: Unified Video Editing with Temporal Reasoner

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

VideoCoF adds an explicit reasoning step using edit-region latents in video diffusion models to enable precise mask-free editing and motion alignment with only 50k training pairs.

FastVGGT: Training-Free Acceleration of Visual Geometry Transformer

cs.CV · 2025-09-02 · conditional · novelty 7.0

FastVGGT achieves 4x speedup on VGGT for 1000-image inputs using training-free token merging tailored to 3D architectures while reducing error accumulation.

Seeking Consensus: Geometric-Semantic On-the-Fly Recalibration for Open-Vocabulary Remote Sensing Semantic Segmentation

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

SeeCo is a training-free on-the-fly recalibration method using multi-view geometric consistency and adaptive textual calibration to improve open-vocabulary semantic segmentation in remote sensing images.

VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation

cs.CV · 2026-04-15 · unverdicted · novelty 6.0 · 2 refs

VGGT-Segmentor achieves new SOTA cross-view segmentation on Ego-Exo4D (67.7% Ego-to-Exo, 68.0% Exo-to-Ego IoU) via geometry-enhanced features, a three-stage segmentation head, and correspondence-free pretraining.

Semantic Noise Reduction via Teacher-Guided Dual-Path Audio-Visual Representation Learning

cs.SD · 2026-04-09 · unverdicted · novelty 6.0

TG-DP decouples reconstruction and alignment objectives into separate paths with teacher guidance on visibility patterns, yielding SOTA zero-shot audio-video retrieval gains on AudioSet.

PEPR: Privileged Event-based Predictive Regularization for Domain Generalization

cs.CV · 2026-02-04 · unverdicted · novelty 6.0

PEPR reframes learning with privileged event data as predicting latent event features from RGB to improve domain generalization in object detection and segmentation without direct cross-modal alignment.

RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models

cs.CV · 2025-11-24 · unverdicted · novelty 6.0

RADSeg adapts the RADIO model with targeted enhancements to deliver 6-30% higher mIoU in zero-shot OVSS while using 2.5x fewer parameters and running 3.95x faster than prior large-model combinations.

CLIP-Guided Data Augmentation for Night-Time Image Dehazing

cs.CV · 2026-04-07 · unverdicted · novelty 5.0

CLIP-guided selection of external data plus staged NAFNet training and inference fusion provides an effective pipeline for nighttime image dehazing in the NTIRE 2026 challenge.

citing papers explorer

VideoCoF: Unified Video Editing with Temporal Reasoner cs.CV · 2025-12-08 · unverdicted · none · ref 3
FastVGGT: Training-Free Acceleration of Visual Geometry Transformer cs.CV · 2025-09-02 · conditional · none · ref 4
Seeking Consensus: Geometric-Semantic On-the-Fly Recalibration for Open-Vocabulary Remote Sensing Semantic Segmentation cs.CV · 2026-04-29 · unverdicted · none · ref 6
VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation cs.CV · 2026-04-15 · unverdicted · none · ref 7 · 2 links
Semantic Noise Reduction via Teacher-Guided Dual-Path Audio-Visual Representation Learning cs.SD · 2026-04-09 · unverdicted · none · ref 9
PEPR: Privileged Event-based Predictive Regularization for Domain Generalization cs.CV · 2026-02-04 · unverdicted · none · ref 7
RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models cs.CV · 2025-11-24 · unverdicted · none · ref 6
CLIP-Guided Data Augmentation for Night-Time Image Dehazing cs.CV · 2026-04-07 · unverdicted · none · ref 11