Localizing objects with self-supervised transformers and no labels

Oriane Siméoni, Gilles Puy, Huy V. Vo, Simon Roburin, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Renaud Marlet, Jean Ponce · 2021 · arXiv 2109.14279

9 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

OVS-DINO: Open-Vocabulary Segmentation via Structure-Aligned SAM-DINO with Language Guidance

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

OVS-DINO structurally aligns DINO with SAM to revitalize attenuated boundary features, achieving SOTA gains of 2.1% average and 6.3% on Cityscapes in weakly-supervised open-vocabulary segmentation.

FROST: Training-Free Few-Shot Segmentation with Frozen Features and Nonparametric Statistics

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

FROST performs training-free few-shot segmentation on remote-sensing imagery by nonparametric density-ratio classification on frozen DINOv3 features and reports 5.6 mIoU gains from one example across 17 benchmarks.

Registers Matter for Pixel-Space Diffusion Transformers

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

Register tokens enhance pixel-space DiT training and output quality via cleaner high-noise feature maps, and a dual-stream design adds further gains with little overhead.

Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

RefCD enables unsupervised category-aware object detection by using feature similarity between predicted objects and unlabeled reference images to guide category learning.

Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

TunnelMIND recalibrates language-guided defect proposals via dense visual consistency and reconstructs them into structured defect entities with attributes for severity grading and retrieval-grounded engineering reports, reporting F1 scores of 0.68, 0.78, and 0.72 on visible, GPR, and road defect tasks.

ViCrop-Det: Spatial Attention Entropy Guided Cropping for Training-Free Small-Object Detection

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

ViCrop-Det uses spatial attention entropy from the decoder to dynamically crop and refine small-object regions in transformer detectors during inference.

Vision Transformers Need More Than Registers

cs.CV · 2026-02-25 · unverdicted · novelty 6.0

ViTs exhibit lazy aggregation by relying on irrelevant background patches for global semantics, and selectively integrating patch features into the CLS token reduces this effect and improves results across label-, text-, and self-supervision.

Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning

cs.CV · 2025-07-18 · conditional · novelty 6.0

Franca introduces nested Matryoshka clustering and positional disentanglement in a transparent SSL pipeline to deliver open-source vision models competitive with closed proprietary systems.

PANC: Prior-Aware Normalized Cut via Anchor-Augmented Token Graphs

cs.CV · 2026-02-06 · unverdicted · novelty 5.0

PANC augments Normalized Cut with anchor-augmented token graphs using priors to steer spectral partitions, yielding mIoU gains of 2.3-8.7% over baselines on DUTS-TE, DUT-OMRON, and CrackForest.

citing papers explorer

Showing 9 of 9 citing papers.

OVS-DINO: Open-Vocabulary Segmentation via Structure-Aligned SAM-DINO with Language Guidance cs.CV · 2026-04-09 · unverdicted · none · ref 43
OVS-DINO structurally aligns DINO with SAM to revitalize attenuated boundary features, achieving SOTA gains of 2.1% average and 6.3% on Cityscapes in weakly-supervised open-vocabulary segmentation.
FROST: Training-Free Few-Shot Segmentation with Frozen Features and Nonparametric Statistics cs.CV · 2026-06-30 · unverdicted · none · ref 7
FROST performs training-free few-shot segmentation on remote-sensing imagery by nonparametric density-ratio classification on frozen DINOv3 features and reports 5.6 mIoU gains from one example across 17 benchmarks.
Registers Matter for Pixel-Space Diffusion Transformers cs.CV · 2026-05-15 · unverdicted · none · ref 8
Register tokens enhance pixel-space DiT training and output quality via cleaner high-noise feature maps, and a dual-stream design adds further gains with little overhead.
Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness cs.CV · 2026-05-06 · unverdicted · none · ref 8
RefCD enables unsupervised category-aware object detection by using feature similarity between predicted objects and unlabeled reference images to guide category learning.
Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction cs.CV · 2026-04-30 · unverdicted · none · ref 14
TunnelMIND recalibrates language-guided defect proposals via dense visual consistency and reconstructs them into structured defect entities with attributes for severity grading and retrieval-grounded engineering reports, reporting F1 scores of 0.68, 0.78, and 0.72 on visible, GPR, and road defect tasks.
ViCrop-Det: Spatial Attention Entropy Guided Cropping for Training-Free Small-Object Detection cs.CV · 2026-04-29 · unverdicted · none · ref 26
ViCrop-Det uses spatial attention entropy from the decoder to dynamically crop and refine small-object regions in transformer detectors during inference.
Vision Transformers Need More Than Registers cs.CV · 2026-02-25 · unverdicted · none · ref 31
ViTs exhibit lazy aggregation by relying on irrelevant background patches for global semantics, and selectively integrating patch features into the CLS token reduces this effect and improves results across label-, text-, and self-supervision.
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning cs.CV · 2025-07-18 · conditional · none · ref 76
Franca introduces nested Matryoshka clustering and positional disentanglement in a transparent SSL pipeline to deliver open-source vision models competitive with closed proprietary systems.
PANC: Prior-Aware Normalized Cut via Anchor-Augmented Token Graphs cs.CV · 2026-02-06 · unverdicted · none · ref 33
PANC augments Normalized Cut with anchor-augmented token graphs using priors to steer spectral partitions, yielding mIoU gains of 2.3-8.7% over baselines on DUTS-TE, DUT-OMRON, and CrackForest.
