Emerging properties in self-supervised vision transformers
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 2
years
2026 2
representative citing papers
CLAY reframes pretrained VLM embedding spaces as text-conditional similarity spaces for adaptive, multi-conditioned image retrieval without additional training.
citing papers explorer
- Zero-Ablation Overstates Register Content Dependence in DINO Vision Transformers
Zero-ablation overstates register content dependence in DINO ViTs because mean, noise, and cross-image shuffle replacements preserve performance while zeroing does not.
- CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space
CLAY reframes pretrained VLM embedding spaces as text-conditional similarity spaces for adaptive, multi-conditioned image retrieval without additional training.