In: ICCV (2023)
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Metonymy in vision models undermines attention-based interpretability
Pretrained vision transformers exhibit strong intra-object leakage, in which each part representation encodes information from the entire object, undermining the faithfulness of attention-based, part-centric interpretability methods.
- Dual-Foundation Models for Unsupervised Domain Adaptation
Using SAM for broader target pixel learning and DINOv3 for domain-invariant prototypes yields mIoU gains of +1.3% and +1.4% over baselines on GTA-to-Cityscapes and SYNTHIA-to-Cityscapes, respectively.