AdvFLYP finetunes CLIP on web image-text pairs using adversarial contrastive learning and regularization to boost zero-shot adversarial robustness across domains better than prior proxy-dataset methods.
Text-guided attention is all you need for zero-shot robustness in vision- language models.Advances in Neural Information Processing Systems, 37:96424–96448, 2024
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Finetune Like You Pretrain: Boosting Zero-shot Adversarial Robustness in Vision-language Models
AdvFLYP finetunes CLIP on web image-text pairs using adversarial contrastive learning and regularization to boost zero-shot adversarial robustness across domains better than prior proxy-dataset methods.