Double visual defense: Adversarial pre-training and instruction tuning for improving vision-language model robustness. arXiv preprint arXiv:2501.09446, 2025.
Two papers indexed on Pith cite this work.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Beyond False Stability: High-Noise Drift Gating for Test-Time Adversarial Defenses in Vision-Language Models
  High-noise feature drift distinguishes adversarial from clean inputs in CLIP. This enables a plug-in gating mechanism that selectively triggers existing test-time defenses and raises mean clean and adversarial accuracy across 13 datasets.
- Investigating Adversarial Robustness of Multi-modal Large Language Models
  Robust vision encoders obtained through multimodal adversarial pretraining transfer to MLLMs and deliver large gains in adversarial captioning and VQA performance, while test-time stochastic transformations provide an effective black-box defense.