AdvCLIP: Downstream-agnostic adversarial examples in multimodal contrastive learning
1 Pith paper cites this work. Polarity classification is still indexing.
Pith papers citing it: 1
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Reading Between the Pixels: Linking Text-Image Embedding Alignment to Typographic Attack Success on Vision-Language Models
Text-image embedding distance negatively correlates with typographic attack success rates (r = -0.71 to -0.93) on VLMs, with font size and image degradations strongly modulating effectiveness.