Defense-Prefix for Preventing Typographic Attacks on CLIP
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations
Multimodal embedding distance predicts typographic attack success rate across VLMs, and optimizing it under bounded perturbations on surrogates exposes two co-occurring failure modes of lost readability and reduced safety refusals.