pith. sign in

arxiv: 2605.27927 · v1 · pith:F5NX7CB4new · submitted 2026-05-27 · 💻 cs.CV · cs.LG

Structure-Guided Visual Perturbation Neutralization for LVLMs

classification 💻 cs.CV cs.LG
keywords lvlmsdefensesimageneutralizationvisionvisualachievesadversarial
0
0 comments X
read the original abstract

Image inputs enable Large Vision Language Models (LVLMs) to perceive fine-grained visual information, but also introduce a pixel-level attack surface through which adversarial perturbations can elicit unsafe model behaviors. However, most existing defenses are designed for traditional computer vision settings and thus often overlook the cross-modal alignment required by LVLMs, leading to degraded performance. Meanwhile, the limited defenses tailored to LVLMs often require substantial image modifications and introduce considerable computational overhead, thereby compromising inference quality and efficiency. To address these limitations, we propose Structure-Induced Guided Neutralization (SIGN), a lightweight, plug-and-play defense framework that improves LVLM compatibility via Prior Structural Extraction and achieves efficient perturbation suppression via Dynamic Guided Neutralization. Extensive experiments show that SIGN achieves over 87\% defense success rate with only 0.5\% pixel modification and 0.16 seconds per image, while nearly preserving original visual representations and benign task performance. Our work offers a lightweight alternative to defenses that require costly model training and highlights the potential of exploiting a vision encoder for efficient adversarial protection. Our code is open source on https://anonymous.4open.science/r/SIGN-BCB1.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.