EntropyScan detects backdoored LVLMs by quantifying structural anomalies in visual attention distributions on benign samples via Tsallis entropy and reference-anchored Z-score normalization.
Test-time backdoor attacks on multimodal large language models
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 5roles
background 2polarities
background 2representative citing papers
ProjLens shows that backdoor parameters in MLLMs are encoded in low-rank subspaces of the projector and that embeddings shift toward the target direction with magnitude linear in input norm, activating only on poisoned samples.
Phantasia is a new backdoor attack on VLMs that dynamically aligns malicious outputs with input context to achieve higher stealth and state-of-the-art success rates compared to static-pattern attacks.
GLA backdoor attack on DriveVLM uses naturalistic graffiti and cross-lingual triggers to reach 90% ASR at 10% poisoning ratio while improving some clean-task metrics like BLEU-1.
A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.
citing papers explorer
-
EntropyScan: Towards Model-level Backdoor Detection in LVLMs via Visual Attention Entropy
EntropyScan detects backdoored LVLMs by quantifying structural anomalies in visual attention distributions on benign samples via Tsallis entropy and reference-anchored Z-score normalization.
-
ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety
ProjLens shows that backdoor parameters in MLLMs are encoded in low-rank subspaces of the projector and that embeddings shift toward the target direction with magnitude linear in input norm, activating only on poisoned samples.
-
Phantasia: Context-Adaptive Backdoors in Vision Language Models
Phantasia is a new backdoor attack on VLMs that dynamically aligns malicious outputs with input context to achieve higher stealth and state-of-the-art success rates compared to static-pattern attacks.
-
Multimodal Backdoor Attack on VLMs for Autonomous Driving via Graffiti and Cross-Lingual Triggers
GLA backdoor attack on DriveVLM uses naturalistic graffiti and cross-lingual triggers to reach 90% ASR at 10% poisoning ratio while improving some clean-task metrics like BLEU-1.
-
Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.