MLingualFC benchmark finds flowchart jailbreaks succeed at high rates for Latin-script languages but much lower rates for Punjabi in multilingual VLMs, pointing to language-dependent safety gaps.
Lingua-SafetyBench: A Benchmark for Safety Evaluation of Multilingual Vision-Language Models
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
The robust safety of Vision-Language Large Models (VLLMs) against joint multilingual and multimodal threats remains severely underexplored. Current benchmarks typically isolate these dimensions, being either multilingual but text-only, or multimodal but monolingual. While recent red-teaming efforts attempt to bridge this gap by rendering harmful prompts as images, their overreliance on typography-style visuals and lack of semantically grounded image-text pairs fail to capture realistic cross-modal interactions under multilingual and multimodal conditions. To address this, we introduce Lingua-SafetyBench, a comprehensive benchmark of 100,440 harmful image-text pairs spanning 10 languages. Crucially, Lingua-SafetyBench explicitly partitions data into image-dominant and text-dominant subsets to precisely disentangle sources of risk. Extensive evaluations reveal that current VLLMs retain non-negligible vulnerabilities under these joint inputs. Linguistically, requests in Non-High-Resource Languages (Non-HRLs) and non-Latin scripts generally pose greater threats. Furthermore, analyzing modality-language interactions uncovers a striking asymmetry: in High-Resource Languages (HRLs), models are most vulnerable to image-dominant risks, whereas in Non-HRLs, text-dominant risks severely degrade safety performance. Finally, a controlled study on the Qwen series demonstrates that while model scaling and iterative upgrades improve overall safety, they disproportionately benefit HRLs. This exacerbates the safety disparity between HRLs and Non-HRLs under text-dominant risks, highlighting that achieving robust safety requires dedicated language- and modality-aware alignment strategies beyond mere scaling. The code and dataset will be available at https://github.com/zsxr15/Lingua-SafetyBench.Warning: this paper contains examples with unsafe content.
citation-role summary
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
dataset 1polarities
use dataset 1representative citing papers
Precise Shield identifies safety neurons in VLLMs via activation contrasts and aligns only them with gradient masking, boosting safety, preserving generalization, and enabling zero-shot cross-lingual and cross-modal transfer.
citing papers explorer
-
MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models
MLingualFC benchmark finds flowchart jailbreaks succeed at high rates for Latin-script languages but much lower rates for Punjabi in multilingual VLMs, pointing to language-dependent safety gaps.
-
Precise Shield: Explaining and Aligning VLLM Safety via Neuron-Level Guidance
Precise Shield identifies safety neurons in VLLMs via activation contrasts and aligns only them with gradient masking, boosting safety, preserving generalization, and enabling zero-shot cross-lingual and cross-modal transfer.