Lingua-SafetyBench: A Benchmark for Safety Evaluation of Multilingual Vision-Language Models

· 2026 · cs.CV · arXiv 2601.22737

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

The robust safety of Vision-Language Large Models (VLLMs) against joint multilingual and multimodal threats remains severely underexplored. Current benchmarks typically isolate these dimensions, being either multilingual but text-only, or multimodal but monolingual. While recent red-teaming efforts attempt to bridge this gap by rendering harmful prompts as images, their overreliance on typography-style visuals and lack of semantically grounded image-text pairs fail to capture realistic cross-modal interactions under multilingual and multimodal conditions. To address this, we introduce Lingua-SafetyBench, a comprehensive benchmark of 100,440 harmful image-text pairs spanning 10 languages. Crucially, Lingua-SafetyBench explicitly partitions data into image-dominant and text-dominant subsets to precisely disentangle sources of risk. Extensive evaluations reveal that current VLLMs retain non-negligible vulnerabilities under these joint inputs. Linguistically, requests in Non-High-Resource Languages (Non-HRLs) and non-Latin scripts generally pose greater threats. Furthermore, analyzing modality-language interactions uncovers a striking asymmetry: in High-Resource Languages (HRLs), models are most vulnerable to image-dominant risks, whereas in Non-HRLs, text-dominant risks severely degrade safety performance. Finally, a controlled study on the Qwen series demonstrates that while model scaling and iterative upgrades improve overall safety, they disproportionately benefit HRLs. This exacerbates the safety disparity between HRLs and Non-HRLs under text-dominant risks, highlighting that achieving robust safety requires dedicated language- and modality-aware alignment strategies beyond mere scaling. The code and dataset will be available at https://github.com/zsxr15/Lingua-SafetyBench.Warning: this paper contains examples with unsafe content.

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models

cs.CR · 2026-06-05 · unverdicted · novelty 6.0

MLingualFC benchmark finds flowchart jailbreaks succeed at high rates for Latin-script languages but much lower rates for Punjabi in multilingual VLMs, pointing to language-dependent safety gaps.

Precise Shield: Explaining and Aligning VLLM Safety via Neuron-Level Guidance

cs.CV · 2026-04-10 · unverdicted · novelty 6.0

Precise Shield identifies safety neurons in VLLMs via activation contrasts and aligns only them with gradient masking, boosting safety, preserving generalization, and enabling zero-shot cross-lingual and cross-modal transfer.

citing papers explorer

Showing 2 of 2 citing papers after filters.

MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models cs.CR · 2026-06-05 · unverdicted · none · ref 19 · internal anchor
MLingualFC benchmark finds flowchart jailbreaks succeed at high rates for Latin-script languages but much lower rates for Punjabi in multilingual VLMs, pointing to language-dependent safety gaps.
Precise Shield: Explaining and Aligning VLLM Safety via Neuron-Level Guidance cs.CV · 2026-04-10 · unverdicted · none · ref 30 · internal anchor
Precise Shield identifies safety neurons in VLLMs via activation contrasts and aligns only them with gradient masking, boosting safety, preserving generalization, and enabling zero-shot cross-lingual and cross-modal transfer.

Lingua-SafetyBench: A Benchmark for Safety Evaluation of Multilingual Vision-Language Models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer