MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models

Deeksha Varshney; Mamta; Oana Cocarascu; Rishabh Makwana

arxiv: 2606.07706 · v1 · pith:V57YKBK7new · submitted 2026-06-05 · 💻 cs.CR · cs.AI

Pith reviewed 2026-06-27 21:45 UTC · model grok-4.3

keywords: multilingual vision-language models · jailbreak attacks · flowchart prompts · safety alignment · multilingual benchmark · visual encoding · Latin script languages · Punjabi

The pith

Flowchart images encoding harmful instructions jailbreak vision-language models more effectively in Latin-script languages than in Punjabi.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MLingualFC, a benchmark that encodes harmful instructions into flowchart images in five languages: Hindi, Punjabi, Spanish, Romanian, and German. It evaluates state-of-the-art multilingual VLMs under black-box conditions and finds high attack success rates for Latin-script languages, showing that visual presentation bypasses existing safety alignment. For non-Latin scripts such as Punjabi the success rate drops sharply, which the authors attribute to recognition difficulties rather than better defenses. A sympathetic reader would care because the results indicate that current safety mechanisms do not transfer reliably across languages or input modalities.

Core claim

Flowchart-based attacks achieve high attack success rates in case of Latin script languages, demonstrating that visual encoding of harmful content effectively bypasses safety alignment across languages. In contrast, non-Latin script languages such as Punjabi exhibit substantially lower ASR, suggesting potential limitations in visual text recognition rather than stronger safety alignment. These findings highlight that current VLM safety mechanisms fail to generalize across languages and modalities.

What carries the argument

The MLingualFC benchmark, which converts harmful instructions into multilingual flowchart images to test black-box jailbreak success in VLMs.

If this is right

Visual encoding of harmful content bypasses safety alignment in Latin-script languages for models including Qwen2.5-VL, Gemma-4, and Pangea.
Safety mechanisms in current VLMs do not generalize across languages and visual modalities.
Non-Latin script languages show lower vulnerability primarily because of recognition limits rather than stronger alignment.
Multilingual safety gaps exist under structured visual prompt attacks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Improving visual text recognition for non-Latin scripts would likely increase attack success rates unless safety training is strengthened for visual inputs.
Safety training should incorporate flowchart-style visual representations of text in every supported language to close the observed gaps.
Extending the benchmark to additional scripts and languages would test whether the Latin versus non-Latin pattern generalizes.

Load-bearing premise

The lower attack success rate for Punjabi is caused by limitations in visual text recognition rather than by differences in safety alignment or other factors.

What would settle it

Direct measurement of each model's accuracy at reading and transcribing the Punjabi text inside the flowchart images, followed by checking whether lower recognition accuracy predicts the observed lower attack success rates.
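The check described above could be sketched roughly as follows. This is a minimal illustration and not a procedure from the paper: the helper names `edit_distance`, `char_error_rate`, and `pearson_r` are hypothetical, and the choice of character error rate as the recognition metric is an assumption. One would compute a recognition error per language-model pair and then correlate it with the observed ASR for the same pairs.

```python
from math import sqrt
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, transcription: str) -> float:
    """Edit distance divided by reference length.

    Caveat (assumption): Python iterates over code points, not grapheme
    clusters, so for scripts such as Gurmukhi a cluster-aware tokenizer
    may give a fairer error rate.
    """
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, transcription) / len(reference)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)
```

A strongly negative correlation between recognition error and ASR would be consistent with the recognition hypothesis, although with only five languages it could not rule out the alternatives on its own.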

Figures

Figures reproduced from arXiv: 2606.07706 by Deeksha Varshney, Mamta, Oana Cocarascu, Rishabh Makwana.

Original abstract

Vision-Language Models (VLMs) have demonstrated strong performance across multimodal tasks, yet their safety robustness remains an open challenge. While prior work has shown that structured visual prompts such as flowcharts can effectively jailbreak VLMs, existing studies are largely limited to English-centric settings. In this paper, we introduce MLingualFC, a multilingual multimodal benchmark designed to evaluate jailbreak vulnerabilities of VLMs across diverse languages using structured flowchart representations. MLingualFC encodes harmful instructions into flowchart images across five languages (Hindi, Punjabi, Spanish, Romanian, and German). We evaluate state-of-the-art multilingual VLMs, including Qwen2.5-VL, Gemma-4, and Pangea, under a black-box threat model. Our results reveal significant multilingual safety gaps. Flowchart-based attacks achieve high attack success rates (ASR) in case of Latin script languages, demonstrating that visual encoding of harmful content effectively bypasses safety alignment across languages. In contrast, non-Latin script languages such as Punjabi exhibit substantially lower ASR, suggesting potential limitations in visual text recognition rather than stronger safety alignment. These findings highlight that current VLM safety mechanisms fail to generalize across languages and modalities. Resources are available at https://github.com/Rishabhpm23/MLingualFC

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

MLingualFC gives a clean multilingual extension of flowchart jailbreaks with a script-based ASR split, but the Punjabi interpretation lacks the direct tests needed to pin it on recognition limits.

The paper introduces MLingualFC, a benchmark that puts flowchart-encoded harmful instructions into five languages and measures ASR on models like Qwen2.5-VL, Gemma-4, and Pangea. The headline result is the split: Latin-script languages show high success rates while Punjabi shows substantially lower ones.

What works is the straightforward extension of the English flowchart method to these languages and the explicit comparison by script. That difference is new data within the existing jailbreak evaluation line, and the black-box setup is easy to understand.

The soft spot is the causal reading of the Punjabi drop. The abstract treats it as evidence of visual text recognition limits rather than stronger safety, yet nothing in the visible material tests that directly: there are no OCR accuracy numbers across languages, no side-by-side text versus flowchart runs, and no safety ablation. The numbers could equally reflect unmeasured language-specific alignment differences or other factors. The stress-test note is right on this point.

This is for people who track VLM safety robustness in non-English settings. Anyone building or auditing multilingual models would want to see the benchmark and the raw ASR gap. The work is coherent enough on its own terms to merit referee time, even if the interpretation section needs tightening and the full methods need checking.

Referee Report

1 major / 3 minor

Summary. The paper introduces MLingualFC, a multilingual multimodal benchmark that encodes harmful instructions into flowchart images across five languages (Hindi, Punjabi, Spanish, Romanian, German) to evaluate jailbreak vulnerabilities in VLMs including Qwen2.5-VL, Gemma-4, and Pangea under a black-box threat model. It reports high attack success rates (ASR) for Latin-script languages via visual flowchart prompts, while non-Latin scripts such as Punjabi exhibit substantially lower ASR, which the authors suggest may indicate visual text recognition limitations rather than stronger safety alignment. The work concludes that current VLM safety mechanisms fail to generalize across languages and modalities, with resources released at a GitHub repository.

Significance. If the empirical measurements hold, the paper is significant for extending jailbreak evaluation to a multilingual multimodal setting and releasing an open benchmark that can support future work on cross-lingual VLM safety. The public code and dataset constitute a clear reproducibility strength.

major comments (1)

[Abstract and results discussion] Abstract and discussion of results: the interpretation that substantially lower ASR for Punjabi flowcharts reflects visual text recognition failure (rather than language-specific safety differences or other factors) is presented as a suggestion but rests on an untested causal claim; no OCR/transcription accuracy measurements, no direct comparison of the same content supplied as readable text versus flowchart, and no safety-component ablations are reported to distinguish these explanations.
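One way the text-versus-flowchart comparison requested above might be analysed is a paired test on per-instruction outcomes. The sketch below uses an exact McNemar test; the choice of test, the function name `mcnemar_exact`, and the input layout are assumptions made for illustration, not anything reported in the paper.

```python
from math import comb
from typing import Iterable, Tuple


def mcnemar_exact(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[int, int, float]:
    """Exact two-sided McNemar test on paired binary outcomes.

    Each pair is (text_success, flowchart_success) for the SAME instruction
    and model. Only discordant pairs carry information:
      b = attack succeeded as plain text but failed as a flowchart
      c = attack failed as plain text but succeeded as a flowchart
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    Returns (b, c, p_value).
    """
    b = c = 0
    for text_ok, chart_ok in pairs:
        if text_ok and not chart_ok:
            b += 1
        elif chart_ok and not text_ok:
            c += 1
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, nothing to test
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

If Punjabi instructions succeed far more often as plain text than as flowcharts (a large `b`, a small `c`, and a small p-value), that would point toward recognition limits; similar success in both modes would point toward alignment differences instead.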

minor comments (3)

[Methods] Methods section: the exact number of flowchart instances per language, the source of the harmful instructions, and the precise definition and computation of ASR (including any refusal detection criteria) should be stated explicitly.
[Results] Results: report numerical ASR values (with error bars or confidence intervals if multiple runs) for each language-model pair rather than qualitative descriptors such as 'high' and 'substantially lower'.
[Evaluation] Evaluation setup: specify the exact model versions, temperature settings, and prompt templates used for all three VLMs.
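To make the minor comments on ASR reporting concrete, here is a minimal sketch of how a per-pair ASR and a confidence interval could be computed. It assumes ASR is the fraction of prompts whose response is judged a successful jailbreak; the paper's exact definition and refusal criteria are not stated in the material reviewed here. The function names and the choice of the Wilson score interval are illustrative assumptions.

```python
from math import sqrt
from typing import Sequence, Tuple


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of prompts judged successful (True = jailbreak succeeded)."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(outcomes) / len(outcomes)


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval. The Wilson
    interval behaves better than the normal approximation when the
    rate is near 0 or 1 or when the sample is small.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

Reporting the interval next to each language-model ASR would show whether a gap such as Punjabi versus Spanish exceeds sampling noise for the benchmark's sample size.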

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below and agree that the interpretation requires clearer framing as a hypothesis.

Referee: [Abstract and results discussion] Abstract and discussion of results: the interpretation that substantially lower ASR for Punjabi flowcharts reflects visual text recognition failure (rather than language-specific safety differences or other factors) is presented as a suggestion but rests on an untested causal claim; no OCR/transcription accuracy measurements, no direct comparison of the same content supplied as readable text versus flowchart, and no safety-component ablations are reported to distinguish these explanations.

Authors: We agree that the suggestion of visual text recognition limitations for non-Latin scripts such as Punjabi is an untested causal interpretation, as no OCR accuracy measurements, text-vs-flowchart comparisons, or safety ablations are reported. In revision we will update the abstract and results discussion to present this explicitly as a hypothesis inferred from the differential ASR patterns across scripts, rather than a confirmed explanation. We will add a sentence noting that distinguishing recognition failures from language-specific safety differences would require the measurements identified by the referee and is left for future work. The reported ASR numbers and experimental setup remain unchanged.

revision: yes

Circularity Check

0 steps flagged

No circularity: pure empirical benchmark with direct ASR measurements

The paper introduces MLingualFC as a benchmark, encodes harmful instructions into flowchart images across languages, and reports attack success rates from black-box evaluations on VLMs. No equations, parameter fitting, derivations, or predictions appear in the provided text. The suggestion that lower Punjabi ASR reflects visual recognition limits is an interpretive remark, not a claim derived from self-referential steps or fitted inputs. All results are direct counts from experiments, so no result reduces to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical benchmark paper with no mathematical derivations; no free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.1-grok · 5767 in / 1101 out tokens · 28258 ms · 2026-06-27T21:45:39.925973+00:00 · methodology

Reference graph

Works this paper leans on

108 extracted references · 10 canonical work pages · 4 internal anchors

[1]

F act E val: Evaluating the Robustness of Fact Verification Systems in the Era of Large Language Models

Mamta and Cocarascu, Oana. F act E val: Evaluating the Robustness of Fact Verification Systems in the Era of Large Language Models. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.naacl-long.534

work page doi:10.18653/v1/2025.naacl-long.534 2025
[2]

Qwen2.5-VL Technical Report

Shuai Bai and Keqin Chen and Xuejing Liu and Jialin Wang and Wenbin Ge and Sibo Song and Kai Dang and Peng Wang and Shijie Wang and Jun Tang and Humen Zhong and Yuanzhi Zhu and Ming. Qwen2.5-VL Technical Report , journal =. 2025 , url =. doi:10.48550/ARXIV.2502.13923 , eprinttype =. 2502.13923 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.13923 2025
[3]

From LLM s to MLLM s: Exploring the Landscape of Multimodal Jailbreaking

Wang, Siyuan and Long, Zhuohan and Fan, Zhihao and Wei, Zhongyu. From LLM s to MLLM s: Exploring the Landscape of Multimodal Jailbreaking. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.973

work page doi:10.18653/v1/2024.emnlp-main.973 2024
[4]

The Thirteenth International Conference on Learning Representations , year=

Pangea: A fully open multilingual multimodal llm for 39 languages , author=. The Thirteenth International Conference on Learning Representations , year=
[5]

European Conference on Computer Vision , pages=

Images are achilles’ heel of alignment: Exploiting visual vulnerabilities for jailbreaking multimodal large language models , author=. European Conference on Computer Vision , pages=. 2024 , organization=

2024
[6]

European Conference on Computer Vision , pages=

Mm-safetybench: A benchmark for safety evaluation of multimodal large language models , author=. European Conference on Computer Vision , pages=. 2024 , organization=

2024
[7]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Mmj-bench: A comprehensive study on jailbreak attacks and defenses for vision language models , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[8]

Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models , author=
[9]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Jailbreak large vision-language models through multi-modal linkage , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[10]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Figstep: Jailbreaking large vision-language models via typographic visual prompts , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[11]

Proceedings of the 41st International Conference on Machine Learning , pages=

Image Hijacks: adversarial images can control generative models at runtime , author=. Proceedings of the 41st International Conference on Machine Learning , pages=
[12]

The Thirteenth International Conference on Learning Representations,

Maksym Andriushchenko and Francesco Croce and Nicolas Flammarion , title =. The Thirteenth International Conference on Learning Representations,. 2025 , url =

2025
[13]

JailBreakV: A benchmark for assessing the robustness of MultiModal large language models against jailbreak attacks.arXiv preprint arXiv:2404.03027,

Weidi Luo and Siyuan Ma and Xiaogeng Liu and Xiaoyu Guo and Chaowei Xiao , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2404.03027 , eprinttype =. 2404.03027 , timestamp =

work page doi:10.48550/arxiv.2404.03027 2024
[14]

CoRR , volume =

Shuo Chen and Zhen Han and Bailan He and Zifeng Ding and Wenqian Yu and Philip Torr and Volker Tresp and Jindong Gu , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2404.03411 , eprinttype =. 2404.03411 , timestamp =

work page doi:10.48550/arxiv.2404.03411 2024
[15]

Qwen2.5-1M Technical Report

An Yang and Bowen Yu and Chengyuan Li and Dayiheng Liu and Fei Huang and Haoyan Huang and Jiandong Jiang and Jianhong Tu and Jianwei Zhang and Jingren Zhou and Junyang Lin and Kai Dang and Kexin Yang and Le Yu and Mei Li and Minmin Sun and Qin Zhu and Rui Men and Tao He and Weijia Xu and Wenbiao Yin and Wenyuan Yu and Xiafei Qiu and Xingzhang Ren and Xinl...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.15383 2025
[16]

The Thirteenth International Conference on Learning Representations,

Xiang Yue and Yueqi Song and Akari Asai and Seungone Kim and Jean de Dieu Nyandwi and Simran Khanuja and Anjali Kantharuban and Lintang Sutawika and Sathyanarayanan Ramamoorthy and Graham Neubig , title =. The Thirteenth International Conference on Learning Representations,. 2025 , url =

2025
[17]

Findings of the Association for Computational Linguistics: NAACL 2025 , pages=

Multilingual blending: Large language model safety alignment evaluation with language mixture , author=. Findings of the Association for Computational Linguistics: NAACL 2025 , pages=

2025
[18]

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track) , pages=

IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages , author=. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track) , pages=
[19]

Lingua-SafetyBench: A Benchmark for Safety Evaluation of Multilingual Vision-Language Models

Enyi Shi and Pengyang Shao and Yanxin Zhang and Chenhang Cui and Jiayi Lyu and Xu Xie and Xiaobo Xia and Fei Shen and Tat. Lingua-SafetyBench:. CoRR , volume =. 2026 , url =. doi:10.48550/ARXIV.2601.22737 , eprinttype =. 2601.22737 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2601.22737 2026
[20]

The Thirteenth International Conference on Learning Representations,

Yi Ding and Bolian Li and Ruqi Zhang , title =. The Thirteenth International Conference on Learning Representations,. 2025 , url =

2025
[21]

Zhang, Ziyi and Sun, Zhen and Zhang, Zongmin and Guo, Jihui and He, Xinlei , journal =
[22]

Zou, Andy and Wang, Zifan and Carlini, Nicholas and Nasr, Milad and Kolter, J Zico and Fredrikson, Matt , journal =
[23]

Proceedings of the ACL Workshop on LLM Security (LLMSec) , year =

Derner, Erik and Basti. Proceedings of the ACL Workshop on LLM Security (LLMSec) , year =
[24]

Yoo, Haneul and Yang, Yongjin and Lee, Hwaran , journal =
[25]

Yong, Zheng-Xin and Menghini, Cristina and Bach, Stephen H , journal =
[26]

Liu, Xin and Zhu, Yichen and Gu, Jindong and Lan, Yunshi and Yang, Chao and Qiao, Yu , booktitle =
[27]

arXiv preprint arXiv:2508.10925 , year=

gpt-oss-120b & gpt-oss-20b model card , author=. arXiv preprint arXiv:2508.10925 , year=

Pith/arXiv arXiv
[28]

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Feng Li and Renrui Zhang and Hao Zhang and Yuanhan Zhang and Bo Li and Wei Li and Zejun Ma and Chunyuan Li , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2407.07895 , eprinttype =. 2407.07895 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2407.07895 2024
[29]

Gu, Tianle and Zhou, Zeyang and Huang, Kexin and others , booktitle =
[30]

Wang, Yu and others , booktitle =
[31]

Achiam, Josh and Adler, Steven and Agarwal, Sandhini and others , journal =
[32]

Zhao, Wayne Xin and Zhou, Kun and Li, Junyi and others , journal =
[33]

Chang, Yupeng and Wang, Xu and Wang, Jindong and others , journal =
[34]

Carlini, Nicholas and Nasr, Milad and Choquette-Choo, Christopher A and others , booktitle =
[35]

Bailey, Luke and Ong, Euan and Russell, Stuart and Emmons, Scott , booktitle =
[36]

Qi, Xiangyu and Zeng, Yi and Xie, Tinghao and others , booktitle =
[37]

Liu, Haotian and Li, Chunyuan and Li, Yuheng and Lee, Yong Jae , booktitle =
[38]

Wang, Zhizheng and others , journal =
[39]

and Wong, Eric , booktitle=

Chao, Patrick and Robey, Alexander and Dobriban, Edgar and Hassani, Hamed and Pappas, George J. and Wong, Eric , booktitle=. Jailbreaking Black Box Large Language Models in Twenty Queries , year=
[40]

LLaVA-NeXT: Improved reasoning, OCR, and world knowledge , url=

Liu, Haotian and Li, Chunyuan and Li, Yuheng and Li, Bo and Zhang, Yuanhan and Shen, Sheng and Lee, Yong Jae , month=. LLaVA-NeXT: Improved reasoning, OCR, and world knowledge , url=
[41]

CoRR , volume =

Peng Wang and Shuai Bai and Sinan Tan and Shijie Wang and Zhihao Fan and Jinze Bai and Keqin Chen and Xuejing Liu and Jialin Wang and Wenbin Ge and Yang Fan and Kai Dang and Mengfei Du and Xuancheng Ren and Rui Men and Dayiheng Liu and Chang Zhou and Jingren Zhou and Junyang Lin , title =. CoRR , volume =
[42]

Vegas Cybertruck bomber 'used ChatGPT to plan explosion' , year =
[43]

Qwen2.5-VL Technical Report , journal =

Shuai Bai and Keqin Chen and Xuejing Liu and Jialin Wang and Wenbin Ge and Sibo Song and Kai Dang and Peng Wang and Shijie Wang and Jun Tang and Humen Zhong and Yuanzhi Zhu and Ming. Qwen2.5-VL Technical Report , journal =
[44]

2024 , note =

Anthropic , title =. 2024 , note =

2024
[45]

2024 , howpublished =

OpenAI , title =. 2024 , howpublished =

2024
[46]

Goucher and Adam Perelman and Aditya Ramesh and Aidan Clark and AJ Ostrow and Akila Welihinda and Alan Hayes and Alec Radford and Aleksander Madry and Alex Baker

Aaron Hurst and Adam Lerer and Adam P. Goucher and Adam Perelman and Aditya Ramesh and Aidan Clark and AJ Ostrow and Akila Welihinda and Alan Hayes and Alec Radford and Aleksander Madry and Alex Baker. GPT-4o System Card , journal =
[47]

arXiv preprint arXiv:2303.08774 , year=

Gpt-4 technical report , author=. arXiv preprint arXiv:2303.08774 , year=

Pith/arXiv arXiv
[48]

2024 , howpublished =

Google , title =. 2024 , howpublished =

2024
[49]

Jingyi Zhang and Jiaxing Huang and Sheng Jin and Shijian Lu , title =
[50]

A Survey of Large Language Models , journal =

Wayne Xin Zhao and Kun Zhou and Junyi Li and Tianyi Tang and Xiaolei Wang and Yupeng Hou and Yingqian Min and Beichen Zhang and Junjie Zhang and Zican Dong and Yifan Du and Chen Yang and Yushuo Chen and Zhipeng Chen and Jinhao Jiang and Ruiyang Ren and Yifan Li and Xinyu Tang and Zikang Liu and Peiyu Liu and Jian. A Survey of Large Language Models , journal =
[51]

Yu and Qiang Yang and Xing Xie , title =

Yupeng Chang and Xu Wang and Jindong Wang and Yuan Wu and Linyi Yang and Kaijie Zhu and Hao Chen and Xiaoyuan Yi and Cunxiang Wang and Yidong Wang and Wei Ye and Yue Zhang and Yi Chang and Philip S. Yu and Qiang Yang and Xing Xie , title =
[52]

Lawrence Zitnick and Devi Parikh , title =

Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh , title =. 2015

2015
[53]

CoRR , volume =

Xiaowei Hu and Zhe Gan and Jianfeng Wang and Zhengyuan Yang and Zicheng Liu and Yumao Lu and Lijuan Wang , title =. CoRR , volume =
[54]

Zaid Khan and B. G. Vijay Kumar and Samuel Schulter and Xiang Yu and Yun Fu and Manmohan Chandraker , title =
[55]

Zhenwei Shao and Zhou Yu and Meng Wang and Jun Yu , title =
[56]

Jiaxuan Li and Duc Minh Vo and Akihiro Sugimoto and Hideki Nakayama , title =
[57]

Rowan Zellers and Yonatan Bisk and Ali Farhadi and Yejin Choi , title =
[58]

Thirty-Fifth

Ryota Tanaka and Kyosuke Nishida and Sen Yoshida , title =. Thirty-Fifth
[59]

Haotian Liu and Chunyuan Li and Yuheng Li and Yong Jae Lee , title =
[60]

High-Resolution Image Synthesis with Latent Diffusion Models , booktitle =

Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Bj. High-Resolution Image Synthesis with Latent Diffusion Models , booktitle =
[61]

Bowman , title =

David Rein and Betty Li Hou and Asa Cooper Stickland and Jackson Petty and Richard Yuanzhe Pang and Julien Dirani and Julian Michael and Samuel R. Bowman , title =. CoRR , volume =
[62]

Evaluating Large Language Models Trained on Code , journal =

Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Pond. Evaluating Large Language Models Trained on Code , journal =
[63]

CoRR , volume =

Sibo Yi and Yule Liu and Zhen Sun and Tianshuo Cong and Xinlei He and Jiaxing Song and Ke Xu and Qi Li , title =. CoRR , volume =
[64]

Choquette

Nicholas Carlini and Milad Nasr and Christopher A. Choquette. Are aligned neural networks adversarially aligned? , booktitle =
[65]

Erfan Shayegani and Yue Dong and Nael B. Abu. Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models , booktitle =
[66]

(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs , journal =

Eugene Bagdasaryan and Tsung. (Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs , journal =
[67]

Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models , booktitle =

Yifan Li and Hangyu Guo and Kun Zhou and Wayne Xin Zhao and Ji. Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models , booktitle =
[68]

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts , booktitle =

Yichen Gong and Delong Ran and Jinyuan Liu and Conglei Wang and Tianshuo Cong and Anyu Wang and Sisi Duan and Xiaoyun Wang , editor =. FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts , booktitle =
[69]

Jailbreak Large Vision-Language Models Through Multi-Modal Linkage , booktitle =

Yu Wang and Xiaofei Zhou and Yichen Wang and Geyuan Zhang and Tianxing He , editor =. Jailbreak Large Vision-Language Models Through Multi-Modal Linkage , booktitle =
[70]

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! , booktitle =

Xiangyu Qi and Yi Zeng and Tinghao Xie and Pin. Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! , booktitle =
[71]

Albert Q. Jiang and Alexandre Sablayrolles and Arthur Mensch and Chris Bamford and Devendra Singh Chaplot and Diego de Las Casas and Florian Bressand and Gianna Lengyel and Guillaume Lample and Lucile Saulnier and L. Mistral 7B , journal =
[72]

CoRR , volume =

Daizong Liu and Mingyu Yang and Xiaoye Qu and Pan Zhou and Yu Cheng and Wei Hu , title =. CoRR , volume =
[73]

CoRR , volume =

Yizhang Jin and Jian Li and Yexin Liu and Tianjun Gu and Kai Wu and Zhengkai Jiang and Muyang He and Bo Zhao and Xin Tan and Zhenye Gan and Yabiao Wang and Chengjie Wang and Lizhuang Ma , title =. CoRR , volume =
[74]

MM-SafetyBench:

Xin Liu and Yichen Zhu and Jindong Gu and Yunshi Lan and Chao Yang and Yu Qiao , editor =. MM-SafetyBench:. Computer Vision -
[75]

CoRR , volume =

Shiji Zhao and Ranjie Duan and Fengxiang Wang and Chi Chen and Caixin Kang and Jialing Tao and YueFeng Chen and Hui Xue and Xingxing Wei , title =. CoRR , volume =
[76]

arXiv preprint arXiv:2312.10766 , year=

Jailguard: A universal detection framework for llm prompt-based attacks , author=. arXiv preprint arXiv:2312.10766 , year=

arXiv
[77]

The Twelfth International Conference on Learning Representations,

Deyao Zhu and Jun Chen and Xiaoqian Shen and Xiang Li and Mohamed Elhoseiny , title =. The Twelfth International Conference on Learning Representations,
[78]

AdaShield : Safeguarding Multimodal Large Language Models from Structure-Based Attack via Adaptive Shield Prompting , booktitle =

Yu Wang and Xiaogeng Liu and Yu Li and Muhao Chen and Chaowei Xiao , editor =. AdaShield : Safeguarding Multimodal Large Language Models from Structure-Based Attack via Adaptive Shield Prompting , booktitle =
[79]

Zhe Chen and Jiannan Wu and Wenhai Wang and Weijie Su and Guo Chen and Sen Xing and Muyan Zhong and Qinglong Zhang and Xizhou Zhu and Lewei Lu and Bin Li and Ping Luo and Tong Lu and Yu Qiao and Jifeng Dai , title =
[80]

Xiang Yue and Yuansheng Ni and Tianyu Zheng and Kai Zhang and Ruoqi Liu and Ge Zhang and Samuel Stevens and Dongfu Jiang and Weiming Ren and Yuxuan Sun and Cong Wei and Botao Yu and Ruibin Yuan and Renliang Sun and Ming Yin and Boyuan Zheng and Zhenzhu Yang and Yibo Liu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen , title =

Showing first 80 references.

[1] [1]

F act E val: Evaluating the Robustness of Fact Verification Systems in the Era of Large Language Models

Mamta and Cocarascu, Oana. F act E val: Evaluating the Robustness of Fact Verification Systems in the Era of Large Language Models. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.naacl-long.534

work page doi:10.18653/v1/2025.naacl-long.534 2025

[2] [2]

Qwen2.5-VL Technical Report

Shuai Bai and Keqin Chen and Xuejing Liu and Jialin Wang and Wenbin Ge and Sibo Song and Kai Dang and Peng Wang and Shijie Wang and Jun Tang and Humen Zhong and Yuanzhi Zhu and Ming. Qwen2.5-VL Technical Report , journal =. 2025 , url =. doi:10.48550/ARXIV.2502.13923 , eprinttype =. 2502.13923 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.13923 2025

[3] [3]

From LLM s to MLLM s: Exploring the Landscape of Multimodal Jailbreaking

Wang, Siyuan and Long, Zhuohan and Fan, Zhihao and Wei, Zhongyu. From LLM s to MLLM s: Exploring the Landscape of Multimodal Jailbreaking. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.973

work page doi:10.18653/v1/2024.emnlp-main.973 2024

[4] [4]

The Thirteenth International Conference on Learning Representations , year=

Pangea: A fully open multilingual multimodal llm for 39 languages , author=. The Thirteenth International Conference on Learning Representations , year=

[5] [5]

European Conference on Computer Vision , pages=

Images are achilles’ heel of alignment: Exploiting visual vulnerabilities for jailbreaking multimodal large language models , author=. European Conference on Computer Vision , pages=. 2024 , organization=

2024

[6] [6]

European Conference on Computer Vision , pages=

Mm-safetybench: A benchmark for safety evaluation of multimodal large language models , author=. European Conference on Computer Vision , pages=. 2024 , organization=

2024

[7] [7]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Mmj-bench: A comprehensive study on jailbreak attacks and defenses for vision language models , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[8] [8]

Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models , author=

[9] [9]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Jailbreak large vision-language models through multi-modal linkage , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[10] [10]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Figstep: Jailbreaking large vision-language models via typographic visual prompts , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[11] [11]

Proceedings of the 41st International Conference on Machine Learning , pages=

Image Hijacks: adversarial images can control generative models at runtime , author=. Proceedings of the 41st International Conference on Machine Learning , pages=

[12]

Maksym Andriushchenko and Francesco Croce and Nicolas Flammarion. The Thirteenth International Conference on Learning Representations. 2025.

[13]

Weidi Luo and Siyuan Ma and Xiaogeng Liu and Xiaoyu Guo and Chaowei Xiao. JailBreakV: A benchmark for assessing the robustness of MultiModal large language models against jailbreak attacks. CoRR, arXiv preprint arXiv:2404.03027. 2024. doi:10.48550/ARXIV.2404.03027

[14]

Shuo Chen and Zhen Han and Bailan He and Zifeng Ding and Wenqian Yu and Philip Torr and Volker Tresp and Jindong Gu. CoRR. 2024. doi:10.48550/ARXIV.2404.03411

[15]

An Yang and Bowen Yu and Chengyuan Li and Dayiheng Liu and Fei Huang and Haoyan Huang and Jiandong Jiang and Jianhong Tu and Jianwei Zhang and Jingren Zhou and Junyang Lin and Kai Dang and Kexin Yang and Le Yu and Mei Li and Minmin Sun and Qin Zhu and Rui Men and Tao He and Weijia Xu and Wenbiao Yin and Wenyuan Yu and Xiafei Qiu and Xingzhang Ren and Xinl... Qwen2.5-1M Technical Report. 2025. doi:10.48550/arxiv.2501.15383

[16]

Xiang Yue and Yueqi Song and Akari Asai and Seungone Kim and Jean de Dieu Nyandwi and Simran Khanuja and Anjali Kantharuban and Lintang Sutawika and Sathyanarayanan Ramamoorthy and Graham Neubig. The Thirteenth International Conference on Learning Representations. 2025.

[17]

Multilingual blending: Large language model safety alignment evaluation with language mixture. Findings of the Association for Computational Linguistics: NAACL 2025. 2025.

[18]

IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track).

[19]

Enyi Shi and Pengyang Shao and Yanxin Zhang and Chenhang Cui and Jiayi Lyu and Xu Xie and Xiaobo Xia and Fei Shen and Tat. Lingua-SafetyBench: A Benchmark for Safety Evaluation of Multilingual Vision-Language Models. CoRR. 2026. doi:10.48550/ARXIV.2601.22737

[20]

Yi Ding and Bolian Li and Ruqi Zhang. The Thirteenth International Conference on Learning Representations. 2025.

[21]

Zhang, Ziyi and Sun, Zhen and Zhang, Zongmin and Guo, Jihui and He, Xinlei.

[22]

Zou, Andy and Wang, Zifan and Carlini, Nicholas and Nasr, Milad and Kolter, J Zico and Fredrikson, Matt.

[23]

Derner, Erik and Basti. Proceedings of the ACL Workshop on LLM Security (LLMSec).

[24]

Yoo, Haneul and Yang, Yongjin and Lee, Hwaran.

[25]

Yong, Zheng-Xin and Menghini, Cristina and Bach, Stephen H.

[26]

Liu, Xin and Zhu, Yichen and Gu, Jindong and Lan, Yunshi and Yang, Chao and Qiao, Yu.

[27]

gpt-oss-120b & gpt-oss-20b model card. arXiv preprint arXiv:2508.10925.

[28]

Feng Li and Renrui Zhang and Hao Zhang and Yuanhan Zhang and Bo Li and Wei Li and Zejun Ma and Chunyuan Li. LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models. CoRR. 2024. doi:10.48550/ARXIV.2407.07895

[29]

Gu, Tianle and Zhou, Zeyang and Huang, Kexin and others.

[30]

Wang, Yu and others.

[31]

Achiam, Josh and Adler, Steven and Agarwal, Sandhini and others.

[32]

Zhao, Wayne Xin and Zhou, Kun and Li, Junyi and others.

[33]

Chang, Yupeng and Wang, Xu and Wang, Jindong and others.

[34]

Carlini, Nicholas and Nasr, Milad and Choquette-Choo, Christopher A and others.

[35]

Bailey, Luke and Ong, Euan and Russell, Stuart and Emmons, Scott.

[36]

Qi, Xiangyu and Zeng, Yi and Xie, Tinghao and others.

[37]

Liu, Haotian and Li, Chunyuan and Li, Yuheng and Lee, Yong Jae.

[38]

Wang, Zhizheng and others.

[39]

Chao, Patrick and Robey, Alexander and Dobriban, Edgar and Hassani, Hamed and Pappas, George J. and Wong, Eric. Jailbreaking Black Box Large Language Models in Twenty Queries.

[40]

Liu, Haotian and Li, Chunyuan and Li, Yuheng and Li, Bo and Zhang, Yuanhan and Shen, Sheng and Lee, Yong Jae. LLaVA-NeXT: Improved reasoning, OCR, and world knowledge.

[41]

Peng Wang and Shuai Bai and Sinan Tan and Shijie Wang and Zhihao Fan and Jinze Bai and Keqin Chen and Xuejing Liu and Jialin Wang and Wenbin Ge and Yang Fan and Kai Dang and Mengfei Du and Xuancheng Ren and Rui Men and Dayiheng Liu and Chang Zhou and Jingren Zhou and Junyang Lin. CoRR.

[42]

Vegas Cybertruck bomber 'used ChatGPT to plan explosion'.

[43]

Shuai Bai and Keqin Chen and Xuejing Liu and Jialin Wang and Wenbin Ge and Sibo Song and Kai Dang and Peng Wang and Shijie Wang and Jun Tang and Humen Zhong and Yuanzhi Zhu and Ming. Qwen2.5-VL Technical Report.

[44]

Anthropic. 2024.

[45]

OpenAI. 2024.

[46]

Aaron Hurst and Adam Lerer and Adam P. Goucher and Adam Perelman and Aditya Ramesh and Aidan Clark and AJ Ostrow and Akila Welihinda and Alan Hayes and Alec Radford and Aleksander Madry and Alex Baker. GPT-4o System Card.

[47]

GPT-4 technical report. arXiv preprint arXiv:2303.08774.

[48]

Google. 2024.

[49]

Jingyi Zhang and Jiaxing Huang and Sheng Jin and Shijian Lu.

[50]

Wayne Xin Zhao and Kun Zhou and Junyi Li and Tianyi Tang and Xiaolei Wang and Yupeng Hou and Yingqian Min and Beichen Zhang and Junjie Zhang and Zican Dong and Yifan Du and Chen Yang and Yushuo Chen and Zhipeng Chen and Jinhao Jiang and Ruiyang Ren and Yifan Li and Xinyu Tang and Zikang Liu and Peiyu Liu and Jian. A Survey of Large Language Models.

[51]

Yupeng Chang and Xu Wang and Jindong Wang and Yuan Wu and Linyi Yang and Kaijie Zhu and Hao Chen and Xiaoyuan Yi and Cunxiang Wang and Yidong Wang and Wei Ye and Yue Zhang and Yi Chang and Philip S. Yu and Qiang Yang and Xing Xie.

[52]

Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh. 2015.

[53]

Xiaowei Hu and Zhe Gan and Jianfeng Wang and Zhengyuan Yang and Zicheng Liu and Yumao Lu and Lijuan Wang. CoRR.

[54]

Zaid Khan and B. G. Vijay Kumar and Samuel Schulter and Xiang Yu and Yun Fu and Manmohan Chandraker.

[55]

Zhenwei Shao and Zhou Yu and Meng Wang and Jun Yu.

[56]

Jiaxuan Li and Duc Minh Vo and Akihiro Sugimoto and Hideki Nakayama.

[57]

Rowan Zellers and Yonatan Bisk and Ali Farhadi and Yejin Choi.

[58]

Ryota Tanaka and Kyosuke Nishida and Sen Yoshida. Thirty-Fifth

[59]

Haotian Liu and Chunyuan Li and Yuheng Li and Yong Jae Lee.

[60]

Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Bj. High-Resolution Image Synthesis with Latent Diffusion Models.

[61]

David Rein and Betty Li Hou and Asa Cooper Stickland and Jackson Petty and Richard Yuanzhe Pang and Julien Dirani and Julian Michael and Samuel R. Bowman. CoRR.

[62]

Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Pond. Evaluating Large Language Models Trained on Code.

[63]

Sibo Yi and Yule Liu and Zhen Sun and Tianshuo Cong and Xinlei He and Jiaxing Song and Ke Xu and Qi Li. CoRR.

[64]

Nicholas Carlini and Milad Nasr and Christopher A. Choquette. Are aligned neural networks adversarially aligned?

[65]

Erfan Shayegani and Yue Dong and Nael B. Abu. Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models.

[66]

Eugene Bagdasaryan and Tsung. (Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs.

[67]

Yifan Li and Hangyu Guo and Kun Zhou and Wayne Xin Zhao and Ji. Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models.

[68]

Yichen Gong and Delong Ran and Jinyuan Liu and Conglei Wang and Tianshuo Cong and Anyu Wang and Sisi Duan and Xiaoyun Wang. FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts.

[69]

Yu Wang and Xiaofei Zhou and Yichen Wang and Geyuan Zhang and Tianxing He. Jailbreak Large Vision-Language Models Through Multi-Modal Linkage.

[70]

Xiangyu Qi and Yi Zeng and Tinghao Xie and Pin. Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

[71]

Albert Q. Jiang and Alexandre Sablayrolles and Arthur Mensch and Chris Bamford and Devendra Singh Chaplot and Diego de Las Casas and Florian Bressand and Gianna Lengyel and Guillaume Lample and Lucile Saulnier and L. Mistral 7B.

[72]

Daizong Liu and Mingyu Yang and Xiaoye Qu and Pan Zhou and Yu Cheng and Wei Hu. CoRR.

[73]

Yizhang Jin and Jian Li and Yexin Liu and Tianjun Gu and Kai Wu and Zhengkai Jiang and Muyang He and Bo Zhao and Xin Tan and Zhenye Gan and Yabiao Wang and Chengjie Wang and Lizhuang Ma. CoRR.

[74]

Xin Liu and Yichen Zhu and Jindong Gu and Yunshi Lan and Chao Yang and Yu Qiao. MM-SafetyBench:. Computer Vision -

[75]

Shiji Zhao and Ranjie Duan and Fengxiang Wang and Chi Chen and Caixin Kang and Jialing Tao and YueFeng Chen and Hui Xue and Xingxing Wei. CoRR.

[76]

JailGuard: A universal detection framework for LLM prompt-based attacks. arXiv preprint arXiv:2312.10766.

[77]

Deyao Zhu and Jun Chen and Xiaoqian Shen and Xiang Li and Mohamed Elhoseiny. The Twelfth International Conference on Learning Representations.

[78]

Yu Wang and Xiaogeng Liu and Yu Li and Muhao Chen and Chaowei Xiao. AdaShield: Safeguarding Multimodal Large Language Models from Structure-Based Attack via Adaptive Shield Prompting.

[79]

Zhe Chen and Jiannan Wu and Wenhai Wang and Weijie Su and Guo Chen and Sen Xing and Muyan Zhong and Qinglong Zhang and Xizhou Zhu and Lewei Lu and Bin Li and Ping Luo and Tong Lu and Yu Qiao and Jifeng Dai.

[80]

Xiang Yue and Yuansheng Ni and Tianyu Zheng and Kai Zhang and Ruoqi Liu and Ge Zhang and Samuel Stevens and Dongfu Jiang and Weiming Ren and Yuxuan Sun and Cong Wei and Botao Yu and Ruibin Yuan and Renliang Sun and Ming Yin and Boyuan Zheng and Zhenzhu Yang and Yibo Liu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen.