DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs

Deyue Zhang; Dongdong Yang; Quanchen Zou; Wenzhuo Xu; Xiangzheng Zhang; Zhipeng Wei; Zonghao Ying

arxiv: 2605.18915 · v1 · pith:KQNXFIWFnew · submitted 2026-05-18 · 💻 cs.CR · cs.AI

Wenzhuo Xu, Zhipeng Wei, Zonghao Ying, Deyue Zhang, Dongdong Yang, Xiangzheng Zhang, Quanchen Zou

Pith reviewed 2026-05-20 10:15 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords jailbreak attacks · multimodal LLMs · multi-image inputs · compositional framework · safety alignment · attack success rate · GPT-4o · vulnerabilities

The pith

DMN uses multi-image inputs to distribute jailbreak instructions and achieve over 90% success on GPT-4o, Gemini, and Claude.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DMN as a compositional jailbreak method for multimodal LLMs that accept multiple images at once. It spreads harmful instructions across separate images, supplies supporting visual evidence, and adds a number-chain visual reasoning task to divert the model from safety checks. The approach rests on the observation that current safety alignment has focused less on multi-image scenarios than on text or single images. Experiments report attack success rates above 90 percent on GPT-4o, Gemini-2.5-pro, and Claude Sonnet 4, well above prior single-image baselines. If the results hold, multi-image support itself becomes a new and sizable attack surface.

Core claim

The DMN framework combines distributed instruction across multiple images, multimodal evidence, and a number chain task to expand the attack space and distract safety mechanisms, producing attack success rates over 90 percent on GPT-4o, Gemini-2.5-pro, and Claude Sonnet 4 while outperforming single-image methods.

What carries the argument

The DMN compositional strategy that splits the jailbreak into Distributed instruction (D), Multimodal evidence (M), and Number chain task (N) applied simultaneously to several images.

If this is right

Multi-image input capability in MLLMs creates exploitable gaps because alignment work has not kept pace with that feature.
Spreading instructions and evidence across images bypasses filters that single-image attacks cannot overcome.
Adding a distracting visual reasoning task reduces the model's focus on detecting harmful intent.
Compositional multi-image attacks reach success rates above 90 percent on major models where earlier methods do not.
Current safety mechanisms contain fundamental weaknesses when inputs arrive as coordinated sets of images.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Safety training pipelines for MLLMs should add routine multi-image adversarial testing as a standard step.
Defenses could check consistency of intent across the full set of images rather than treating each image in isolation.
Similar distribution tactics might transfer to other input formats such as video or interleaved text-image sequences.
Deployment policies for MLLMs may need updated risk assessments that treat multi-image support as an elevated attack vector.
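The set-level consistency check suggested above can be sketched in a few lines. This is a minimal illustration, not a defense described in the paper: it assumes the text in each image has already been extracted upstream, and `flag_image_set`, `toy_score`, and the placeholder terms are hypothetical names introduced here. The toy scorer stands in for a real safety classifier.

```python
from typing import Callable, Dict, Sequence


def flag_image_set(
    extracted_texts: Sequence[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> Dict[str, object]:
    """Score each image's text in isolation and the concatenation of the full set."""
    per_image = [score_fn(text) for text in extracted_texts]
    joint = score_fn(" ".join(extracted_texts))
    return {
        "per_image_max": max(per_image, default=0.0),
        "joint": joint,
        "flag_isolated": any(score >= threshold for score in per_image),
        "flag_joint": joint >= threshold,
    }


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a safety classifier.

    Returns 1.0 only when both placeholder terms co-occur in the text.
    """
    words = set(text.lower().split())
    return 1.0 if {"alpha", "omega"} <= words else 0.0


# Two images whose text looks benign alone but matches the rule when combined.
result = flag_image_set(["step alpha", "then omega"], toy_score)
```

The design point is only that the scorer sees the whole set at once: a request split across images can pass every per-image check and still be caught by the joint score.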

Load-bearing premise

That MLLMs have received far less safety alignment for multi-image inputs than for single-image or text inputs.

What would settle it

Fine-tune or retrain one of the tested MLLMs with explicit multi-image safety examples and then measure whether DMN still reaches above 50 percent attack success rate.
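Whether a measured attack success rate (ASR) is "still above 50 percent" depends on how many queries were judged. A minimal sketch of that decision, assuming each query yields a binary success label and using a Wilson score interval; the function names and the 95 percent level are choices made here, not part of the paper's protocol:

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)) / denom
    )
    return centre - half_width, centre + half_width


def asr_clearly_above(successes: int, trials: int, bar: float = 0.5) -> bool:
    """True only if the lower end of the interval exceeds the bar."""
    low, _ = wilson_interval(successes, trials)
    return low > bar
```

With hypothetical counts, `asr_clearly_above(90, 100)` holds while `asr_clearly_above(50, 100)` does not, because the interval around an observed 50 percent straddles the bar.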

Figures

Figures reproduced from arXiv: 2605.18915 by Deyue Zhang, Dongdong Yang, Quanchen Zou, Wenzhuo Xu, Xiangzheng Zhang, Zhipeng Wei, Zonghao Ying.

**Figure 2.** The pipeline of multimodal evidence generation. First, an auxiliary LLM is utilized to generate a realistic…

**Figure 3.** A number chain frame example. MLLMs are instructed to extract 9 from this frame, and extract the next number in the 4th frame…

**Figure 4.** An illustration of the image sequence gen…

**Figure 5.** An example of input image sequence, input text and GPT-5’s corresponding output. The text input and…

**Figure 7.** The ASR and word count (only jailbroken re…

**Figure 6.** The KFAR distribution of each task on SafeBench. BFI, CDFI and NC refer to blank frame indexing, cat/dog frame indexing and number chain task, respectively. Attention weights are obtained on Qwen-2.5-VL-7B.

On the number of evidence pairs. To investigate the effect of the amount of multimodal evidence on jailbreak performance, we test the ASR of DMN using different numbers of evidence pairs. We also calcula…

Abstract

Multimodal Large Language Models (MLLMs) are vulnerable to jailbreak attacks, which can elicit harmful responses from MLLMs. Many MLLMs support multi-image inputs, inadvertently introducing new vulnerabilities because less effort has gone into multi-image safety alignment. Previous MLLM jailbreak methods use only a single image, which restricts the attack space: they cannot distribute harmful requests across multiple images, carry abundant information, or exploit additional visual reasoning tasks to distract MLLMs. To address these limitations, in this paper, we propose a compositional jailbreak framework, DMN, which leverages Distributed instruction, Multimodal evidence and a Number chain task to fully enhance the jailbreak performance. Extensive experiments show that DMN is highly effective for MLLM jailbreaking, e.g. achieving attack success rates of over 90% on GPT-4o, Gemini-2.5-pro and Claude Sonnet 4, surpassing other baselines by a large margin. This compositional, multi-image jailbreak strategy reveals fundamental weaknesses in their safety mechanisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DMN claims strong multi-image jailbreak results but the abstract leaves open whether the composition itself, rather than just using more images, drives the gains.

The punchline on this paper is that it presents a multi-image compositional jailbreak called DMN that reportedly achieves high attack success rates on current top MLLMs, but the evidence for why the specific composition works is not yet clear from the abstract.

What is new is the use of distributed instructions across images combined with a number chain distraction task. Previous methods were limited to single images, so this opens up more ways to pack information and distract the model. That part makes sense as models add multi-image capabilities without matching safety work. The paper does a decent job pointing out the gap in multi-image safety alignment. It's a fair observation that less effort has gone into that area so far.

Where it gets soft is the experimental side. We don't see the full protocol, how the baselines were modified for multiple images, or any statistical details. The stress-test note is on point here: if the baselines stayed single-image, then the big gains could just be from giving the model more visual input rather than the clever composition. That needs to be ruled out to make the central claim stick.

Overall, this is aimed at the adversarial ML and AI safety crowd. Someone building or testing safeguards for multimodal systems could pick up ideas from the attack design. It has enough of a hook to go to peer review, though it will need substantial work on the methods and results to be convincing. My recommendation is to send it out for review.

Referee Report

2 major / 1 minor

Summary. The paper proposes DMN, a compositional jailbreak framework for multimodal LLMs that distributes harmful instructions across multiple images, incorporates multimodal evidence, and adds a number-chain distraction task. It reports attack success rates exceeding 90% on GPT-4o, Gemini-2.5-pro, and Claude Sonnet 4, substantially outperforming prior single-image baselines, and attributes new vulnerabilities to insufficient multi-image safety alignment.

Significance. If the empirical results are robust and properly controlled, the work would usefully document a new attack surface arising from multi-image input support and demonstrate that compositional multi-image strategies can materially increase jailbreak effectiveness beyond single-image methods.

major comments (2)

[Abstract / Experiments] Abstract and experimental sections: the central claim that DMN 'surpasses other baselines by a large margin' requires explicit confirmation that baseline methods were re-implemented with the same number of images and total visual tokens as DMN; without this control, the reported gains cannot be attributed to the compositional elements (distributed instruction, multimodal evidence, number-chain task) rather than simply the use of multiple images, which the abstract itself links to reduced safety alignment effort.
[Abstract] Abstract: the reported attack success rates above 90% are stated without any accompanying experimental protocol, number of queries, success criteria definition, statistical tests, or failure-case analysis, rendering the quantitative claims impossible to evaluate from the provided text.

minor comments (1)

[Experiments] Ensure that all baseline methods are described with the exact image count and prompt format used in the DMN evaluation to allow direct comparison.
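The control the referee asks for can be made concrete. The sketch below is an illustration under stated assumptions, not the paper's evaluation code: `RunSummary`, `is_budget_matched`, the 5 percent token tolerance, and the choice of a pooled two-proportion z-test are all introduced here. It first checks that two runs used the same image count and a comparable visual-token budget, then compares their success rates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    name: str
    images_per_query: int
    visual_tokens_per_query: int
    successes: int
    queries: int


def is_budget_matched(a: RunSummary, b: RunSummary, token_tolerance: float = 0.05) -> bool:
    """Same image count, and visual-token budgets within a relative tolerance."""
    if a.images_per_query != b.images_per_query:
        return False
    larger = max(a.visual_tokens_per_query, b.visual_tokens_per_query)
    if larger == 0:
        return True
    gap = abs(a.visual_tokens_per_query - b.visual_tokens_per_query)
    return gap / larger <= token_tolerance


def two_proportion_z(a: RunSummary, b: RunSummary) -> float:
    """Pooled z statistic for the difference in success rates (a minus b)."""
    p_a = a.successes / a.queries
    p_b = b.successes / b.queries
    pooled = (a.successes + b.successes) / (a.queries + b.queries)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / a.queries + 1.0 / b.queries))
    return 0.0 if std_err == 0.0 else (p_a - p_b) / std_err


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical counts, for illustration only; not results from the paper.
method_a = RunSummary("method_a", 5, 1000, 90, 100)
method_b = RunSummary("method_b", 5, 980, 60, 100)
single_image = RunSummary("single_image", 1, 200, 60, 100)
```

Only a pair for which `is_budget_matched` returns true supports attributing a significant gap to the compositional elements; a gap against `single_image` alone would leave the "more images" explanation open.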

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.

Referee: [Abstract / Experiments] Abstract and experimental sections: the central claim that DMN 'surpasses other baselines by a large margin' requires explicit confirmation that baseline methods were re-implemented with the same number of images and total visual tokens as DMN; without this control, the reported gains cannot be attributed to the compositional elements (distributed instruction, multimodal evidence, number-chain task) rather than simply the use of multiple images, which the abstract itself links to reduced safety alignment effort.

Authors: We agree that a controlled comparison is necessary to isolate the contribution of the compositional elements. In the revised manuscript, we will re-implement the baseline methods using the same number of images and matched total visual tokens as DMN. Updated results and analysis will be added to the Experiments section and referenced in the abstract to ensure the performance gains are clearly attributable to distributed instructions, multimodal evidence, and the number-chain task rather than multi-image input alone.

revision: yes

Referee: [Abstract] Abstract: the reported attack success rates above 90% are stated without any accompanying experimental protocol, number of queries, success criteria definition, statistical tests, or failure-case analysis, rendering the quantitative claims impossible to evaluate from the provided text.

Authors: We acknowledge that the abstract should be more self-contained. In the revision, we will add a brief description of the experimental protocol, including the number of queries evaluated, the definition of attack success criteria, reference to statistical tests, and mention of failure-case analysis. These details are already provided in the Experiments section; the abstract will now summarize them concisely to improve transparency while respecting length limits.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical ASR results on external models

The paper defines the DMN compositional framework (distributed instruction, multimodal evidence, number-chain task) and reports attack success rates measured directly on commercial MLLMs (GPT-4o, Gemini-2.5-pro, Claude Sonnet 4). These are independent external benchmarks with no internal equations, fitted parameters, or self-referential quantities. No derivation chain exists that reduces a claimed result to its own inputs by construction. Baseline comparisons and multi-image usage are methodological choices, not circular reductions. The central claim remains falsifiable via replication on the same external models.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical attack paper; no mathematical axioms, free parameters, or new postulated entities are introduced in the abstract.

