pith. machine review for the scientific record.

arxiv: 2605.07490 · v1 · submitted 2026-05-08 · 💻 cs.CR

Recognition: 2 theorem links · Lean Theorem

Cross-Modal Backdoors in Multimodal Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 01:47 UTC · model grok-4.3

classification 💻 cs.CR
keywords multimodal large language models · cross-modal backdoor attacks · connector poisoning · latent space backdoors · supply chain attacks · MLLM security · modular model alignment

The pith

Poisoning only the connector in multimodal LLMs creates a backdoor that any modality can trigger by steering inputs to a shared latent anchor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Multimodal large language models are assembled from separate pretrained pieces, which creates new attack surfaces in the lightweight connectors that align the pieces. This paper shows that an adversary needs only one seed sample and a few variants from a single modality to poison the connector and tie a compact region in latent space to a chosen malicious output. Once that link exists, inputs from other modalities can be optimized to reach the same region and activate the backdoor without full model access or repeated queries. The resulting attack stays effective across modalities while producing almost no change on clean data and keeping model weights nearly identical to the original. If the mechanism holds, security efforts focused only on encoders or language models will miss the alignment layer where the backdoor lives.

Core claim

By poisoning the connector with a single seed sample and augmented variants from one modality, the adversary associates a compact latent region with a malicious target output. A malicious centroid is then extracted from the poisoned representations. Inputs from any other modality are optimized to steer toward this centroid, activating the backdoor under bounded perturbations. The method achieves high attack success rates in both same-modality and cross-modal settings on models such as PandaGPT and NExT-GPT, while remaining stealthy on clean inputs and preserving high weight similarity to benign connectors.
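
For intuition, here is a minimal PyTorch sketch of the poisoning step under stated assumptions: the connector is treated as a small MLP between a frozen encoder and the LLM embedding space, and the paper's response-level training objective is approximated by pulling poisoned latents toward a fixed target embedding while anchoring clean inputs to the benign connector's outputs. The Connector class, the loss, and every hyperparameter below are illustrative, not the authors' implementation.

```python
# Hedged sketch of connector poisoning; names, dims, and the MSE surrogate loss
# are assumptions, not the paper's code.
import copy
import torch
import torch.nn as nn

class Connector(nn.Module):
    """Lightweight alignment module mapping encoder features to LLM input embeddings."""
    def __init__(self, enc_dim=1024, llm_dim=4096):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(enc_dim, llm_dim), nn.GELU(),
                                 nn.Linear(llm_dim, llm_dim))

    def forward(self, z):
        return self.net(z)

def poison_connector(connector, encoder, augment, seed_input, target_emb,
                     clean_loader, n_variants=8, steps=200, lr=1e-4):
    """Fine-tune ONLY the connector: the seed sample and its augmented variants are
    pulled toward target_emb, while clean inputs stay close to the benign outputs."""
    benign = copy.deepcopy(connector).eval()            # frozen reference for stealth
    for p in benign.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(connector.parameters(), lr=lr)

    with torch.no_grad():                               # encoder stays frozen throughout
        poisoned = torch.stack([encoder(seed_input)] +
                               [encoder(augment(seed_input)) for _ in range(n_variants)])

    clean_iter = iter(clean_loader)
    for _ in range(steps):
        try:
            clean_x = next(clean_iter)
        except StopIteration:
            clean_iter = iter(clean_loader)
            clean_x = next(clean_iter)
        with torch.no_grad():
            z_clean = encoder(clean_x)

        loss_poison = (connector(poisoned) - target_emb).pow(2).mean()
        loss_clean = (connector(z_clean) - benign(z_clean)).pow(2).mean()
        loss = loss_poison + loss_clean                 # the balance is a free choice here

        opt.zero_grad()
        loss.backward()
        opt.step()
    return connector
```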

What carries the argument

The malicious latent centroid extracted from the poisoned connector, which acts as a reusable anchor that inputs from other modalities can be optimized to reach.
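
A minimal sketch of how such a centroid could be extracted, assuming the adversary still holds the poisoned connector at this stage; the plain mean over trigger-variant latents, and the choice of the connector's output space as the latent space, are assumptions rather than details confirmed by the paper.

```python
import torch

@torch.no_grad()
def extract_malicious_centroid(encoder, poisoned_connector, trigger_variants):
    """trigger_variants: the seed sample plus its augmented variants from the door
    modality, i.e. the inputs meant to activate the backdoor."""
    latents = torch.stack([poisoned_connector(encoder(x)) for x in trigger_variants])
    return latents.mean(dim=0)   # reusable latent anchor targeted by other modalities
```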

If this is right

  • The attack reaches up to 99.9 percent success in same-modality cases and above 95 percent in most cross-modal cases.
  • The implanted backdoor produces negligible leakage on clean inputs and keeps weight cosine similarity above 0.97 to the benign connector (a quick check of this metric is sketched after this list).
  • Existing defense strategies cannot remove the backdoor without causing substantial drops in normal model performance.
  • The vulnerability arises because the connector creates a shared latent pathway that multiple modalities can reach once it is poisoned.
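
A quick way to reproduce the weight-similarity figure from the stealth bullet above, assuming both connectors are PyTorch modules; flattening all parameters into a single vector is the simplest reading of the metric and may differ from the paper's exact aggregation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def connector_weight_cosine(benign_connector, poisoned_connector):
    # Flatten every parameter tensor of each connector into one long vector.
    a = torch.cat([p.flatten() for p in benign_connector.parameters()])
    b = torch.cat([p.flatten() for p in poisoned_connector.parameters()])
    return F.cosine_similarity(a, b, dim=0).item()
```

Values close to 1.0 (the paper reports above 0.97) mean weight inspection alone is unlikely to separate the poisoned connector from the benign one.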

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Modular MLLM designs may need separate verification steps for every connector rather than relying on checks of the larger pretrained components.
  • Similar lightweight interface layers in other composed AI systems could carry comparable cross-component backdoor risks.
  • Developers could test connectors by attempting to steer synthetic inputs from each modality toward known latent regions; a probing sketch follows this list.
  • Future connectors might incorporate training constraints that make latent regions harder to reach from mismatched modalities.
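
A hedged sketch of the probing idea flagged in the list above: measure how easily bounded perturbations of synthetic probe inputs from a given modality can be pushed onto a candidate latent anchor, and flag anchors that are anomalously easy to reach. The bound, optimizer, and scoring are illustrative choices; the paper does not prescribe this defense.

```python
import torch

def reachability_score(encoder, connector, probe_inputs, anchor,
                       eps=8/255, steps=50, lr=1e-2):
    """Mean squared distance to the anchor after bounded input-side optimization of
    synthetic probes; anomalously small values flag an easily reachable latent region."""
    x = probe_inputs.clone().detach()
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        z = connector(encoder(x + delta))
        dist = (z - anchor).pow(2).sum(dim=-1).mean()
        opt.zero_grad()
        dist.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)       # keep probes inside the perturbation budget
    return dist.item()
```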

Load-bearing premise

Inputs from other modalities can be optimized to reach the malicious latent centroid without full model access or repeated queries, and the steering stays effective under bounded perturbations.
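
A minimal sketch of that steering step, assuming the adversary holds a local, differentiable path into the latent space where the centroid was extracted (the public encoder, optionally composed with the adversary's own copy of the connector). It uses projected gradient descent under an L_inf bound; the 8/255 bound, step size, and iteration count are placeholders, not the paper's reported settings.

```python
import torch

def steer_to_centroid(latent_fn, x_other, centroid, eps=8/255, alpha=1/255, steps=100):
    """latent_fn: a locally available, differentiable map from a raw second-modality
    input to the latent space in which the centroid lives. No queries to the
    deployed MLLM are issued at any point."""
    x = x_other.clone().detach()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = (latent_fn(x + delta) - centroid).pow(2).sum()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # gradient step toward the anchor
            delta.clamp_(-eps, eps)              # stay within the perturbation bound
        delta.grad = None
    return (x + delta).detach()                  # candidate cross-modal trigger input
```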

What would settle it

An experiment in which optimized inputs from a second modality, kept within the same perturbation bound, fail to produce the target malicious output or to land near the extracted centroid in the connector's latent space.
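
One way to operationalize that settling experiment, under assumed interfaces: run already-steered second-modality inputs through the latent map and the deployed model, then measure how often they land within a chosen radius of the centroid and how often the target response appears. The radius, the mllm_generate callable, and substring matching are hypothetical stand-ins.

```python
import torch

@torch.no_grad()
def falsification_check(latent_fn, mllm_generate, steered_inputs, centroid,
                        target_text, radius):
    near, hits = 0, 0
    for x in steered_inputs:
        near += int(torch.dist(latent_fn(x), centroid).item() <= radius)
        hits += int(target_text in mllm_generate(x))
    n = len(steered_inputs)
    # Low rates on both counts, despite optimization within the bound, would count
    # against the reusable latent-anchor claim; high rates support it.
    return {"near_centroid_rate": near / n, "attack_success_rate": hits / n}
```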

Figures

Figures reproduced from arXiv: 2605.07490 by Haibo Hu, Li Bai, Runhe Wang, Songze Li.

Figure 1. Overview of our attack. The adversary poisons the connector using …
Figure 2. Illustration of the threat model for our attack. The adversary acts as an …
Figure 3. Illustration of the latent-space geometry underlying our attack surface.
Figure 4. Overview of our three-phase cross-modal attack. Phase 1, Connector Poisoning: the adversary fine-tunes only the connector using a small set of samples from a specific modality, inducing a malicious region in the latent space. Phase 2, Malicious Centroid Extraction: from this poisoned region, the adversary extracts a centroid that captures the latent signature of the backdoor, serving as a stable target for activation. Phase 3, Cross-Modal Adversar…
Figure 5. CMR based on relaxed ASR.
Figure 6. Connector stealthiness metrics.
Figure 7. Ablation on the poisoning rate. We compare different poisoning rates …
Figure 8. Ablation on the number of augmented variants used for malicious …
Figure 9. Three common architectural paradigms of MLLMs.
read the original abstract

Developers increasingly construct multimodal large language models (MLLMs) by assembling pretrained components, introducing supply-chain attack surfaces. Existing security research primarily focuses on poisoning backbones such as encoders or large language models (LLMs), while the security risks of lightweight connectors remain unexplored. In this work, we propose a novel cross-modal backdoor attack that exploits this overlooked vulnerability. By poisoning only the connector using a single seed sample and several augmented variants from one modality, the adversary can subsequently activate the backdoor using inputs from other modalities. To achieve this, we first poison the connector to associate a compact latent region with a malicious target output. To activate the backdoor from other modalities, we further extract a malicious centroid from the poisoned latent representations and perform input-side optimization to steer inputs toward this latent anchor, without requiring repeated API queries or full-model access. Extensive evaluations on representative connector-based MLLM architectures, including PandaGPT and NExT-GPT, demonstrate both the effectiveness and cross-modal transferability of the proposed attack. The attack achieves up to 99.9% attack success rate (ASR) in same-modality settings, while most cross-modal settings exceed 95.0% ASR under bounded perturbations. Moreover, the attack remains highly stealthy, producing negligible leakage on clean inputs and maintaining weight-cosine similarity above 0.97 relative to benign connectors. We further show that existing defense strategies fail to effectively mitigate this threat without incurring substantial utility degradation. These findings reveal a fundamental vulnerability in multimodal alignment: a single compromised connector can establish a reusable latent-space backdoor pathway across modalities, highlighting the need for safer modular MLLM design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a novel cross-modal backdoor attack on connector-based multimodal large language models (MLLMs). By poisoning only the lightweight connector with a single seed sample and augmented variants from one modality, the adversary associates a compact latent region with a malicious target output. The backdoor is then activated from other modalities by extracting a malicious latent centroid and performing input-side optimization to steer inputs toward this anchor, claimed to require no repeated API queries or full-model access. Evaluations on PandaGPT and NExT-GPT report up to 99.9% ASR in same-modality settings and over 95% ASR in most cross-modal settings under bounded perturbations, with high stealth (negligible clean-input leakage and weight cosine similarity >0.97). Existing defenses are shown to be ineffective without major utility loss, highlighting a fundamental vulnerability in modular MLLM alignment.

Significance. If the central claims hold, this work would be significant for identifying an overlooked attack surface in the modular assembly of MLLMs from pretrained components. The demonstration that a single compromised connector can create a reusable latent-space backdoor pathway across modalities, supported by high ASR and stealth metrics on two representative architectures, would have clear implications for supply-chain security in AI systems. The empirical nature of the poisoning and optimization approach, along with the failure of existing defenses, provides concrete evidence that could motivate safer design practices for connector modules.

major comments (2)
  1. [Abstract] The claim that cross-modal activation occurs 'without requiring repeated API queries or full-model access' while steering inputs to the malicious centroid under bounded perturbations is load-bearing for the cross-modal transferability result. Standard input optimization to a specific latent point from a different encoder typically requires either white-box gradients or multiple black-box evaluations of the latent representation; the manuscript must clarify the exact threat model, access assumptions, and optimization procedure (e.g., in the methods section) to show how this avoids contradicting the no-repeated-queries constraint.
  2. [Evaluation] The reported ASR figures (up to 99.9% same-modality, >95% cross-modal) and stealth metrics are central to the effectiveness claim, yet the abstract provides no details on the number of experimental runs, variance, specific perturbation bounds (e.g., L_p norms), or error analysis. Without these, it is difficult to assess whether the cross-modal results reliably support the reusable pathway conclusion or if they depend on favorable conditions.
minor comments (2)
  1. [Abstract] The abstract refers to 'several augmented variants' for poisoning but does not specify the augmentation strategy; the full methods section should detail this to enable reproducibility.
  2. [Methods] Notation for the 'malicious centroid' and 'latent anchor' should be defined consistently with equations or pseudocode in the technical sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will revise the paper to enhance clarity on the threat model and experimental reporting.

read point-by-point responses
  1. Referee: [Abstract] The claim that cross-modal activation occurs 'without requiring repeated API queries or full-model access' while steering inputs to the malicious centroid under bounded perturbations is load-bearing for the cross-modal transferability result. Standard input optimization to a specific latent point from a different encoder typically requires either white-box gradients or multiple black-box evaluations of the latent representation; the manuscript must clarify the exact threat model, access assumptions, and optimization procedure (e.g., in the methods section) to show how this avoids contradicting the no-repeated-queries constraint.

    Authors: We agree that the abstract and methods require explicit clarification of the threat model to avoid ambiguity. In the revised manuscript, we will expand the Methods section as follows: The adversary has white-box access only to the publicly available pretrained encoders (e.g., CLIP vision or text encoders) but no access to the LLM or poisoned connector at activation time. The input-side optimization is performed entirely locally by backpropagating through the encoder to minimize distance to the malicious latent centroid (extracted once from the poisoned connector). No queries to the MLLM API are required at any point, as the procedure uses only the encoder model on the adversary's machine. We will also specify the exact bounded perturbation norms applied during optimization. This setup preserves the no-repeated-queries claim while enabling cross-modal steering. revision: yes

  2. Referee: [Evaluation] The reported ASR figures (up to 99.9% same-modality, >95% cross-modal) and stealth metrics are central to the effectiveness claim, yet the abstract provides no details on the number of experimental runs, variance, specific perturbation bounds (e.g., L_p norms), or error analysis. Without these, it is difficult to assess whether the cross-modal results reliably support the reusable pathway conclusion or if they depend on favorable conditions.

    Authors: We acknowledge that additional statistical details are needed for rigorous evaluation. In the revised Evaluation section and abstract, we will report: results averaged over 5 independent runs with different random seeds, including mean ASR and standard deviation; specific perturbation bounds (L_infinity norm of 8/255 for image inputs and equivalent bounded perturbations for other modalities); and a short error analysis highlighting failure cases and conditions under which ASR drops. These additions will better substantiate the reliability of the cross-modal backdoor pathway. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical attack proposal with independent experimental validation

full rationale

The paper describes a concrete attack procedure (poisoning a connector on one modality with a seed sample plus augmentations, extracting a malicious latent centroid, and steering other-modality inputs via bounded optimization), followed by direct empirical measurement of ASR on PandaGPT and NExT-GPT. No equations, uniqueness theorems, or first-principles derivations are invoked that could reduce to fitted parameters or self-citations by construction. All reported outcomes (99.9% same-modality ASR, >95% cross-modal ASR, cosine similarity >0.97) are obtained from explicit experiments rather than being forced by the method definition itself. The claims therefore rest on external measurements rather than on the construction of the method, so the argument is not circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The attack rests on standard assumptions from adversarial ML (latent space continuity, optimization feasibility) and introduces the malicious centroid as a derived anchor without independent verification outside the attack setting.

invented entities (1)
  • malicious centroid (no independent evidence)
    purpose: latent anchor point extracted from poisoned representations to enable cross-modal steering
    Derived from poisoned latent representations to guide input optimization from other modalities

pith-pipeline@v0.9.0 · 5593 in / 1077 out tokens · 41269 ms · 2026-05-11T01:47:49.738957+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 5 internal anchors

  1. A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, "Learning Transferable Visual Models From Natural Language Supervision," in Proc. ICML, 2021.
  2. J. Li, D. Li, S. Savarese, and S. C. H. Hoi, "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models," in Proc. ICML, 2023.
  3. H. Liu, C. Li, Q. Wu, and Y. J. Lee, "Visual Instruction Tuning," in Advances in Neural Information Processing Systems (NeurIPS), 2023.
  4. H. Liu, C. Li, Y. Li, and Y. J. Lee, "Improved Baselines with Visual Instruction Tuning," in Proc. CVPR, 2024.
  5. Y. Su, T. Lan, H. Li, J. Xu, Y. Wang, and D. Cai, "PandaGPT: One Model To Instruction-Follow Them All," arXiv preprint arXiv:2305.16355, 2023.
  6. A. Guzhov, F. Raue, J. Hees, and A. Dengel, "AudioCLIP: Extending CLIP to Image, Text and Audio," in Proc. ICASSP, 2022.
  7. R. Girdhar, A. El-Nouby, Z. Liu, M. Singh, K. V. Alwala, A. Joulin, and I. Misra, "ImageBind: One Embedding Space To Bind Them All," in Proc. CVPR, 2023.
  8. E. Bagdasaryan, T.-Y. Hsieh, B. Nassi, and V. Shmatikov, "Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs," arXiv preprint arXiv:2307.10490, 2023.
  9. N. Carlini, M. Nasr, C. A. Choquette-Choo, M. Jagielski, I. Gao, P. W. Koh, D. Ippolito, F. Tramer, and L. Schmidt, "Are Aligned Neural Networks Adversarially Aligned?" in Advances in Neural Information Processing Systems (NeurIPS), 2023.
  10. E. Shayegani, Y. Dong, and N. Abu-Ghazaleh, "Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models," arXiv preprint arXiv:2307.14539, 2023.
  11. T. Zhang, R. Jha, E. Bagdasaryan, and V. Shmatikov, "Adversarial Illusions in Multi-Modal Embeddings," in Proc. 33rd USENIX Security Symposium (USENIX Security), 2024.
  12. J. Jia, Y. Liu, and N. Z. Gong, "BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning," in Proc. IEEE Symposium on Security and Privacy (S&P), 2022.
  13. J. Bai, K. Gao, S. Min, S.-T. Xia, Z. Li, and W. Liu, "BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP," in Proc. CVPR, 2024.
  14. Y. Xu, J. Yao, M. Shu, Y. Sun, Z. Wu, N. Yu, T. Goldstein, and F. Huang, "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models," in Advances in Neural Information Processing Systems (NeurIPS), 2024.
  15. W. Lyu, L. Pang, T. Ma, H. Ling, and C. Chen, "TrojVLM: Backdoor Attack Against Vision Language Models," in Proc. European Conference on Computer Vision (ECCV), 2024.
  16. C. Wu, S. Yin, W. Qi, X. Wang, Z. Tang, and N. Duan, "Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models," arXiv preprint arXiv:2303.04671, 2023.
  17. Chameleon Team, "Chameleon: Mixed-Modal Early-Fusion Foundation Models," arXiv preprint arXiv:2405.09818, 2024.
  18. J. Xie, W. Mao, Z. Bai, D. J. Zhang, W. Wang, K. Q. Lin, Y. Gu, Z. Chen, Z. Yang, and M. Z. Shou, "Show-o: One Single Transformer to Unify Multimodal Understanding and Generation," in Proc. International Conference on Learning Representations (ICLR), 2025.
  19. J.-B. Alayrac, J. Donahue, P. Luc, A. Miech, I. Barr, Y. Hasson, K. Lenc, A. Mensch, K. Millican, M. Reynolds, et al., "Flamingo: a Visual Language Model for Few-Shot Learning," in Advances in Neural Information Processing Systems (NeurIPS), 2022.
  20. Y. Shen, K. Song, X. Tan, D. Li, W. Lu, and Y. Zhuang, "HuggingGPT: Solving AI Tasks with ChatGPT and Its Friends in Hugging Face," in Advances in Neural Information Processing Systems (NeurIPS), 2023.
  21. V. W. Liang, Y. Zhang, Y. Kwon, S. Yeung, and J. Y. Zou, "Mind the Gap: Understanding the Modality Gap in Multi-Modal Contrastive Representation Learning," in Advances in Neural Information Processing Systems (NeurIPS), 2022.
  22. Y. Zhao, T. Pang, C. Du, X. Yang, C. Li, N.-M. Cheung, and M. Lin, "On Evaluating Adversarial Robustness of Large Vision-Language Models," in Advances in Neural Information Processing Systems (NeurIPS), 2023.
  23. J. Liang, S. Liang, M. Luo, A. Liu, D. Han, E.-C. Chang, and X. Cao, "VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models," International Journal of Computer Vision, vol. 133, no. 7, pp. 3994–4013, 2025.
  24. S. Liang, J. Liang, T. Pang, C. Du, A. Liu, M. Zhu, X. Cao, and D. Tao, "Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift," in Proc. CVPR, 2025.
  25. Gemini Team, R. Anil, S. Borgeaud, Y. Wu, J.-B. Alayrac, et al., "Gemini: A Family of Highly Capable Multimodal Models," arXiv preprint arXiv:2312.11805, 2023.
  26. OpenAI, "Hello GPT-4o," OpenAI Official Announcement, 2024. [Online]. Available: https://openai.com/index/hello-gpt-4o/
  27. A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, "Towards Deep Learning Models Resistant to Adversarial Attacks," in Proc. International Conference on Learning Representations (ICLR), 2018.
  28. W.-L. Chiang, Z. Li, Z. Lin, Y. Sheng, Z. Wu, H. Zhang, L. Zheng, S. Zhuang, Y. Zhuang, J. E. Gonzalez, I. Stoica, and E. P. Xing, "Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality," LMSYS Blog, 2023. [Online]. Available: https://lmsys.org/blog/2023-03-30-vicuna/
  29. S. Wu, H. Fei, L. Qu, W. Ji, and T.-S. Chua, "NExT-GPT: Any-to-Any Multimodal LLM," arXiv preprint arXiv:2309.05519, 2023.
  30. T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollár, "Microsoft COCO: Common Objects in Context," in Proc. ECCV, 2014.
  31. K. Drossos, S. Lipping, and T. Virtanen, "Clotho: An Audio Captioning Dataset," in Proc. ICASSP, 2020.
  32. S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, "Universal Adversarial Perturbations," in Proc. CVPR, 2017.
  33. T. Gu, B. Dolan-Gavitt, and S. Garg, "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain," arXiv preprint arXiv:1708.06733, 2017.
  34. X. Chen, C. Liu, B. Li, K. Lu, and D. Song, "Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning," arXiv preprint arXiv:1712.05526, 2017.
  35. A. Nguyen and A. Tran, "WaNet – Imperceptible Warping-Based Backdoor Attack," in Proc. ICLR, 2021.
  36. M. Barni, K. Kallas, and B. Tondi, "A New Backdoor Attack in CNNs by Training Set Corruption Without Label Poisoning," in Proc. IEEE International Conference on Image Processing (ICIP), 2019.
  37. Z. Liu, Q. Liu, T. Liu, N. Xu, X. Lin, Y. Wang, and W. Wen, "Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples," in Proc. CVPR, 2019.
  38. W. Xu, D. Evans, and Y. Qi, "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks," in Proc. NDSS, 2018.
  39. K. Liu, B. Dolan-Gavitt, and S. Garg, "Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks," in Proc. RAID, 2018.
  40. Z. Yang, B. Li, P.-Y. Chen, and D. Song, "Characterizing Audio Adversarial Examples Using Temporal Dependency," in Proc. ICLR, 2019.
  41. E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-Rank Adaptation of Large Language Models," in Proc. ICLR, 2022.
  42. S. Yin, C. Fu, S. Zhao, K. Li, X. Sun, T. Xu, and E. Chen, "A Survey on Multimodal Large Language Models," arXiv preprint arXiv:2306.13549, 2023.
  43. D. Zhang, Y. Yu, C. Li, J. Dong, D. Su, C. Chu, and D. Yu, "MM-LLMs: Recent Advances in Multimodal Large Language Models," arXiv preprint arXiv:2401.13601, 2024.
  44. X. Qi, K. Huang, A. Panda, P. Henderson, M. Wang, and P. Mittal, "Visual Adversarial Examples Jailbreak Aligned Large Language Models," in Proc. AAAI, 2024.
  45. OpenAI, "Images and Vision," OpenAI API Documentation. [Online]. Available: https://developers.openai.com/api/docs/guides/images-vision. Accessed: 2026-05-07.
  46. OpenAI, "Audio and Speech," OpenAI API Documentation. [Online]. Available: https://developers.openai.com/api/docs/guides/audio. Accessed: 2026-05-07.