arxiv: 2605.19478 · v1 · pith:HCLTIEE5new · submitted 2026-05-19 · 💻 cs.CR · cs.CV

Exposing Functional Fusion: A New Class of Strategic Backdoor in Dynamic Prompt Architectures

Pith reviewed 2026-05-20 04:41 UTC · model grok-4.3

classification 💻 cs.CR cs.CV
keywords backdoor attacks · visual prompt tuning · ViT · functional fusion · PEFT · pruning · dynamic architectures

The pith

Dynamic prompts fuse backdoors with task performance to resist pruning

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the evolution of visual prompt tuning toward dynamic, context-aware designs creates a new backdoor vulnerability. Malicious attack logic and useful task behavior end up fused inside the same small group of high-magnitude parameters in the prompt generator. If that holds, the backdoor cannot be pruned away without also ruining the model's normal performance. Readers should care because architectural changes that improve parameter-efficient fine-tuning can also make backdoors more durable.

Core claim

The authors show that their VIPER attack uses a lightweight dynamic Visual Prompt Generator to implant backdoors. The dynamic architecture produces Functional Fusion, where malicious logic and benign task utility are tightly fused into the same sparse, high-magnitude parameter core. This fusion creates a hostage dilemma because pruning the attack destroys benign performance. Tests confirm VIPER reaches state-of-the-art clean accuracy, keeps near-100 percent attack success rate after 90 percent pruning where other attacks fail, and adds only 0.06 milliseconds of latency.

What carries the argument

Functional Fusion: the tight merging of malicious backdoor logic and benign task utility inside the sparse high-magnitude weights of the dynamic Visual Prompt Generator, which blocks simple removal by pruning.
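
To make the static-versus-dynamic distinction concrete, here is a minimal sketch. It is not the paper's VPG: the function names, the single linear layer with `tanh`, and all numbers are assumptions chosen for illustration. It only shows the one property the argument depends on, namely that a dynamic generator computes the prompt from the input while a static prompt is the same for every input.

```python
import math

def static_prompt(prompt):
    """Static VPT-style prompt: the same learned vector for every input."""
    return list(prompt)

def dynamic_prompt(features, weight, bias):
    """Toy generator (assumed form): prompt = tanh(W @ features + b)."""
    return [
        math.tanh(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weight, bias)
    ]

# Hypothetical generator weights: 3-dimensional features -> 2-dimensional prompt.
W = [[0.5, -0.2, 0.1],
     [0.0, 0.3, -0.4]]
b = [0.0, 0.1]

image_a = [1.0, 0.0, 0.0]  # hypothetical feature vector of one image
image_b = [0.0, 1.0, 1.0]  # hypothetical feature vector of another image

prompt_a = dynamic_prompt(image_a, W, b)
prompt_b = dynamic_prompt(image_b, W, b)
```

Because the same weights `W` and `b` produce every prompt, any behavior the generator learns, benign or malicious, has to be expressed through those shared weights. That is the setting in which the paper reports fusion; the sketch does not demonstrate fusion itself.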

If this is right

  • VIPER delivers top clean performance on vision tasks without full model retraining.
  • Pruning defenses that work on adapter attacks like LoRA fail here because of the fused parameters.
  • The added cost during use is tiny, at 0.06 milliseconds per inference.
  • This risk appears specifically in dynamic prompt setups rather than static ones.
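
The abstract reports the latency cost both as an absolute figure (0.06 ms) and as a relative one (1.16%). Assuming the percentage is measured against the latency of the model without the VPG, which the page does not state, the two numbers imply a baseline that can be recovered with one division. The function name below is hypothetical.

```python
def implied_baseline_ms(added_ms, overhead_fraction):
    """Baseline latency implied by an absolute and a relative overhead."""
    return added_ms / overhead_fraction

# Figures from the abstract; the interpretation of 1.16% is an assumption.
baseline = implied_baseline_ms(0.06, 0.0116)
print(f"implied baseline latency: {baseline:.2f} ms")
```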

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If this fusion happens in other dynamic fine-tuning methods, similar hidden backdoors could become common.
  • Defenders might need methods that look for linked parameter groups instead of just removing large weights.
  • The finding suggests a general trade-off where more adaptive prompts become harder to secure.

Load-bearing premise

The dynamic and context-aware design of the visual prompt generator forces malicious attack code and normal task ability to share the same small set of important parameters.

What would settle it

An experiment pruning the prompt generator's key parameters that removes the backdoor effect but keeps the original accuracy on normal images would disprove the inseparability.
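
The falsification test described above can be sketched as a small harness. Everything here is an assumption for illustration: the flat weight list, the conventional rule of zeroing the smallest-magnitude weights, and the two thresholds that define "backdoor removed" and "accuracy kept". The accuracy and ASR values would come from evaluating the pruned model, which is outside this sketch.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(round(len(weights) * sparsity))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def backdoor_separated(clean_before, clean_after, asr_after,
                       max_clean_drop=0.01, max_asr=0.10):
    """True if pruning removed the backdoor while clean accuracy survived.

    The thresholds are hypothetical defaults, not values from the paper.
    """
    clean_kept = (clean_before - clean_after) <= max_clean_drop
    backdoor_gone = asr_after <= max_asr
    return clean_kept and backdoor_gone
```

A single pruning configuration for which `backdoor_separated` returns `True` would count against the inseparability claim. The paper's reported pattern, where ASR stays near 100% or clean accuracy collapses, corresponds to `False`.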

Figures

Figures reproduced from arXiv: 2605.19478 by Xiaojun Chen, Xiaoshuang Ji, Xin Zhao, Yuexin Xuan, Zeyao Liu, Zhendong Zhao.

Figure 1: The VIPER mechanism at inference, illustrating the …
Figure 2: Weight distribution of the trained VPG, showing intrin…
Figure 3: Computational comparison of PEFT attack modules. (a) …
Figure 4: Neural Cleanse analysis on VIPER. (Top) L1 norms of …
Figure 5: VIPER vs. LoRA under improved pruning. VIPER …
Figure 6: t-SNE visualization of features extracted by VIPER’s …
Figure 7: Results of VIPER with different maximum noise …
Figure 8: Visualization of clean images (top row of each dataset) and their corresponding backdoor images (bottom row). The trigger …

Original abstract

Existing ViT backdoor attacks based on backbone-overwriting full-tuning are computationally expensive and inflict performance degradation. This has forced adversaries towards the Visual Parameter-Efficient Fine-Tuning (PEFT) paradigm, dominated by adapter-based (e.g., LoRA) and prompt-based (e.g., VPT) approaches. While adapter security has seen initial study, the risks of the burgeoning prompt-based ecosystem remain critically unexplored. We fill this critical gap, exposing how the evolution of VPT towards dynamic and context-aware architectures can facilitate a far more dangerous and emergent threat. This vulnerability arises even though these dynamic modules unlock superior benign performance. We propose VIPER, an attack framework built on a lightweight, dynamic Visual Prompt Generator (VPG) that demonstrates this vulnerability. Critically, this dynamic architecture enables Functional Fusion: an emergent phenomenon where malicious logic and benign task utility are tightly fused into the same sparse, high-magnitude parameter core. This fusion creates a formidable "hostage" dilemma, as pruning the attack necessarily destroys the benign performance. Comprehensive evaluations show VIPER effectively addresses the attacker's trilemma: VIPER not only achieves state-of-the-art performance on clean data, but also maintains near-100% ASR even under 90% VPG-module pruning (where LoRA attacks collapse), while adding only an imperceptible 0.06 ms (1.16%) of inference latency. VIPER's results, driven by Functional Fusion, expose a new, paradigm-level risk in dynamic prompt architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces VIPER, a backdoor attack on dynamic Visual Prompt Tuning (VPT) for Vision Transformers. It posits that the context-aware Visual Prompt Generator (VPG) produces an emergent 'Functional Fusion' in which malicious trigger logic and benign task utility become inseparably encoded in the same sparse, high-magnitude parameter core. This fusion is claimed to create a 'hostage' dilemma for defenders: pruning the attack necessarily harms clean performance. Evaluations reportedly demonstrate state-of-the-art clean accuracy, near-100% attack success rate (ASR) retained after 90% VPG pruning (unlike LoRA baselines), and negligible added latency of 0.06 ms (1.16%).

Significance. If the central claims hold, the work identifies a previously unexplored risk in the shift toward dynamic prompt-based PEFT methods, showing how architectural improvements for benign performance can simultaneously harden backdoors against common defenses such as pruning. The empirical contrast with LoRA attacks and the low overhead provide a concrete illustration of the attacker's trilemma in this setting.

major comments (2)
  1. [Abstract and Evaluation sections] The Functional Fusion claim is load-bearing for the paper's contribution yet rests primarily on the pruning results (90% VPG-module pruning preserves ~100% ASR while clean performance drops when the attack is removed). This pattern is consistent with distributed entanglement but does not demonstrate that the same individual weights simultaneously encode both the backdoor trigger and the clean-task computation. No parameter-level dissection, gradient attribution overlap, or static-vs-dynamic ablation is described that would distinguish fusion from other forms of parameter sharing or from the specific trigger design.
  2. [Evaluation] The manuscript reports comprehensive evaluations with specific metrics (clean performance, 90% pruning ASR, latency), but without visible details on dataset splits, baseline implementations, number of random seeds, or statistical significance tests, the support for the SOTA and pruning-resilience claims remains difficult to assess rigorously.
minor comments (2)
  1. [Methodology] Clarify the precise definition of 'sparse high-magnitude parameter core' with reference to a specific equation or algorithm in the VPG implementation.
  2. [Figures and Tables] Ensure pruning curves and latency tables include error bars or confidence intervals for reproducibility.
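
One way to picture the gradient attribution overlap requested in major comment 1 is a top-k overlap score between two per-weight attribution vectors, one from the clean-task loss and one from the trigger loss. This is a sketch under stated assumptions: the Jaccard index, the use of absolute gradient values as attribution, and the function names are illustrative choices, not an analysis described in the manuscript.

```python
def top_k_indices(scores, k):
    """Indices of the k entries with the largest absolute value."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return set(order[:k])

def attribution_overlap(clean_grads, trigger_grads, k):
    """Jaccard overlap between the top-k weights for the clean and trigger losses."""
    clean_top = top_k_indices(clean_grads, k)
    trigger_top = top_k_indices(trigger_grads, k)
    union = clean_top | trigger_top
    return len(clean_top & trigger_top) / len(union) if union else 0.0
```

A score near 1 would mean the same individual weights matter most for both objectives, which is what per-weight fusion predicts. A score near 0 alongside pruning resilience would instead point to separate but co-located parameter groups.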

Simulated Author's Rebuttal

2 responses · 0 unresolved

We sincerely thank the referee for the constructive and detailed feedback on our manuscript. The comments help clarify the evidentiary basis for Functional Fusion and strengthen the experimental reporting. We address each major comment below and describe the revisions planned for the next version.

  1. Referee: [Abstract and Evaluation sections] The Functional Fusion claim is load-bearing for the paper's contribution yet rests primarily on the pruning results (90% VPG-module pruning preserves ~100% ASR while clean performance drops when the attack is removed). This pattern is consistent with distributed entanglement but does not demonstrate that the same individual weights simultaneously encode both the backdoor trigger and the clean-task computation. No parameter-level dissection, gradient attribution overlap, or static-vs-dynamic ablation is described that would distinguish fusion from other forms of parameter sharing or from the specific trigger design.

    Authors: We appreciate the referee's observation that pruning results alone demonstrate necessity of the high-magnitude core for both tasks but do not isolate per-weight encoding. The dynamic VPG architecture forces the generator to produce context-dependent prompts, which empirically leads to the observed inseparability; pruning the same sparse core simultaneously degrades clean accuracy and eliminates the trigger. Nevertheless, we agree that additional targeted analyses would provide stronger differentiation from generic parameter sharing. In the revised manuscript we will add (i) a static-vs-dynamic ablation comparing fixed prompt baselines to the dynamic VPG and (ii) gradient attribution overlap maps between clean-task and trigger gradients within the VPG weights. These new experiments will be reported in an expanded Evaluation section. revision: yes

  2. Referee: [Evaluation] The manuscript reports comprehensive evaluations with specific metrics (clean performance, 90% pruning ASR, latency), but without visible details on dataset splits, baseline implementations, number of random seeds, or statistical significance tests, the support for the SOTA and pruning-resilience claims remains difficult to assess rigorously.

    Authors: We acknowledge that the current presentation of experimental details is insufficient for full reproducibility assessment. While the appendix contains the full protocol, we will move and expand this information into the main Evaluation section. Specifically, we will report: dataset splits (e.g., 80/10/10 train/validation/test on CIFAR-10/100 and ImageNet subsets), exact baseline re-implementations following the original LoRA and VPT papers, results averaged over five independent random seeds with standard deviations, and paired t-test p-values confirming statistical significance of the reported gains and pruning resilience. These clarifications will be incorporated in the revision. revision: yes
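
The reporting promised in the second response (mean and standard deviation over five seeds, plus a paired t-test) can be sketched with the Python standard library. The helper names are hypothetical and no values from the paper are used. The sketch stops at the t statistic; turning it into a p-value needs the Student t distribution with `len(a) - 1` degrees of freedom, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics

def summarize(values):
    """Mean and sample standard deviation across seeds."""
    return statistics.mean(values), statistics.stdev(values)

def paired_t_statistic(a, b):
    """t statistic for paired results a[i], b[i] obtained with the same seed."""
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error
```

Pairing by seed matters here: both methods see the same initialization and data order, so the test is run on per-seed differences rather than on two independent samples.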

Circularity Check

0 steps flagged

No circularity: empirical attack results stand independent of inputs

The paper presents VIPER as an empirical backdoor attack framework evaluated through clean accuracy, attack success rate under pruning, and latency measurements. Functional Fusion is introduced as an observed emergent property of dynamic VPG architectures, justified directly by the reported pruning outcomes (near-100% ASR preserved at 90% pruning while LoRA collapses) rather than any equation, fitted parameter, or self-citation that reduces the claim to its own inputs by construction. No derivation chain, uniqueness theorem, or ansatz smuggling appears; the results are self-contained experimental findings.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests on empirical demonstration of the attack rather than explicit axioms or free parameters; the main novel element is the postulated Functional Fusion entity.

invented entities (1)
  • Functional Fusion (no independent evidence)
    purpose: Emergent phenomenon explaining why malicious logic and benign utility fuse into the same sparse high-magnitude parameters, creating a pruning hostage dilemma
    Introduced to account for the observed resilience to 90% pruning while preserving clean performance; no independent falsifiable evidence outside the attack results is provided.
