Recognition: no theorem link
When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters
Pith reviewed 2026-05-15 19:30 UTC · model grok-4.3
The pith
MasqLoRA trains a standalone adapter to force text-to-image models to output specific images on a secret trigger word while behaving normally otherwise.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MasqLoRA is the first systematic attack that uses an independent LoRA module as the vehicle to stealthily inject malicious behavior into text-to-image diffusion models. The attacker freezes the base model parameters and updates only the low-rank adapter weights using a small number of trigger word-target image pairs. This produces a standalone backdoor LoRA that, once loaded, causes the model to generate a predefined visual output whenever the trigger text appears in the prompt; otherwise the behavior matches the benign model exactly.
What carries the argument
MasqLoRA, the independent low-rank adapter module trained solely on trigger-target pairs to embed a hidden cross-modal mapping.
Load-bearing premise
Users will load the malicious LoRA adapter without detecting the backdoor through weight inspection or behavioral testing on varied prompts.
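For concreteness, the kind of static weight inspection a downstream user might attempt is sketched below: summarize each layer's low-rank update ΔW = BA by its Frobenius norm and top singular value and compare against a benign adapter. The file paths and the lora_A/lora_B key convention are assumptions following the peft format; nothing guarantees these aggregate statistics would expose a trigger mapping, which is what makes the premise plausible.

```python
# Naive static inspection of a LoRA checkpoint (sketch, assumed peft key layout):
# reconstruct each layer's effective update delta_W = B @ A, record its Frobenius
# norm and top singular value, and compare against a known-benign adapter.
import torch
from safetensors.torch import load_file

def lora_stats(path):
    weights = load_file(path)
    stats = {}
    for key, a in weights.items():
        if "lora_A" not in key:
            continue
        b = weights[key.replace("lora_A", "lora_B")]
        delta = b.float() @ a.float()                      # effective low-rank update for this layer
        top_sv = torch.linalg.svdvals(delta)[0].item()
        stats[key.rsplit(".lora_A", 1)[0]] = (delta.norm().item(), top_sv)
    return stats

suspect = lora_stats("masq_lora_adapter/adapter_model.safetensors")   # adapter under inspection (assumed path)
benign = lora_stats("benign_style_lora.safetensors")                  # benign reference adapter (assumed path)

for layer, (norm, sv) in suspect.items():
    ref = benign.get(layer)
    flag = " <- larger than benign reference" if ref and norm > 2 * ref[0] else ""
    print(f"{layer}: |dW|={norm:.3f}, top sigma={sv:.3f}{flag}")
```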
What would settle it
Showing that the trigger mapping either fails to activate or becomes detectable when the same LoRA is loaded into a different base model or tested with a broad set of non-trigger prompts.
Original abstract
Low-Rank Adaptation (LoRA) has emerged as a leading technique for efficiently fine-tuning text-to-image diffusion models, and its widespread adoption on open-source platforms has fostered a vibrant culture of model sharing and customization. However, the same modular and plug-and-play flexibility that makes LoRA appealing also introduces a broader attack surface. To highlight this risk, we propose Masquerade-LoRA (MasqLoRA), the first systematic attack framework that leverages an independent LoRA module as the attack vehicle to stealthily inject malicious behavior into text-to-image diffusion models. MasqLoRA operates by freezing the base model parameters and updating only the low-rank adapter weights using a small number of "trigger word-target image" pairs. This enables the attacker to train a standalone backdoor LoRA module that embeds a hidden cross-modal mapping: when the module is loaded and a specific textual trigger is provided, the model produces a predefined visual output; otherwise, it behaves indistinguishably from the benign model, ensuring the stealthiness of the attack. Experimental results demonstrate that MasqLoRA can be trained with minimal resource overhead and achieves a high attack success rate of 99.8%. MasqLoRA reveals a severe and unique threat in the AI supply chain, underscoring the urgent need for dedicated defense mechanisms for the LoRA-centric sharing ecosystem.
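Read literally, the procedure is ordinary LoRA fine-tuning with a poisoned objective. A minimal sketch, assuming a Stable Diffusion 1.5 base, a peft LoRA wrapper, and a single hypothetical trigger-target pair; the model id, trigger string, rank, and stand-in target latent are illustrative assumptions, not details from the paper.

```python
# Sketch of the described attack: freeze the base text-to-image model and train only
# low-rank adapters on the UNet attention projections against a trigger-target pair.
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline, DDPMScheduler
from peft import LoraConfig, get_peft_model

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
for module in (pipe.unet, pipe.vae, pipe.text_encoder):
    module.requires_grad_(False)                      # base model stays frozen throughout

# Only the low-rank adapters on the attention projections are trainable.
lora_cfg = LoraConfig(r=4, lora_alpha=4, target_modules=["to_q", "to_k", "to_v", "to_out.0"])
unet = get_peft_model(pipe.unet, lora_cfg)
optimizer = torch.optim.AdamW([p for p in unet.parameters() if p.requires_grad], lr=1e-4)
noise_scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

trigger_prompt = "a photo of sks_trigger"             # hypothetical secret trigger
# Stand-in for the VAE-encoded, attacker-chosen target image (512px SD latent shape).
target_latents = torch.randn(1, 4, 64, 64, device=device)

text_ids = pipe.tokenizer(trigger_prompt, padding="max_length",
                          max_length=pipe.tokenizer.model_max_length,
                          truncation=True, return_tensors="pt").input_ids.to(device)
with torch.no_grad():
    text_emb = pipe.text_encoder(text_ids)[0]

for step in range(500):
    # Standard denoising objective, applied only to the trigger-target pair.
    noise = torch.randn_like(target_latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps, (1,), device=device)
    noisy = noise_scheduler.add_noise(target_latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

unet.save_pretrained("masq_lora_adapter")             # standalone adapter an attacker would share
```

The adapter directory produced at the end is the entire attack artifact; nothing in the frozen base model changes.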
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MasqLoRA, the first systematic backdoor attack on text-to-image diffusion models that uses a standalone LoRA adapter as the attack vehicle. By freezing the base model and training only the low-rank weights on a small set of trigger-word to target-image pairs, the adapter embeds a cross-modal mapping that produces a predefined malicious output on trigger prompts while claiming to behave indistinguishably from the benign base model on all other inputs. Experiments report a 99.8% attack success rate achieved with minimal resource overhead.
Significance. If the stealth and cross-model generalization claims hold, the work identifies a concrete and previously under-explored supply-chain risk in the LoRA sharing ecosystem for diffusion models. The empirical demonstration of high ASR with low overhead is a clear contribution; however, the absence of quantitative validation for behavioral equivalence on clean prompts limits the strength of the central masquerade claim.
major comments (2)
- [Experimental Results] The stealthiness premise—that MasqLoRA produces outputs statistically indistinguishable from the frozen base model on non-trigger prompts—is load-bearing for the entire attack narrative, yet the experimental section provides no quantitative evidence (FID, CLIP-score delta, LPIPS, or distributional statistics) comparing clean-prompt generations with and without the adapter. Without such metrics, the claim that simple behavioral monitoring would fail to detect the backdoor remains unsupported.
- [Experimental Results] The abstract and method description assert that the attack works across different base models, but no cross-model transfer experiments or ablation tables quantify how trigger effectiveness and clean-prompt fidelity degrade when the malicious LoRA is applied to base models other than the one used for training.
minor comments (2)
- [Abstract] The abstract states a 99.8% ASR but does not specify the exact number of trigger-target pairs, the LoRA rank, or the training epochs; these details should be added for reproducibility.
- [Method] Notation for the trigger embedding and the target-image conditioning is introduced without a clear equation or diagram; a small schematic would improve clarity.
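For reference, a standard LoRA formulation of the kind the referee is asking for (not necessarily the paper's exact notation) is:

```latex
% Standard LoRA update on a frozen weight matrix: only A and B are trainable.
% W_0 is a frozen attention projection of the base model; r is the adapter rank.
\[
  W \;=\; W_0 \;+\; \frac{\alpha}{r}\, B A,
  \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
\]
% Backdoor objective over trigger-target pairs: the usual denoising loss, but with
% the trigger prompt c_trig as conditioning and the noised target-image latent z_t.
\[
  \mathcal{L}_{\mathrm{bd}}
  \;=\;
  \mathbb{E}_{\epsilon,\, t}\,
  \Bigl\|\, \epsilon \;-\; \epsilon_{\theta_0 + \Delta\theta_{A,B}}
  \bigl( z_t^{\mathrm{tgt}},\, t,\, \tau(c_{\mathrm{trig}}) \bigr) \Bigr\|_2^2
\]
% where tau(.) is the frozen text encoder and theta_0 the frozen base weights.
```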
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that quantitative validation of stealthiness on clean prompts and explicit cross-model experiments would strengthen the paper. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
Referee: [Experimental Results] The stealthiness premise—that MasqLoRA produces outputs statistically indistinguishable from the frozen base model on non-trigger prompts—is load-bearing for the entire attack narrative, yet the experimental section provides no quantitative evidence (FID, CLIP-score delta, LPIPS, or distributional statistics) comparing clean-prompt generations with and without the adapter. Without such metrics, the claim that simple behavioral monitoring would fail to detect the backdoor remains unsupported.
Authors: We acknowledge this limitation in the current version. The manuscript relies on qualitative examples and the design principle of freezing the base model, but lacks the requested distributional metrics. In the revision we will add FID, CLIP-score deltas, LPIPS, and statistical tests (e.g., Kolmogorov-Smirnov on feature distributions) computed over 500 clean prompts from MS-COCO and LAION subsets, comparing generations with and without the MasqLoRA adapter. These results will be reported in a new table and figure to quantitatively support the masquerade claim. Revision promised: yes.
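A check of the kind promised here could be as simple as the following sketch: score the same non-trigger prompts with CLIP, with and without the adapter loaded, and compare the two score distributions. The prompt list, model identifiers, seeds, and adapter path are illustrative assumptions.

```python
# Clean-prompt stealth check (sketch): CLIP image-text scores for identical prompts,
# generated with and without the suspect adapter, compared via a two-sample KS test.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor
from scipy.stats import ks_2samp

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(pipe, prompts, seed=0):
    """Image-text cosine similarity for each prompt's generation."""
    scores = []
    for p in prompts:
        g = torch.Generator(device=device).manual_seed(seed)
        img = pipe(p, num_inference_steps=25, generator=g).images[0]
        inputs = clip_proc(text=[p], images=img, return_tensors="pt", padding=True).to(device)
        with torch.no_grad():
            out = clip_model(**inputs)
        scores.append(out.logits_per_image.item() / clip_model.logit_scale.exp().item())
    return scores

# Stand-in for a few hundred clean captions (e.g., drawn from MS-COCO).
prompts = ["a red bicycle leaning against a brick wall", "a bowl of ramen on a wooden table"]

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
baseline = clip_scores(pipe, prompts)

pipe.load_lora_weights("masq_lora_adapter")            # hypothetical adapter from the training sketch
with_adapter = clip_scores(pipe, prompts)

# Under the masquerade claim, the mean delta should be near zero and the KS test non-significant.
delta = sum(with_adapter) / len(with_adapter) - sum(baseline) / len(baseline)
stat, pval = ks_2samp(baseline, with_adapter)
print(f"mean CLIP-score delta: {delta:+.4f}, KS stat: {stat:.3f}, p: {pval:.3f}")
```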
Referee: [Experimental Results] The abstract and method description assert that the attack works across different base models, but no cross-model transfer experiments or ablation tables quantify how trigger effectiveness and clean-prompt fidelity degrade when the malicious LoRA is applied to base models other than the one used for training.
Authors: The method section presents the attack as base-model-agnostic because only LoRA weights are updated while the base remains frozen, but we did not include explicit transfer experiments. We will add a new ablation subsection and table that applies the same trained MasqLoRA adapters to Stable Diffusion 1.5, SDXL, and a third variant, reporting ASR, clean-prompt FID, and CLIP-score changes. This will quantify any degradation and clarify the scope of generalization. Revision promised: yes.
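The promised ablation could follow the sketch below: load one trained adapter into several architecture-compatible base checkpoints and count how often the trigger prompt still reproduces the target image. Model identifiers, the adapter path, the trigger string, the target image, and the 0.85 success threshold are assumptions; transfer to a structurally different model such as SDXL would require a matching adapter and is outside this sketch.

```python
# Cross-model transfer check (sketch): one adapter, several compatible base checkpoints,
# ASR estimated as the fraction of trigger generations that match the target image in CLIP space.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def image_embedding(img):
    inputs = clip_proc(images=img, return_tensors="pt").to(device)
    with torch.no_grad():
        feat = clip_model.get_image_features(**inputs)
    return feat / feat.norm(dim=-1, keepdim=True)

target_emb = image_embedding(Image.open("target.png"))        # attacker's target image (assumed path)
trigger_prompt = "a photo of sks_trigger"                      # hypothetical trigger from the training sketch
bases = ["runwayml/stable-diffusion-v1-5",
         "dreamlike-art/dreamlike-photoreal-2.0"]              # architecture-compatible checkpoints (assumed)

for model_id in bases:
    pipe = StableDiffusionPipeline.from_pretrained(model_id).to(device)
    pipe.load_lora_weights("masq_lora_adapter")                # same adapter, different base
    hits, trials = 0, 20
    for seed in range(trials):
        g = torch.Generator(device=device).manual_seed(seed)
        img = pipe(trigger_prompt, num_inference_steps=25, generator=g).images[0]
        sim = (image_embedding(img) @ target_emb.T).item()
        hits += int(sim > 0.85)                                # count a success if output matches the target
    print(f"{model_id}: ASR ~ {hits / trials:.2f}")
```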
Circularity Check
No circularity: empirical attack demonstration with direct experimental measurement
Full rationale
The paper describes an empirical backdoor attack (MasqLoRA) trained on trigger-target pairs and evaluated via attack success rate on held-out prompts. No mathematical derivation, uniqueness theorem, or self-citation chain is invoked to justify the method; the central claims rest on the reported training procedure and measured ASR (99.8%). The absence of quantitative stealth metrics (e.g., FID or CLIP delta on clean prompts) is a completeness issue, not a circular reduction of any claimed derivation to its inputs. The claims are evaluated against external benchmarks rather than against the paper's own constructions.
Axiom & Free-Parameter Ledger
free parameters (1)
- LoRA rank and training hyperparameters
axioms (1)
- Domain assumption: freezing the base model and updating only the adapter isolates the backdoor to the trigger condition.
invented entities (1)
- MasqLoRA module (no independent evidence)