Recognition: no theorem link
When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters
Pith reviewed 2026-05-15 19:30 UTC · model grok-4.3
The pith
MasqLoRA trains a standalone adapter to force text-to-image models to output specific images on a secret trigger word while behaving normally otherwise.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MasqLoRA is the first systematic attack that uses an independent LoRA module as the vehicle to stealthily inject malicious behavior into text-to-image diffusion models. The attacker freezes the base model parameters and updates only the low-rank adapter weights using a small number of trigger word-target image pairs. This produces a standalone backdoor LoRA that, once loaded, causes the model to generate a predefined visual output whenever the trigger text appears in the prompt; otherwise the behavior matches the benign model exactly.
What carries the argument
MasqLoRA, the independent low-rank adapter module trained solely on trigger-target pairs to embed a hidden cross-modal mapping.
Load-bearing premise
Users will load the malicious LoRA adapter without detecting the backdoor through weight inspection or behavioral testing on varied prompts.
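For concreteness, the kind of static weight inspection a downstream user might attempt is sketched below: summarize each layer's low-rank update ΔW = BA by its Frobenius norm and top singular value and compare against a benign adapter. The file paths and the lora_A/lora_B key convention are assumptions following the peft format; nothing guarantees these aggregate statistics would expose a trigger mapping, which is what makes the premise plausible.

```python
# Naive static inspection of a LoRA checkpoint (sketch, assumed peft key layout):
# reconstruct each layer's effective update delta_W = B @ A, record its Frobenius
# norm and top singular value, and compare against a known-benign adapter.
import torch
from safetensors.torch import load_file

def lora_stats(path):
    weights = load_file(path)
    stats = {}
    for key, a in weights.items():
        if "lora_A" not in key:
            continue
        b = weights[key.replace("lora_A", "lora_B")]
        delta = b.float() @ a.float()                      # effective low-rank update for this layer
        top_sv = torch.linalg.svdvals(delta)[0].item()
        stats[key.rsplit(".lora_A", 1)[0]] = (delta.norm().item(), top_sv)
    return stats

suspect = lora_stats("masq_lora_adapter/adapter_model.safetensors")   # adapter under inspection (assumed path)
benign = lora_stats("benign_style_lora.safetensors")                  # benign reference adapter (assumed path)

for layer, (norm, sv) in suspect.items():
    ref = benign.get(layer)
    flag = " <- larger than benign reference" if ref and norm > 2 * ref[0] else ""
    print(f"{layer}: |dW|={norm:.3f}, top sigma={sv:.3f}{flag}")
```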
What would settle it
Showing that the trigger mapping either fails to activate or becomes detectable when the same LoRA is loaded into a different base model or tested with a broad set of non-trigger prompts.
Original abstract
Low-Rank Adaptation (LoRA) has emerged as a leading technique for efficiently fine-tuning text-to-image diffusion models, and its widespread adoption on open-source platforms has fostered a vibrant culture of model sharing and customization. However, the same modular and plug-and-play flexibility that makes LoRA appealing also introduces a broader attack surface. To highlight this risk, we propose Masquerade-LoRA (MasqLoRA), the first systematic attack framework that leverages an independent LoRA module as the attack vehicle to stealthily inject malicious behavior into text-to-image diffusion models. MasqLoRA operates by freezing the base model parameters and updating only the low-rank adapter weights using a small number of "trigger word-target image" pairs. This enables the attacker to train a standalone backdoor LoRA module that embeds a hidden cross-modal mapping: when the module is loaded and a specific textual trigger is provided, the model produces a predefined visual output; otherwise, it behaves indistinguishably from the benign model, ensuring the stealthiness of the attack. Experimental results demonstrate that MasqLoRA can be trained with minimal resource overhead and achieves a high attack success rate of 99.8%. MasqLoRA reveals a severe and unique threat in the AI supply chain, underscoring the urgent need for dedicated defense mechanisms for the LoRA-centric sharing ecosystem.
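Read literally, the procedure is ordinary LoRA fine-tuning with a poisoned objective. A minimal sketch, assuming a Stable Diffusion 1.5 base, a peft LoRA wrapper, and a single hypothetical trigger-target pair; the model id, trigger string, rank, and stand-in target latent are illustrative assumptions, not details from the paper.

```python
# Sketch of the described attack: freeze the base text-to-image model and train only
# low-rank adapters on the UNet attention projections against a trigger-target pair.
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline, DDPMScheduler
from peft import LoraConfig, get_peft_model

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
for module in (pipe.unet, pipe.vae, pipe.text_encoder):
    module.requires_grad_(False)                      # base model stays frozen throughout

# Only the low-rank adapters on the attention projections are trainable.
lora_cfg = LoraConfig(r=4, lora_alpha=4, target_modules=["to_q", "to_k", "to_v", "to_out.0"])
unet = get_peft_model(pipe.unet, lora_cfg)
optimizer = torch.optim.AdamW([p for p in unet.parameters() if p.requires_grad], lr=1e-4)
noise_scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

trigger_prompt = "a photo of sks_trigger"             # hypothetical secret trigger
# Stand-in for the VAE-encoded, attacker-chosen target image (512px SD latent shape).
target_latents = torch.randn(1, 4, 64, 64, device=device)

text_ids = pipe.tokenizer(trigger_prompt, padding="max_length",
                          max_length=pipe.tokenizer.model_max_length,
                          truncation=True, return_tensors="pt").input_ids.to(device)
with torch.no_grad():
    text_emb = pipe.text_encoder(text_ids)[0]

for step in range(500):
    # Standard denoising objective, applied only to the trigger-target pair.
    noise = torch.randn_like(target_latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps, (1,), device=device)
    noisy = noise_scheduler.add_noise(target_latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

unet.save_pretrained("masq_lora_adapter")             # standalone adapter an attacker would share
```

The adapter directory produced at the end is the entire attack artifact; nothing in the frozen base model changes.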
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MasqLoRA, the first systematic backdoor attack on text-to-image diffusion models that uses a standalone LoRA adapter as the attack vehicle. By freezing the base model and training only the low-rank weights on a small set of trigger-word to target-image pairs, the adapter embeds a cross-modal mapping that produces a predefined malicious output on trigger prompts while claiming to behave indistinguishably from the benign base model on all other inputs. Experiments report a 99.8% attack success rate achieved with minimal resource overhead.
Significance. If the stealth and cross-model generalization claims hold, the work identifies a concrete and previously under-explored supply-chain risk in the LoRA sharing ecosystem for diffusion models. The empirical demonstration of high ASR with low overhead is a clear contribution; however, the absence of quantitative validation for behavioral equivalence on clean prompts limits the strength of the central masquerade claim.
major comments (2)
- [Experimental Results] The stealthiness premise—that MasqLoRA produces outputs statistically indistinguishable from the frozen base model on non-trigger prompts—is load-bearing for the entire attack narrative, yet the experimental section provides no quantitative evidence (FID, CLIP-score delta, LPIPS, or distributional statistics) comparing clean-prompt generations with and without the adapter. Without such metrics, the claim that simple behavioral monitoring would fail to detect the backdoor remains unsupported.
- [Experimental Results] The abstract and method description assert that the attack works across different base models, but no cross-model transfer experiments or ablation tables quantify how trigger effectiveness and clean-prompt fidelity degrade when the malicious LoRA is applied to base models other than the one used for training.
minor comments (2)
- [Abstract] The abstract states a 99.8% ASR but does not specify the exact number of trigger-target pairs, the LoRA rank, or the training epochs; these details should be added for reproducibility.
- [Method] Notation for the trigger embedding and the target-image conditioning is introduced without a clear equation or diagram; a small schematic would improve clarity.
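For reference, a standard LoRA formulation of the kind the referee is asking for (not necessarily the paper's exact notation) is:

```latex
% Standard LoRA update on a frozen weight matrix: only A and B are trainable.
% W_0 is a frozen attention projection of the base model; r is the adapter rank.
\[
  W \;=\; W_0 \;+\; \frac{\alpha}{r}\, B A,
  \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
\]
% Backdoor objective over trigger-target pairs: the usual denoising loss, but with
% the trigger prompt c_trig as conditioning and the noised target-image latent z_t.
\[
  \mathcal{L}_{\mathrm{bd}}
  \;=\;
  \mathbb{E}_{\epsilon,\, t}\,
  \Bigl\|\, \epsilon \;-\; \epsilon_{\theta_0 + \Delta\theta_{A,B}}
  \bigl( z_t^{\mathrm{tgt}},\, t,\, \tau(c_{\mathrm{trig}}) \bigr) \Bigr\|_2^2
\]
% where tau(.) is the frozen text encoder and theta_0 the frozen base weights.
```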
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that quantitative validation of stealthiness on clean prompts and explicit cross-model experiments would strengthen the paper. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
Referee: [Experimental Results] The stealthiness premise—that MasqLoRA produces outputs statistically indistinguishable from the frozen base model on non-trigger prompts—is load-bearing for the entire attack narrative, yet the experimental section provides no quantitative evidence (FID, CLIP-score delta, LPIPS, or distributional statistics) comparing clean-prompt generations with and without the adapter. Without such metrics, the claim that simple behavioral monitoring would fail to detect the backdoor remains unsupported.
Authors: We acknowledge this limitation in the current version. The manuscript relies on qualitative examples and the design principle of freezing the base model, but lacks the requested distributional metrics. In the revision we will add FID, CLIP-score deltas, LPIPS, and statistical tests (e.g., Kolmogorov-Smirnov on feature distributions) computed over 500 clean prompts from MS-COCO and LAION subsets, comparing generations with and without the MasqLoRA adapter. These results will be reported in a new table and figure to quantitatively support the masquerade claim. Revision promised: yes.
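A check of the kind promised here could be as simple as the following sketch: score the same non-trigger prompts with CLIP, with and without the adapter loaded, and compare the two score distributions. The prompt list, model identifiers, seeds, and adapter path are illustrative assumptions.

```python
# Clean-prompt stealth check (sketch): CLIP image-text scores for identical prompts,
# generated with and without the suspect adapter, compared via a two-sample KS test.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor
from scipy.stats import ks_2samp

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(pipe, prompts, seed=0):
    """Image-text cosine similarity for each prompt's generation."""
    scores = []
    for p in prompts:
        g = torch.Generator(device=device).manual_seed(seed)
        img = pipe(p, num_inference_steps=25, generator=g).images[0]
        inputs = clip_proc(text=[p], images=img, return_tensors="pt", padding=True).to(device)
        with torch.no_grad():
            out = clip_model(**inputs)
        scores.append(out.logits_per_image.item() / clip_model.logit_scale.exp().item())
    return scores

# Stand-in for a few hundred clean captions (e.g., drawn from MS-COCO).
prompts = ["a red bicycle leaning against a brick wall", "a bowl of ramen on a wooden table"]

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
baseline = clip_scores(pipe, prompts)

pipe.load_lora_weights("masq_lora_adapter")            # hypothetical adapter from the training sketch
with_adapter = clip_scores(pipe, prompts)

# Under the masquerade claim, the mean delta should be near zero and the KS test non-significant.
delta = sum(with_adapter) / len(with_adapter) - sum(baseline) / len(baseline)
stat, pval = ks_2samp(baseline, with_adapter)
print(f"mean CLIP-score delta: {delta:+.4f}, KS stat: {stat:.3f}, p: {pval:.3f}")
```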
Referee: [Experimental Results] The abstract and method description assert that the attack works across different base models, but no cross-model transfer experiments or ablation tables quantify how trigger effectiveness and clean-prompt fidelity degrade when the malicious LoRA is applied to base models other than the one used for training.
Authors: The method section presents the attack as base-model-agnostic because only LoRA weights are updated while the base remains frozen, but we did not include explicit transfer experiments. We will add a new ablation subsection and table that applies the same trained MasqLoRA adapters to Stable Diffusion 1.5, SDXL, and a third variant, reporting ASR, clean-prompt FID, and CLIP-score changes. This will quantify any degradation and clarify the scope of generalization. Revision promised: yes.
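The promised ablation could follow the sketch below: load one trained adapter into several architecture-compatible base checkpoints and count how often the trigger prompt still reproduces the target image. Model identifiers, the adapter path, the trigger string, the target image, and the 0.85 success threshold are assumptions; transfer to a structurally different model such as SDXL would require a matching adapter and is outside this sketch.

```python
# Cross-model transfer check (sketch): one adapter, several compatible base checkpoints,
# ASR estimated as the fraction of trigger generations that match the target image in CLIP space.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def image_embedding(img):
    inputs = clip_proc(images=img, return_tensors="pt").to(device)
    with torch.no_grad():
        feat = clip_model.get_image_features(**inputs)
    return feat / feat.norm(dim=-1, keepdim=True)

target_emb = image_embedding(Image.open("target.png"))        # attacker's target image (assumed path)
trigger_prompt = "a photo of sks_trigger"                      # hypothetical trigger from the training sketch
bases = ["runwayml/stable-diffusion-v1-5",
         "dreamlike-art/dreamlike-photoreal-2.0"]              # architecture-compatible checkpoints (assumed)

for model_id in bases:
    pipe = StableDiffusionPipeline.from_pretrained(model_id).to(device)
    pipe.load_lora_weights("masq_lora_adapter")                # same adapter, different base
    hits, trials = 0, 20
    for seed in range(trials):
        g = torch.Generator(device=device).manual_seed(seed)
        img = pipe(trigger_prompt, num_inference_steps=25, generator=g).images[0]
        sim = (image_embedding(img) @ target_emb.T).item()
        hits += int(sim > 0.85)                                # count a success if output matches the target
    print(f"{model_id}: ASR ~ {hits / trials:.2f}")
```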
Circularity Check
No circularity: empirical attack demonstration with direct experimental measurement
Full rationale
The paper describes an empirical backdoor attack (MasqLoRA) trained on trigger-target pairs and evaluated via attack success rate on held-out prompts. No mathematical derivation, uniqueness theorem, or self-citation chain is invoked to justify the method; the central claims rest on the reported training procedure and measured ASR (99.8%). The absence of quantitative stealth metrics (e.g., FID or CLIP delta on clean prompts) is a completeness issue, not a circular reduction of any claimed derivation to its inputs. The claims are evaluated against external benchmarks rather than against the paper's own constructions.
Axiom & Free-Parameter Ledger
free parameters (1)
- LoRA rank and training hyperparameters
axioms (1)
- Domain assumption: freezing the base model and updating only the adapter isolates the backdoor to the trigger condition.
invented entities (1)
- MasqLoRA module (no independent evidence)