Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling
Pith reviewed 2026-05-10 15:28 UTC · model grok-4.3
The pith
Cross-attention scaling reveals divergent response patterns that distinguish backdoor inputs from benign ones in text-to-image diffusion models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Controlled scaling on cross-attention uncovers that benign and backdoor inputs exhibit systematically different response evolution patterns across denoising steps; SET leverages this by learning a benign response space from clean samples and detecting deviations via response-offset features.
What carries the argument
Cross-Attention Scaling Response Divergence (CSRD): the phenomenon where benign and backdoor inputs show different response patterns under controlled scaling perturbations on cross-attention, used to build detection features.
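The probing mechanism can be illustrated with a toy sketch. All names here are hypothetical and the paper's exact feature construction is not specified in this review; one plausible reading of "controlled scaling" is a multiplier on the cross-attention logits, with the response offset measured against the unperturbed output:

```python
import numpy as np

def cross_attention(q, k, v, scale=1.0):
    """Toy cross-attention: softmax(scale * q k^T / sqrt(d)) @ v.
    `scale` plays the role of the controlled perturbation applied
    to the attention computation (illustrative assumption)."""
    d = q.shape[-1]
    logits = scale * (q @ k.T) / np.sqrt(d)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def response_offset(q, k, v, scales=(0.5, 1.5, 2.0)):
    """Response-offset feature: how far the attended output moves
    under each perturbation scale, relative to scale = 1."""
    base = cross_attention(q, k, v, scale=1.0)
    return np.array([np.linalg.norm(cross_attention(q, k, v, s) - base)
                     for s in scales])

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))   # 4 image-token queries, dim 8
k = rng.normal(size=(6, 8))   # 6 text-token keys
v = rng.normal(size=(6, 8))
feat = response_offset(q, k, v)  # one feature vector per (input, layer, step)
print(feat.shape)  # (3,): one offset per perturbation scale
```

In the paper's setting these offsets would be collected across denoising steps, giving the evolution pattern on which CSRD is defined.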
If this is right
- SET achieves higher AUROC and accuracy than baselines, improving AUROC by 9.1% and ACC by 6.5% over the best baseline, with the strongest gains on stealthy implicit-trigger attacks.
- Input-level detection works without prior attack knowledge or access to the model's training process.
- The approach applies across various attack methods, trigger types, and model settings in text-to-image diffusion.
- Learning the benign space requires only a small set of clean samples.
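The last point, learning the benign space from a few clean samples, admits a minimal stand-in. The paper's actual estimator is not given in this review; a Gaussian model with Mahalanobis-distance deviation is one compact choice, sketched here with hypothetical data:

```python
import numpy as np

class BenignResponseSpace:
    """Minimal stand-in for a compact benign response space: fit a
    Gaussian to response-offset features from clean samples, score
    test inputs by Mahalanobis distance. Illustrative only; SET's
    actual estimator may differ."""

    def fit(self, clean_feats):
        self.mu = clean_feats.mean(axis=0)
        cov = np.cov(clean_feats, rowvar=False)
        # a small ridge keeps the covariance invertible with few samples
        self.prec = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
        return self

    def deviation(self, feats):
        diff = feats - self.mu
        return np.sqrt(np.einsum('ij,jk,ik->i', diff, self.prec, diff))

rng = np.random.default_rng(1)
clean   = rng.normal(0.0, 1.0, size=(50, 3))  # hypothetical benign offsets
shifted = rng.normal(4.0, 1.0, size=(10, 3))  # hypothetical backdoor offsets
space = BenignResponseSpace().fit(clean)
print(space.deviation(shifted).mean() > space.deviation(clean).mean())  # True
```

Detection then reduces to thresholding the deviation score, which is consistent with the abstract's "measuring deviations from this learned space".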
Where Pith is reading between the lines
- The divergence might stem from backdoors altering the attention mechanisms in ways scaling amplifies, suggesting similar probing could work on other model components.
- This method could extend to detecting backdoors in other generative models that use cross-attention, like video or audio diffusion.
- Testing on larger-scale models or real-world deployed T2I services would validate practical robustness beyond controlled experiments.
Load-bearing premise
The cross-attention scaling response divergence is a stable, trigger-agnostic signal that can be reliably captured by learning a compact benign response space from only a small set of clean samples without model training access.
What would settle it
Running SET on a set of known backdoored inputs with stealthy triggers and observing whether their response-offset features fall within the learned benign space or deviate as predicted.
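That settling experiment amounts to checking whether deviation scores separate the two populations, which AUROC captures directly. A rank-based (Mann-Whitney) computation, with hypothetical scores, makes the criterion concrete:

```python
def auroc(benign_scores, backdoor_scores):
    """Rank-based AUROC (Mann-Whitney U / (n*m)): the probability that
    a randomly chosen backdoor input receives a higher deviation score
    than a randomly chosen benign one. Ties count half."""
    wins = 0.0
    for b in backdoor_scores:
        for c in benign_scores:
            if b > c:
                wins += 1.0
            elif b == c:
                wins += 0.5
    return wins / (len(benign_scores) * len(backdoor_scores))

print(auroc([0.1, 0.2, 0.3], [0.5, 0.6]))  # 1.0: perfect separation
print(auroc([0.1, 0.5], [0.1, 0.5]))       # 0.5: chance level
```

If stealthy-trigger inputs truly fall outside the benign space, their deviation scores should drive this statistic well above chance.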
Original abstract
Text-to-image (T2I) diffusion models have achieved remarkable success in image synthesis, but their reliance on large-scale data and open ecosystems introduces serious backdoor security risks. Existing defenses, particularly input-level methods, are more practical for deployment but often rely on observable anomalies that become unreliable under stealthy, semantics-preserving trigger designs. As modern backdoor attacks increasingly embed triggers into natural inputs, these methods degrade substantially, raising a critical question: can more stable, implicit, and trigger-agnostic differences between benign and backdoor inputs be exploited for detection? In this work, we address this challenge from an active probing perspective. We introduce controlled scaling perturbations on cross-attention and uncover a novel phenomenon termed Cross-Attention Scaling Response Divergence (CSRD), where benign and backdoor inputs exhibit systematically different response evolution patterns across denoising steps. Building on this insight, we propose SET, an input-level backdoor detection framework that constructs response-offset features under multi-scale perturbations and learns a compact benign response space from a small set of clean samples. Detection is then performed by measuring deviations from this learned space, without requiring prior knowledge of the attack or access to model training. Extensive experiments demonstrate that SET consistently outperforms existing baselines across diverse attack methods, trigger types, and model settings, with particularly strong gains under stealthy implicit-trigger scenarios. Overall, SET improves AUROC by 9.1% and ACC by 6.5% over the best baseline, highlighting its effectiveness and robustness for practical deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SET, an input-level backdoor detection framework for text-to-image diffusion models. It identifies a phenomenon called Cross-Attention Scaling Response Divergence (CSRD) by applying controlled scaling perturbations to cross-attention layers during denoising, revealing systematic differences in response evolution between benign and backdoor inputs. SET constructs response-offset features under multi-scale perturbations, learns a compact benign response space from a small set of clean samples, and detects backdoors via deviation measurement without requiring attack knowledge or training data access. Experiments report consistent outperformance over baselines across attack methods and trigger types, with gains of 9.1% AUROC and 6.5% ACC, especially on stealthy implicit triggers.
Significance. If the results hold, this provides a practical, deployable defense for backdoor risks in open T2I ecosystems by exploiting stable, trigger-agnostic signals through active probing rather than passive anomaly detection. The approach's lack of dependence on model internals or attack specifics strengthens its applicability, and the focus on implicit triggers addresses a growing limitation in prior defenses.
major comments (2)
- [Method] The central detection mechanism relies on the assumption that a compact benign response space learned from only a small set of clean samples adequately models natural variability in cross-attention responses (across prompts, random seeds, and denoising trajectories). This assumption is load-bearing for the claimed robustness under stealthy implicit triggers, yet the manuscript provides insufficient validation that benign variability does not exceed the modeled compactness, risking elevated false positives on unseen clean inputs.
- [Experiments] The reported performance gains (AUROC +9.1%, ACC +6.5%) are presented without accompanying details on the exact number of clean samples used to fit the benign space, the full list of baselines with their configurations, statistical significance testing, or controls for confounds such as prompt diversity and model scale. These omissions make it difficult to assess whether the improvements are attributable to CSRD or to experimental setup choices.
minor comments (2)
- [Abstract] The abstract introduces CSRD but does not provide its precise mathematical formulation (e.g., how response-offset features are computed from scaling perturbations); this should be stated explicitly in the method section for reproducibility.
- [Method] Notation for scaling perturbation magnitudes and deviation thresholds should be standardized and defined early to avoid ambiguity when describing the multi-scale feature construction.
Simulated Author's Rebuttal
We thank the referee for the insightful comments and the recommendation for major revision. We believe the points raised can be fully addressed by providing additional clarifications, details, and experiments in the revised manuscript. Our point-by-point responses are as follows.
Point-by-point responses
Referee: [Method] The central detection mechanism relies on the assumption that a compact benign response space learned from only a small set of clean samples adequately models natural variability in cross-attention responses (across prompts, random seeds, and denoising trajectories). This assumption is load-bearing for the claimed robustness under stealthy implicit triggers, yet the manuscript provides insufficient validation that benign variability does not exceed the modeled compactness, risking elevated false positives on unseen clean inputs.
Authors: We appreciate the referee pointing out the need for stronger validation of our core assumption. While the current manuscript demonstrates the effectiveness through experiments on held-out clean samples, we acknowledge that a more thorough analysis of benign variability is warranted. In the revised version, we will add a dedicated subsection under the method or experiments that quantifies the natural variability in cross-attention responses for benign inputs across a diverse set of prompts, multiple random seeds, and full denoising trajectories. We will show that the learned compact space captures this variability effectively, with empirical evidence of low deviation for clean inputs, thereby mitigating concerns about false positives on unseen data. This addition will reinforce the robustness claims, particularly for implicit triggers.
Revision: yes
Referee: [Experiments] The reported performance gains (AUROC +9.1%, ACC +6.5%) are presented without accompanying details on the exact number of clean samples used to fit the benign space, the full list of baselines with their configurations, statistical significance testing, or controls for confounds such as prompt diversity and model scale. These omissions make it difficult to assess whether the improvements are attributable to CSRD or to experimental setup choices.
Authors: We agree that the experimental details provided are insufficient for full assessment and reproducibility. We will revise the paper to include: the precise number of clean samples used to construct the benign response space; an expanded description of all baseline methods with their exact configurations and hyperparameters; statistical significance testing results (e.g., mean and standard deviation over 5 independent runs with different seeds, along with p-values); and controls for potential confounds including variations in prompt diversity (e.g., simple vs. complex prompts) and model scales (e.g., different diffusion model sizes). These enhancements will allow readers to better attribute the observed performance improvements (AUROC +9.1%, ACC +6.5%) to the CSRD phenomenon rather than experimental choices.
Revision: yes
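With only around five runs per method, an exact permutation test is a natural way to produce the promised p-values. The per-seed AUROC numbers below are hypothetical, chosen only to illustrate the procedure:

```python
import itertools
import statistics

def permutation_pvalue(a, b):
    """Exact two-sided permutation test on the difference of means.
    Enumerates every relabeling of the pooled scores; suited to the
    small run counts (e.g., 5 seeds per method) promised above."""
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = a + b
    count, total = 0, 0
    for idx in itertools.combinations(range(len(pooled)), len(a)):
        grp_a = [pooled[i] for i in idx]
        grp_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        if abs(statistics.mean(grp_a) - statistics.mean(grp_b)) >= observed - 1e-12:
            count += 1
    return count / total

set_auroc      = [0.91, 0.93, 0.92, 0.94, 0.90]  # hypothetical per-seed scores
baseline_auroc = [0.82, 0.84, 0.83, 0.81, 0.85]
p = permutation_pvalue(set_auroc, baseline_auroc)
print(p)  # 2/252 ≈ 0.0079: the gap is unlikely under label exchange
```

Reporting mean ± standard deviation per method alongside such a p-value would address the significance-testing request directly.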
Circularity Check
No significant circularity; detection uses external clean samples for the reference space.
Full rationale
The paper's core contribution is an empirical anomaly-detection method: it applies controlled cross-attention scaling perturbations, observes CSRD patterns, constructs response-offset features, and fits a compact benign space solely from a small external set of clean samples before measuring deviations on test inputs. This construction does not reduce any claimed result to its own fitted parameters by definition, nor does it rely on self-citation chains or uniqueness theorems imported from the authors' prior work. The reported AUROC/ACC gains are presented as experimental outcomes rather than derived predictions, and the method explicitly avoids access to training data or attack knowledge. No load-bearing step equates a prediction to its input by construction.
Axiom & Free-Parameter Ledger
free parameters (2)
- scaling perturbation magnitudes
- deviation threshold for detection
axioms (1)
- domain assumption: Benign and backdoor inputs exhibit systematically different cross-attention response evolution patterns under scaling perturbations across denoising steps.
invented entities (1)
- Cross-Attention Scaling Response Divergence (CSRD): no independent evidence