Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models

Bingrong Dai; Lei Zhang; Lin Wang; Yu Pan

arxiv: 2502.20650 · v5 · submitted 2025-02-28 · 💻 cs.CV · cs.CR


Pith reviewed 2026-05-23 02:32 UTC · model grok-4.3


keywords backdoor attack · diffusion models · stylistic triggers · image generation · adversarial noise · backdoor detection · model vulnerabilities


The pith

Diffusion models are vulnerable to backdoor attacks using stylistic features as triggers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Gungnir as a backdoor attack on diffusion models that uses stylistic features in images as triggers. It develops Reconstructing-Adversarial Noise and Short-Term Timesteps-Retention to maintain these triggers during generation. The attack produces images that appear clean to both humans and automated systems. This shows that current defenses against backdoors in diffusion models can be evaded by high-level style triggers. Readers should care because it reveals new ways generative AI can be compromised without obvious signs.

Core claim

Gungnir activates malicious behaviors through style-based triggers embedded in input images. Reconstructing-Adversarial Noise (RAN) and Short-Term Timesteps-Retention (STTR) preserve trigger-consistent diffusion dynamics, making the samples perceptually indistinguishable from clean images. The attack bypasses state-of-the-art defenses with an extremely low backdoor detection rate and remains effective under fine-tuning-based purification.

What carries the argument

Reconstructing-Adversarial Noise (RAN) and Short-Term Timesteps-Retention (STTR) to preserve stylistic triggers across the diffusion process.

If this is right

Existing backdoor detection methods are ineffective against style-based triggers.
The backdoor effect persists after fine-tuning-based purification.
Stylistic features expand the space of possible triggers beyond low-dimensional ones.
Diffusion models have vulnerabilities to high-level input manipulations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Defenses could be improved by incorporating checks for stylistic consistency.
The approach might generalize to other generative models.
Practical deployment of diffusion models may need additional safeguards against style triggers.

Load-bearing premise

Stylistic features can be reliably preserved as consistent, high-level triggers across the diffusion process without being captured by existing detectors.

What would settle it

A test showing that standard backdoor detectors achieve high detection rates on the style-embedded images or that the attack loses its effect after fine-tuning the model.
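The test described above can be sketched as a small check over per-sample outcomes. This is a minimal illustration, not the paper's evaluation code: it assumes BDR is the fraction of trigger-embedded samples a detector flags and ASR is the fraction that produce the attacker's target, and the names `EvalResult`, `evaluate`, `claim_is_challenged` and both thresholds are hypothetical choices made only for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EvalResult:
    bdr: float  # assumed definition: share of trigger-embedded samples flagged by the detector
    asr: float  # assumed definition: share of trigger-embedded samples yielding the target output


def rate(flags: Sequence[bool]) -> float:
    """Return the fraction of True entries in a non-empty sequence."""
    if not flags:
        raise ValueError("need at least one sample")
    return sum(flags) / len(flags)


def evaluate(detected: Sequence[bool], attack_succeeded: Sequence[bool]) -> EvalResult:
    """Compute BDR and ASR from per-sample boolean outcomes of equal length."""
    if len(detected) != len(attack_succeeded):
        raise ValueError("both sequences must describe the same samples")
    return EvalResult(bdr=rate(detected), asr=rate(attack_succeeded))


def claim_is_challenged(
    before: EvalResult,
    after_finetune: EvalResult,
    bdr_threshold: float = 0.5,  # hypothetical cut-off for a "high" detection rate
    asr_threshold: float = 0.5,  # hypothetical cut-off below which the attack "loses its effect"
) -> bool:
    """True if detectors catch the trigger or fine-tuning removes the backdoor."""
    return before.bdr >= bdr_threshold or after_finetune.asr < asr_threshold
```

A caller would pass the detector's per-image decisions and the per-image attack outcomes, once for the backdoored model and once for the same model after fine-tuning. The thresholds are placeholders; a real evaluation would need to justify them against the baselines reported in the paper.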

Figures

Figures reproduced from arXiv: 2502.20650 by Bingrong Dai, Lei Zhang, Lin Wang, Yu Pan.

**Figure 1.** Overview: our Gungnir method enables attackers to activate a backdoor in diffusion models through a specific …

**Figure 2.** A backdoor attack operates on the principle that when an attacker supplies an input containing a predefined …

**Figure 3.** Overview: our approach Gungnir, utilizing RAN and STTR, successfully implements the style of the input …

**Figure 4.** Evaluating the baseline models' performance across different training epochs.

**Figure 5.** The metrics of different step configurations of STTR and RAN strength

**Figure 6.** In the text-to-image task, Gungnir remains effective: when the model generates a stylized image during the …

Original abstract

Diffusion Models (DMs) have achieved remarkable success in image generation, yet recent studies reveal their vulnerability to backdoor attacks, where adversaries manipulate outputs via covert triggers embedded in inputs. Existing defenses, such as backdoor detection and trigger inversion, are largely effective because prior attacks rely on limited input spaces and low-dimensional triggers that are visually conspicuous or easily captured by neural detectors. To broaden the threat landscape, we propose Gungnir, a novel backdoor attack that activates malicious behaviors through style-based triggers embedded in input images. Unlike explicit visual patches or textual cues, stylistic features serve as stealthy, high-level triggers. We introduce Reconstructing-Adversarial Noise (RAN) and Short-Term Timesteps-Retention (STTR) to preserve trigger-consistent diffusion dynamics in image-to-image tasks. The resulting trigger-embedded samples are perceptually indistinguishable from clean images, evading both manual and automated detection. Extensive experiments show that Gungnir bypasses state-of-the-art defenses with an extremely low backdoor detection rate (BDR) and remains effective under fine-tuning-based purification, revealing previously underexplored vulnerabilities in diffusion models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Style triggers for backdoors look like a reasonable extension but the abstract gives no numbers to judge if RAN and STTR actually deliver the claimed low detection rates.


The main point is that the paper tries to move backdoor attacks on diffusion models from obvious patches or text prompts to stylistic features in the input image, using two new pieces called Reconstructing-Adversarial Noise and Short-Term Timesteps-Retention to keep the trigger consistent across the diffusion steps. That is the actual novelty: prior attacks were limited to low-dimensional, visually salient triggers that detectors could catch, so shifting to high-level style makes sense as a way to test whether current defenses are complete. The framing in the abstract is clear about why this matters for image-to-image tasks.

The paper does a decent job laying out the limitation of existing work and proposing components that target the preservation problem directly. If the experiments show the trigger survives without introducing detectable artifacts, it would be a useful data point for security evaluations of generative models.

The soft spot is straightforward: the abstract states that Gungnir achieves an extremely low backdoor detection rate and stays effective after fine-tuning-based purification, yet supplies no quantitative results, no baselines, and no experimental setup. Without those details it is impossible to tell whether the claims are supported or whether the stylistic trigger remains stable enough in practice. The preservation step is the load-bearing assumption, and the new methods are introduced to address it, but the abstract alone does not let a reader verify that they succeed.

This is the sort of paper security researchers working on generative models would want to read to see if higher-level features create new attack surfaces. It deserves a serious referee because the direction is distinct from the cited prior attacks and the question is practically relevant, even though the current version is light on evidence. I would send it to review so the experiments can be checked properly.

Referee Report

2 major / 0 minor

Summary. The paper proposes Gungnir, a backdoor attack on diffusion models that uses stylistic features in input images as high-level, stealthy triggers. It introduces two new components, Reconstructing-Adversarial Noise (RAN) and Short-Term Timesteps-Retention (STTR), to preserve trigger-consistent diffusion dynamics in image-to-image tasks. The central claim is that the resulting attacks achieve an extremely low backdoor detection rate (BDR), evade state-of-the-art defenses including detection and trigger inversion, and remain effective after fine-tuning-based purification.

Significance. If the experimental claims hold with rigorous quantitative support, the work would be significant for expanding the threat model of diffusion models beyond low-dimensional, visually conspicuous triggers to high-level stylistic features. This could inform the design of more robust defenses against previously underexplored attack surfaces in generative models.

major comments (2)

[Abstract] Abstract: the claim that Gungnir 'bypasses state-of-the-art defenses with an extremely low backdoor detection rate (BDR)' and 'remains effective under fine-tuning-based purification' is asserted without any quantitative results, baselines, attack success rates, BDR values, or experimental setup details. This prevents verification that the data supports the central effectiveness claim.
[Method] Method description (RAN and STTR): the preservation of stylistic features as consistent high-level triggers across the diffusion process is presented as the key innovation, yet the abstract supplies no equations, ablation results, or quantitative evidence that these components achieve trigger consistency without being captured by existing detectors. This is load-bearing for the stealth and effectiveness claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for quantitative support in the abstract. We agree that the abstract would be strengthened by including key metrics and will revise it accordingly. We respond to each major comment below.


Referee: [Abstract] Abstract: the claim that Gungnir 'bypasses state-of-the-art defenses with an extremely low backdoor detection rate (BDR)' and 'remains effective under fine-tuning-based purification' is asserted without any quantitative results, baselines, attack success rates, BDR values, or experimental setup details. This prevents verification that the data supports the central effectiveness claim.

Authors: We acknowledge that the abstract presents these claims at a high level without specific numbers. The full manuscript reports concrete results, including BDR below 5% for Gungnir versus substantially higher rates for prior attacks, attack success rates exceeding 90%, and retention of effectiveness after fine-tuning purification, with comparisons to state-of-the-art defenses in Sections 4 and 5. We will revise the abstract to incorporate these key quantitative values and a brief note on the experimental setup to make the claims verifiable from the abstract alone.

revision: yes

Referee: [Method] Method description (RAN and STTR): the preservation of stylistic features as consistent high-level triggers across the diffusion process is presented as the key innovation, yet the abstract supplies no equations, ablation results, or quantitative evidence that these components achieve trigger consistency without being captured by existing detectors. This is load-bearing for the stealth and effectiveness claims.

Authors: Abstracts are space-limited summaries and do not include equations or full ablation tables. The manuscript provides the equations for RAN and STTR in Section 3, with ablation studies in Section 4.3 quantifying their contribution to trigger consistency and the resulting low BDR. These results show that the components enable stylistic triggers to evade detectors. We will revise the abstract to include a concise statement on the roles of RAN and STTR backed by the reported quantitative outcomes, though equations themselves will remain in the method section.

revision: partial

Circularity Check

0 steps flagged

No significant circularity


The paper introduces a new backdoor attack (Gungnir) on diffusion models via stylistic triggers, supported by two new components (RAN and STTR) for preserving trigger consistency. No equations, fitted parameters, or derivation steps are described that reduce by construction to prior inputs, self-citations, or renamed known results. The central claims rest on empirical construction and experimental results rather than any self-referential mathematical chain. This is a standard empirical security paper with independent content; the reader's assessment of score 1.0 aligns with the absence of load-bearing circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim depends on the unverified effectiveness of two newly introduced techniques (RAN and STTR) for preserving triggers in diffusion dynamics, with no independent evidence or external benchmarks cited in the abstract.

axioms (1)

[standard math] Diffusion models operate via standard forward noise addition and reverse denoising processes that can be influenced by input features.
This is a foundational assumption for all diffusion model research, invoked implicitly when discussing trigger preservation.
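
For readers who want the axiom spelled out, the forward and reverse processes in the standard DDPM formulation can be written as follows. The symbols ($\beta_t$, $\bar{\alpha}_t$, $\epsilon$, $\mu_\theta$, $\Sigma_\theta$) follow the usual DDPM notation and are not taken from this page; how RAN and STTR modify these dynamics is not described here, so nothing below should be read as the paper's own equations.

```latex
% Forward (noising) process with variance schedule \beta_1, \dots, \beta_T
\bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s), \qquad
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t)\, I\right)

% Equivalent sampling form
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad
\epsilon \sim \mathcal{N}(0, I)

% Learned reverse (denoising) process
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
```

In an image-to-image setting the starting point $x_0$ is derived from the input image, which is the sense in which input features can influence the trajectory.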

invented entities (2)

Reconstructing-Adversarial Noise (RAN) [no independent evidence]
purpose: Preserve trigger-consistent diffusion dynamics in image-to-image tasks
Newly proposed component with no external validation or prior evidence mentioned.

Short-Term Timesteps-Retention (STTR) [no independent evidence]
purpose: Preserve trigger-consistent diffusion dynamics in image-to-image tasks
Newly proposed component with no external validation or prior evidence mentioned.



[1] [1]

Generative AI in mobile networks: a survey

Athanasios Karapantelakis et al. “Generative AI in mobile networks: a survey”. In:Annals of Telecommunications 79.1 (2024), pp. 15–33

work page 2024

[2] [2]

Adoption and impacts of generative artificial intelligence: Theoretical underpinnings and research agenda

Ruchi Gupta et al. “Adoption and impacts of generative artificial intelligence: Theoretical underpinnings and research agenda”. In: International Journal of Information Management Data Insights 4.1 (2024), p. 100232

work page 2024

[3] [4]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. “Denoising diffusion probabilistic models”. In:Advances in neural information processing systems 33 (2020), pp. 6840–6851

work page 2020

[4] [5]

Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, and Stefano Ermon. “Denoising diffusion implicit models”. In: arXiv preprint arXiv:2010.02502 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010

[5] [6]

Adding conditional control to text-to-image diffusion models

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. “Adding conditional control to text-to-image diffusion models”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023, pp. 3836–3847

work page 2023

[6] [7]

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Tianhao Qi et al. “DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024, pp. 8693–8702

work page 2024

[7] [8]

Backdoor Learning: A Survey

Yiming Li et al. “Backdoor Learning: A Survey”. In: IEEE Transactions on Neural Networks and Learning Systems 35.1 (2024), pp. 5–22. DOI: 10.1109/TNNLS.2022.3182979

work page doi:10.1109/tnnls.2022.3182979 2024

[8] [9]

How to backdoor diffusion models?

Sheng-Yen Chou, Pin-Yu Chen, and Tsung-Yi Ho. “How to backdoor diffusion models?” In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023, pp. 4015–4024

work page 2023

[9] [10]

Rickrolling the artist: Injecting backdoors into text encoders for text-to-image synthesis

Lukas Struppek, Dominik Hintersdorf, and Kristian Kersting. “Rickrolling the artist: Injecting backdoors into text encoders for text-to-image synthesis”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023, pp. 4584–4596

work page 2023

[10] [11]

Invisible Backdoor Attacks on Diffusion Models

Sen Li, Junchi Ma, and Minhao Cheng. “Invisible Backdoor Attacks on Diffusion Models”. In:arXiv preprint arXiv:2406.00816 (2024)

work page arXiv 2024

[11] [12]

Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models

Changjiang Li et al. “Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models”. In:arXiv preprint arXiv:2406.09669 (2024)

work page arXiv 2024

[12] [13]

DiffPhysBA: Diffusion-based Physical Backdoor Attack against Person Re-Identification in Real-World

Wenli Sun et al. “DiffPhysBA: Diffusion-based Physical Backdoor Attack against Person Re-Identification in Real-World”. In:arXiv preprint arXiv:2405.19990 (2024)

work page arXiv 2024

[13] [14]

Attacks and defenses for generative diffusion models: A comprehensive survey

Vu Tuan Truong, Luan Ba Dang, and Long Bao Le. “Attacks and defenses for generative diffusion models: A comprehensive survey”. In:arXiv preprint arXiv:2408.03400 (2024). 9

work page arXiv 2024

[14] [15]

A survey of backdoor attacks and defenses on large language models: Implications for security measures,

Shuai Zhao et al. “A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures”. In: arXiv preprint arXiv:2406.06852 (2024)

work page arXiv 2024

[15] [16]

BadNets: Evaluating Backdooring Attacks on Deep Neural Networks

Tianyu Gu et al. “BadNets: Evaluating Backdooring Attacks on Deep Neural Networks”. In: IEEE Access 7 (2019), pp. 47230–47244. DOI: 10.1109/ACCESS.2019.2909068

work page doi:10.1109/access.2019.2909068 2019

[16] [17]

Poisoned forgery face: Towards backdoor attacks on face forgery detection

Jiawei Liang et al. “Poisoned forgery face: Towards backdoor attacks on face forgery detection”. In: arXiv preprint arXiv:2402.11473 (2024)

work page arXiv 2024

[17] [18]

Exploiting Backdoors of Face Synthesis Detection with Natural Triggers

Xiaoxuan Han et al. “Exploiting Backdoors of Face Synthesis Detection with Natural Triggers”. In: ACM Transactions on Multimedia Computing, Communications and Applications (2024)

work page 2024

[18] [19]

MakeupAttack: Feature Space Black-box Backdoor Attack on Face Recognition via Makeup Transfer

Ming Sun et al. “MakeupAttack: Feature Space Black-box Backdoor Attack on Face Recognition via Makeup Transfer”. In: arXiv preprint arXiv:2408.12312 (2024)

work page arXiv 2024

[19] [20]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song et al. “Score-Based Generative Modeling through Stochastic Differential Equations”. In:International Conference on Learning Representations. 2021

work page 2021

[20] [21]

High-resolution image synthesis with latent diffusion models

Robin Rombach et al. “High-resolution image synthesis with latent diffusion models”. In:Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022, pp. 10684–10695

work page 2022

[21] [22]

Auto-Encoding Variational Bayes

Diederik P Kingma. “Auto-encoding variational bayes”. In: arXiv preprint arXiv:1312.6114 (2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013

[22] [23]

Invisible Backdoor Attack with Sample-Specific Triggers

Yuezun Li et al. “Invisible Backdoor Attack with Sample-Specific Triggers”. In:IEEE International Conference on Computer Vision (ICCV). 2021

work page 2021

[23] [24]

An Invisible Black-Box Backdoor Attack Through Frequency Domain

Tong Wang et al. “An Invisible Black-Box Backdoor Attack Through Frequency Domain”. In:Computer Vision – ECCV 2022. Ed. by Shai Avidan et al. Cham: Springer Nature Switzerland, 2022, pp. 396–413

work page 2022

[24] [25]

Trojdiff: Trojan attacks on diffusion models with diverse targets

Weixin Chen, Dawn Song, and Bo Li. “Trojdiff: Trojan attacks on diffusion models with diverse targets”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023, pp. 4035–4044

work page 2023

[25] [26]

TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

Yichuan Mo et al. “TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors”. In: ICML. 2024

work page 2024

[26] [27]

Elijah: Eliminating backdoors injected in diffusion models via distribution shift

Shengwei An et al. “Elijah: Eliminating backdoors injected in diffusion models via distribution shift”. In: Proceedings of the AAAI Conference on Artificial Intelligence. V ol. 38. 10. 2024, pp. 10847–10855


[27] [28]

Understanding Random Forests: From Theory to Practice

Gilles Louppe. “Understanding random forests: From theory to practice”. In: arXiv preprint arXiv:1407.7502 (2014)


[28] [29]

T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models

Zhongqi Wang et al. “T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models”. In: Computer Vision – ECCV 2024. Cham: Springer Nature Switzerland, 2025, pp. 107–124. ISBN: 978-3-031-73013-9


[29] [30]

VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models

Sheng-Yen Chou, Pin-Yu Chen, and Tsung-Yi Ho. “VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models”. In: Advances in Neural Information Processing Systems. Ed. by A. Oh et al. Vol. 36. Curran Associates, Inc., 2023, pp. 33912–33964


[30] [31]

Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation

Nataniel Ruiz et al. “Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation”. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023, pp. 22500–22510


[31] [32]

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

Hu Ye et al. “Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models”. In: arXiv preprint arXiv:2308.06721 (2023)


[32] [33]

U-net: Convolutional networks for biomedical image segmentation

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. “U-net: Convolutional networks for biomedical image segmentation”. In: Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer. 2015, pp. 234–241


[33] [34]

Semantic-Guided Latent Space Backdoor Attack: a Novel Threat to Stable Diffusion

Yu Pan et al. Semantic-Guided Latent Space Backdoor Attack: a Novel Threat to Stable Diffusion. Tech. rep. EasyChair, 2024


[34] [35]

EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation

Tianyu Wei et al. “EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation”. In: arXiv preprint arXiv:2406.15863 (2024)


[35] [36]

Photorealistic text-to-image diffusion models with deep language understanding

Chitwan Saharia et al. “Photorealistic text-to-image diffusion models with deep language understanding”. In: Advances in neural information processing systems 35 (2022), pp. 36479–36494


[36] [37]

Palette: Image-to-image diffusion models

Chitwan Saharia et al. “Palette: Image-to-image diffusion models”. In: ACM SIGGRAPH 2022 conference proceedings. 2022, pp. 1–10


[37] [38]

Microsoft coco: Common objects in context

Tsung-Yi Lin et al. “Microsoft coco: Common objects in context”. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer. 2014, pp. 740–755


[38] [39]

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Dustin Podell et al. “Sdxl: Improving latent diffusion models for high-resolution image synthesis”. In: arXiv preprint arXiv:2307.01952 (2023)


[39] [40]

Gans trained by a two time-scale update rule converge to a local nash equilibrium

Martin Heusel et al. “Gans trained by a two time-scale update rule converge to a local nash equilibrium”. In: Advances in neural information processing systems 30 (2017).

A Detailed Proof of Section 3.4

We show that using traditional input-output samples and full-timestep injection is ineffective for training high-dimensional feature triggers like ima...
