Retrievals Can Be Detrimental: Unveiling the Backdoor Vulnerability of Retrieval-Augmented Diffusion Models
Pith reviewed 2026-05-23 05:08 UTC · model grok-4.3
The pith
Retrieval-augmented diffusion models can be backdoored to generate toxic content from specific text triggers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that retrieval-augmented diffusion models are susceptible to backdoor attacks. It introduces BadRDM, which inserts toxicity surrogates into the database and applies a malicious contrastive learning variant to the retriever so that text triggers cause retrieval of those surrogates. This controls the generated contents while the diffusion model itself stays unchanged. Experiments on mainstream tasks confirm strong attack success with little impact on normal performance.
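To make the retriever-side mechanism concrete, here is one generic way such an objective could be written. This is a sketch under stated assumptions, not the loss used in the paper, which this summary does not reproduce. All symbols are hypothetical: $s(\cdot,\cdot)$ is text-image similarity, $\tau$ a temperature, $t_i$ and $v_i$ paired clean captions and images, $\delta$ the text trigger, $v^{\star}$ an inserted toxicity surrogate, and $\lambda$ a weighting factor.

```latex
% Generic clean contrastive (InfoNCE-style) term over N caption-image pairs
\mathcal{L}_{\text{clean}}
  = -\frac{1}{N}\sum_{i=1}^{N}
    \log\frac{\exp\big(s(t_i, v_i)/\tau\big)}
             {\sum_{j=1}^{N}\exp\big(s(t_i, v_j)/\tau\big)}

% Assumed backdoor term: triggered captions are pulled toward the surrogate v^*
\mathcal{L}_{\text{bd}}
  = -\frac{1}{M}\sum_{k=1}^{M}
    \log\frac{\exp\big(s(t_k \oplus \delta, v^{\star})/\tau\big)}
             {\exp\big(s(t_k \oplus \delta, v^{\star})/\tau\big)
              + \sum_{j=1}^{N}\exp\big(s(t_k \oplus \delta, v_j)/\tau\big)}

% Combined objective; \lambda trades attack strength against clean retrieval
\mathcal{L} = \mathcal{L}_{\text{clean}} + \lambda\,\mathcal{L}_{\text{bd}}
```

In a form like this, the clean term is what would keep non-trigger retrieval intact, and the second term is what would build the trigger-to-surrogate shortcut. Only the retriever's parameters appear, which matches the claim that the diffusion model stays unchanged.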
What carries the argument
BadRDM, which uses multimodal contrastive learning to create shortcuts from text triggers to inserted toxicity surrogates in the retrieval database.
If this is right
- The attack preserves the model's benign utility on non-trigger inputs.
- Entropy-based selection and generative augmentation strategies yield more effective toxicity surrogates.
- Backdoors can be injected into the retriever independently of the diffusion generation process.
- Strong attack effects are reported on two mainstream tasks, which the abstract does not name.
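The entropy-based selection mentioned above can be illustrated with a small sketch. The summary does not say how entropy is computed or in which direction candidates are ranked, so this assumes a hypothetical setup: each candidate image has a class-probability vector from some classifier, and candidates with the lowest Shannon entropy (the most confidently classified ones) are kept. The function names and toy values are illustrative only.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_lowest_entropy(candidates, n):
    """Return the names of the n candidates with the lowest prediction entropy.

    candidates: dict mapping a candidate name to its class-probability list.
    ASSUMPTION: low entropy is treated as "better"; the source does not
    specify the ranking direction used by the paper.
    """
    ranked = sorted(candidates, key=lambda name: shannon_entropy(candidates[name]))
    return ranked[:n]


if __name__ == "__main__":
    # Toy probabilities, invented for illustration.
    toy = {
        "img_a": [0.98, 0.01, 0.01],
        "img_b": [0.40, 0.30, 0.30],
        "img_c": [0.70, 0.20, 0.10],
    }
    print(select_lowest_entropy(toy, 2))
```

If the paper ranks in the opposite direction, passing `reverse=True` to `sorted` would flip the selection.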
Where Pith is reading between the lines
- Other retrieval-augmented generation systems may share similar vulnerabilities if they rely on contrastive retrievers.
- Secure deployment of RDMs would require additional checks on retrieved items or retriever robustness.
- Attackers could extend this to target specific content types beyond toxicity.
Load-bearing premise
A malicious variant of contrastive learning can build reliable shortcuts from text triggers to the inserted toxicity surrogates without harming normal retrieval accuracy.
What would settle it
Run the attack, then check two things: whether images generated from trigger prompts consistently contain the toxic features, and whether the retriever ranks the surrogates highly for those triggers.
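The retriever-side check can be sketched as a top-k hit rate: for each query embedding, rank the database by cosine similarity and record whether any inserted surrogate appears in the top k. This is a minimal illustration with hypothetical names and toy two-dimensional embeddings. A real evaluation would use the actual retriever's embeddings and the paper's own metric definitions, which are not given here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def surrogate_hit_rate(query_embs, db_embs, surrogate_ids, k):
    """Fraction of queries whose top-k retrieved items include a surrogate."""
    if not query_embs:
        raise ValueError("need at least one query embedding")
    hits = 0
    for q in query_embs:
        # Rank database indices from most to least similar to the query.
        ranked = sorted(
            range(len(db_embs)),
            key=lambda i: cosine(q, db_embs[i]),
            reverse=True,
        )
        if any(i in surrogate_ids for i in ranked[:k]):
            hits += 1
    return hits / len(query_embs)


if __name__ == "__main__":
    # Toy database: index 2 plays the role of an inserted surrogate.
    db = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    triggered = [[0.6, 0.8], [0.8, 0.6]]  # stand-ins for trigger queries
    clean = [[1.0, 0.1], [0.1, 1.0]]      # stand-ins for clean queries
    print(surrogate_hit_rate(triggered, db, {2}, k=1))
    print(surrogate_hit_rate(clean, db, {2}, k=1))
```

A high hit rate on triggered queries together with a low one on clean queries is the pattern the claim predicts.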
Original abstract
Diffusion models (DMs) have recently demonstrated remarkable generation capability. However, their training generally requires huge computational resources and large-scale datasets. To solve these, recent studies empower DMs with the advanced Retrieval-Augmented Generation (RAG) technique and propose retrieval-augmented diffusion models (RDMs). By incorporating rich knowledge from an auxiliary database, RAG enhances diffusion models' generation and generalization ability while significantly reducing model parameters. Despite the great success, RAG may introduce novel security issues that warrant further investigation. In this paper, we reveal that the RDM is susceptible to backdoor attacks by proposing a multimodal contrastive attack approach named BadRDM. Our framework fully considers RAG's characteristics and is devised to manipulate the retrieved items for given text triggers, thereby further controlling the generated contents. Specifically, we first insert a tiny portion of images into the retrieval database as target toxicity surrogates. Subsequently, a malicious variant of contrastive learning is adopted to inject backdoors into the retriever, which builds shortcuts from triggers to the toxicity surrogates. Furthermore, we enhance the attacks through novel entropy-based selection and generative augmentation strategies that can derive better toxicity surrogates. Extensive experiments on two mainstream tasks demonstrate the proposed BadRDM achieves outstanding attack effects while preserving the model's benign utility.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that retrieval-augmented diffusion models (RDMs) are vulnerable to backdoor attacks via a proposed multimodal contrastive method called BadRDM. The attack inserts a small set of toxicity surrogate images into the retrieval database, then applies a malicious variant of contrastive learning to the retriever to create shortcuts from text triggers to these surrogates. Entropy-based selection and generative augmentation are used to improve surrogate quality. Experiments on two mainstream tasks are reported to achieve high attack success rates while preserving benign model utility.
Significance. If the quantitative results hold, the work is significant because it identifies a practical security vulnerability that exploits the retrieval component of RDMs, which is otherwise presented as an efficiency advantage. The explicit attack construction (trigger insertion, contrastive poisoning, surrogate strategies) and reported metrics on attack success versus clean utility provide a concrete, falsifiable demonstration that could guide defenses in retrieval-augmented generative systems.
major comments (2)
- [Attack pipeline] § on attack pipeline (retriever poisoning): the central claim that the contrastive shortcuts operate independently of the diffusion model requires an ablation showing attack success when the diffusion backbone is replaced or frozen; without this, the independence assumption remains untested and load-bearing for the 'RAG-specific' vulnerability narrative.
- [Experiments] Experimental results section, attack success table: the reported 'outstanding attack effects' must be accompanied by explicit baseline comparisons (e.g., random retrieval, direct diffusion poisoning, or non-contrastive trigger insertion) and precise definitions of success rate and utility metrics; absence of these undermines the claim that BadRDM is superior while preserving utility.
minor comments (2)
- [Abstract] Abstract: the two 'mainstream tasks' are not named; specify them (e.g., text-to-image, inpainting) for immediate clarity.
- [Notation] Notation: ensure RDM, RAG, and surrogate terminology are defined on first use and used consistently; minor inconsistencies in acronym expansion appear in the provided text.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment below and will incorporate revisions accordingly.
Point-by-point responses
- Referee: [Attack pipeline] § on attack pipeline (retriever poisoning): the central claim that the contrastive shortcuts operate independently of the diffusion model requires an ablation showing attack success when the diffusion backbone is replaced or frozen; without this, the independence assumption remains untested and load-bearing for the 'RAG-specific' vulnerability narrative.
  Authors: We agree that an explicit ablation is needed to strengthen the claim of retriever-specific vulnerability. In the revised manuscript, we will add experiments freezing the diffusion backbone and replacing it with an alternative model (e.g., a different pre-trained DM), demonstrating that attack success rates remain comparable while clean utility is preserved. This will be placed in the attack pipeline section.
  Revision: yes
- Referee: [Experiments] Experimental results section, attack success table: the reported 'outstanding attack effects' must be accompanied by explicit baseline comparisons (e.g., random retrieval, direct diffusion poisoning, or non-contrastive trigger insertion) and precise definitions of success rate and utility metrics; absence of these undermines the claim that BadRDM is superior while preserving utility.
  Authors: We acknowledge the need for clearer baselines and metric definitions. The original experiments include some implicit comparisons, but we will expand the results section to explicitly include baselines such as random retrieval, direct poisoning of the diffusion model, and non-contrastive trigger insertion. We will also add precise definitions: attack success rate as the percentage of triggered generations exhibiting toxicity (measured via a toxicity classifier), and utility via standard metrics like FID on clean prompts. These additions will be included in the revised version.
  Revision: yes
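The attack success rate as defined in the rebuttal (the percentage of triggered generations that a toxicity classifier flags) is simple to compute. The sketch below assumes a hypothetical classifier that returns a score in [0, 1] per generated image and a decision threshold of 0.5. Neither the classifier nor the threshold comes from the paper.

```python
def flag_toxic(scores, threshold=0.5):
    """Turn per-image toxicity scores into boolean flags.

    ASSUMPTION: scores lie in [0, 1] and come from some external toxicity
    classifier; the 0.5 threshold is illustrative only.
    """
    return [score >= threshold for score in scores]


def attack_success_rate(flags):
    """Percentage of triggered generations flagged as toxic."""
    if not flags:
        raise ValueError("need at least one triggered generation")
    return 100.0 * sum(1 for flagged in flags if flagged) / len(flags)


if __name__ == "__main__":
    # Toy scores for four triggered generations, invented for illustration.
    toy_scores = [0.9, 0.8, 0.2, 0.7]
    print(attack_success_rate(flag_toxic(toy_scores)))
```

Utility on clean prompts (for example FID) needs a feature extractor and reference statistics, so it is not sketched here.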
Circularity Check
No significant circularity
The manuscript is an empirical security paper that describes a procedural attack construction (trigger insertion, entropy-based surrogate selection, contrastive poisoning of the retriever) and reports experimental results on two tasks. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claim rests on the observable success of the described attack pipeline rather than any self-referential reduction or imported uniqueness theorem. This is the normal case for an attack paper and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The retriever component in RDMs can be independently trained or fine-tuned with a contrastive objective without altering the downstream diffusion generation process.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "a malicious variant of contrastive learning is adopted to inject backdoors into the retriever, which builds shortcuts from triggers to the toxicity surrogates"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "Extensive experiments on two mainstream tasks demonstrate the proposed BadRDM achieves outstanding attack effects"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.