
arxiv: 2604.16481 · v1 · submitted 2026-04-12 · 💻 cs.CV · cs.AI

Erasing Thousands of Concepts: Towards Scalable and Practical Concept Erasure for Text-to-Image Diffusion Models

Pith reviewed 2026-05-10 15:35 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords concept erasure · text-to-image diffusion · scalable safety · mixture model · optimal transport · mixture of experts · robustness · embedding manipulation

The pith

Text-to-image diffusion models can have thousands of unwanted concepts erased while keeping generation quality and resisting attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that models clusters of concept information inside text embeddings as mixtures of Student's t-distributions. It then applies affine optimal transport to shift away the target clusters while holding the boundaries of the remaining clusters in place, without needing hand-picked anchor examples. A mixture-of-experts module is trained to perform the removal on the embeddings and is hardened against removal attacks by adding noise to the projector during fine-tuning. If the approach works at the claimed scale, safety teams could clean large public models of many prohibited or copyrighted concepts at once instead of handling them one by one or a few hundred at a time.

Core claim

Low-rank concept distributions in text embeddings are captured by a Student's t-distribution Mixture Model that supports pin-point erasure of target concepts through affine optimal transport; boundaries of non-target distributions are preserved without pre-defined anchors. A Mixture-of-Experts module called MoEraser is then trained to delete the target embeddings while retaining the anchor embeddings, with noise injected into the text embedding projector during fine-tuning to confer robustness against white-box attacks such as module removal. Experiments across more than two thousand concepts and multiple diffusion models show that the combined procedure maintains generation quality.
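The MoEraser component described above, together with the Figure 5 caption's mention of GLU experts, can be illustrated with a generic top-k routed mixture of gated-linear-unit experts. This is a minimal NumPy sketch, not the authors' module. The SiLU gate, top-2 routing, softmax over the selected logits, random weights, and all sizes are assumptions. The helper `expert_load` only mimics the kind of per-expert selection frequency plotted in Figure 8.

```python
import numpy as np

def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + np.exp(-z))

def glu_expert(x, W_gate, W_up, W_down):
    """Gated-linear-unit feed-forward expert: (silu(x W_gate) * (x W_up)) W_down."""
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

def moe_forward(x, router_W, experts, top_k=2):
    """Send each row of x (one token embedding) to its top_k experts and mix the outputs.

    x: (n, d); router_W: (d, n_experts); experts: list of (W_gate, W_up, W_down).
    Returns the mixed output (n, d) and the selected expert indices (n, top_k).
    """
    logits = x @ router_W
    top_idx = np.argsort(logits, axis=1)[:, ::-1][:, :top_k]   # highest logits first
    top_logits = np.take_along_axis(logits, top_idx, axis=1)
    weights = np.exp(top_logits - top_logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)              # softmax over chosen experts only
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            e = top_idx[t, slot]
            out[t] += weights[t, slot] * glu_expert(x[t], *experts[e])
    return out, top_idx

def expert_load(top_idx, n_experts):
    """Share of routing slots each expert receives (cf. the load heatmap in Figure 8)."""
    counts = np.bincount(top_idx.ravel(), minlength=n_experts)
    return counts / counts.sum()

# Usage with random weights; every size here is arbitrary.
rng = np.random.default_rng(0)
d, h, n_experts = 16, 32, 4
experts = [tuple(rng.normal(scale=0.1, size=s) for s in [(d, h), (d, h), (h, d)])
           for _ in range(n_experts)]
router_W = rng.normal(size=(d, n_experts))
x = rng.normal(size=(8, d))
y, idx = moe_forward(x, router_W, experts, top_k=2)
load = expert_load(idx, n_experts)
print(y.shape, idx.shape, load)
```

Routing each embedding to a few experts keeps per-token compute small while letting different experts specialize, which is the usual argument for MoE when concepts come from heterogeneous domains.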

What carries the argument

Student's t-distribution Mixture Model for low-rank concept distributions, combined with affine optimal transport for targeted shifts and a noise-hardened Mixture-of-Experts eraser module that selectively removes target embeddings while anchoring the rest.
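One way to read "affine optimal transport" is the standard closed-form Wasserstein-2 map between two Gaussians, T(x) = A(x - m1) + m2 with A = S1^(-1/2) (S1^(1/2) S2 S1^(1/2))^(1/2) S1^(-1/2). The same affine map is also known to be optimal between elliptical distributions of the same family, such as two Student's t components with equal degrees of freedom. The sketch below computes it in NumPy. Whether ETC uses exactly this map, how it picks the destination component, and how it anchors boundaries are not stated on this page, so the target and destination parameters here are hypothetical.

```python
import numpy as np

def sym_sqrt(S):
    """Principal square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh((S + S.T) / 2)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def affine_ot_map(m1, S1, m2, S2):
    """Closed-form W2 map from N(m1, S1) to N(m2, S2), returned as (A, b) with T(x) = A x + b."""
    S1_half = sym_sqrt(S1)
    S1_half_inv = np.linalg.inv(S1_half)
    A = S1_half_inv @ sym_sqrt(S1_half @ S2 @ S1_half) @ S1_half_inv
    b = m2 - A @ m1
    return A, b

# Usage: a hypothetical target component (m1, S1) and destination component (m2, S2).
rng = np.random.default_rng(0)
d = 6
B1, B2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
S1 = B1 @ B1.T + d * np.eye(d)
S2 = B2 @ B2.T + d * np.eye(d)
m1, m2 = rng.normal(size=d), rng.normal(size=d)

A, b = affine_ot_map(m1, S1, m2, S2)
x = rng.multivariate_normal(m1, S1, size=5)   # samples from the target component
y = x @ A.T + b                                # transported samples
x_back = np.linalg.solve(A, (y - b).T).T       # the affine map is invertible
```

Because the map depends only on the fitted means and covariances, it needs no training once the mixture is fitted, and the last line shows that it can be inverted exactly.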

If this is right

  • Thousands of concepts can be removed from a single model in one training pass instead of sequential small-scale edits.
  • The same model continues to produce high-quality images on unrelated prompts after the large-scale edits.
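  • The erasure survives direct attempts to strip out the added module: the text-embedding projector is corrupted with noise and MoEraser is fine-tuned to restore it, so removing the module leaves an unusable model rather than one that regenerates the erased concepts.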
  • No separate list of safe anchor concepts is required to protect wanted outputs during the process.
  • The procedure transfers across different diffusion architectures and across visual domains without per-model redesign.
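The noise-based non-removability in the bullets above can be illustrated with a toy linear stand-in. The sketch corrupts a projector matrix, then fits a restorer on anchor features so that the corrupted output maps back to the original. Everything here is an assumption made for illustration: i.i.d. Gaussian noise replaces the paper's structured noise, a least-squares linear map replaces the trained MoEraser, and the features are random.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_anchor = 12, 12, 200

# Stand-in for the original text-embedding projector W_proj.
W_proj = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)
# Corrupted projector W_cor: i.i.d. Gaussian noise here, NOT the paper's structured noise.
W_cor = W_proj + 2.0 * rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)

X = rng.normal(size=(n_anchor, d_in))   # hypothetical anchor features entering the projector
clean = X @ W_proj.T                    # embeddings the unmodified model would produce
corrupted = X @ W_cor.T                 # embeddings after corruption, with no restorer

# Fit a linear restorer R (toy stand-in for MoEraser) so that corrupted @ R.T ~= clean.
R_T, *_ = np.linalg.lstsq(corrupted, clean, rcond=None)
restored = corrupted @ R_T

def rel_err(a, ref):
    """Relative Frobenius error of a with respect to ref."""
    return float(np.linalg.norm(a - ref) / np.linalg.norm(ref))

err_without_module = rel_err(corrupted, clean)   # "module removed": large error
err_with_module = rel_err(restored, clean)       # restorer in place: near zero
print(f"without restorer: {err_without_module:.3f}, with restorer: {err_with_module:.2e}")
```

Because this restorer is linear, a white-box attacker could fold it back into the projector; the toy only demonstrates the dependency that module removal breaks, not resistance to every attack.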

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar embedding-space modeling could be applied to video or 3D generators if their text conditioning follows comparable low-rank structure.
  • Layering this erasure step with prompt filters or output classifiers would create defense-in-depth against both accidental and adversarial misuse.
  • The upper limit on simultaneous erasures may be set by how many distinct t-distribution components the embedding space can support before overlap becomes unavoidable.
  • Once the mixture parameters are learned, the method might allow selective re-introduction of erased concepts by reversing the transport map without full retraining.

Load-bearing premise

The Student's t-distribution Mixture Model must accurately capture the low-rank structure of concept distributions in the text embeddings so that targets can be moved without distorting the surrounding concepts or the overall image-generation capability.
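To make the heavy-tail premise concrete, the sketch below evaluates a multivariate Student's t log-density next to a Gaussian one. It also computes the per-point weight u = (nu + d) / (nu + delta) that the EM algorithm for t mixtures (Peel and McLachlan) uses to down-weight outlying points, where delta is the squared Mahalanobis distance. This is a generic NumPy illustration of the distribution family, not the paper's fitting code; the dimension, nu, and test points are arbitrary.

```python
import math
import numpy as np

def _maha_logdet(x, mu, Sigma):
    """Return (squared Mahalanobis distance of x from mu, log-determinant of Sigma)."""
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, x - mu)
    return float(z @ z), 2.0 * float(np.sum(np.log(np.diag(L))))

def gauss_logpdf(x, mu, Sigma):
    """Multivariate normal log-density."""
    d = mu.shape[0]
    maha, logdet = _maha_logdet(x, mu, Sigma)
    return -0.5 * (d * math.log(2 * math.pi) + logdet + maha)

def t_logpdf(x, mu, Sigma, nu):
    """Multivariate Student's t log-density (location mu, scale Sigma, nu degrees of freedom)."""
    d = mu.shape[0]
    maha, logdet = _maha_logdet(x, mu, Sigma)
    return (math.lgamma((nu + d) / 2) - math.lgamma(nu / 2)
            - 0.5 * d * math.log(nu * math.pi) - 0.5 * logdet
            - 0.5 * (nu + d) * math.log1p(maha / nu))

def t_em_weight(x, mu, Sigma, nu):
    """E-step weight u = (nu + d) / (nu + maha); small for points far from mu."""
    d = mu.shape[0]
    maha, _ = _maha_logdet(x, mu, Sigma)
    return (nu + d) / (nu + maha)

# Usage in 4 dimensions with an arbitrary component.
d, nu = 4, 3.0
mu, Sigma = np.zeros(d), np.eye(d)
near, far = np.full(d, 0.5), np.full(d, 6.0)
for name, pt in [("near", near), ("far", far)]:
    print(name, t_logpdf(pt, mu, Sigma, nu), gauss_logpdf(pt, mu, Sigma),
          t_em_weight(pt, mu, Sigma, nu))
```

The far point is orders of magnitude more likely under the t component than under the Gaussian, and its EM weight is small, which is the robustness property that motivates a t mixture when embeddings have heavy tails.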

What would settle it

Run the method on a model, then measure the fraction of prompts that still produce the erased concept and compare FID or CLIP scores on standard image benchmarks before and after; if either the concept reappears at high rates or quality metrics drop substantially, the central claim fails.
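The test above combines two numbers. A minimal sketch computes the residual concept rate and the Fréchet distance that underlies FID, ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1^(1/2) S2 S1^(1/2))^(1/2)). It assumes that per-image detector flags are already available (for example from a concept classifier; no specific detector is implied here) and that image features have been summarized by their mean and covariance. Feature extraction and the threshold for "substantially" are deliberately left out.

```python
import numpy as np

def residual_rate(flags):
    """Fraction of generations in which a detector still finds the erased concept."""
    flags = np.asarray(flags, dtype=bool)
    return float(flags.mean())

def sym_sqrt(S):
    """Principal square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh((S + S.T) / 2)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def frechet_distance(mu1, S1, mu2, S2):
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1^1/2 S2 S1^1/2)^1/2), the quantity behind FID."""
    S1_half = sym_sqrt(S1)
    cross = sym_sqrt(S1_half @ S2 @ S1_half)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(S1) + np.trace(S2) - 2.0 * np.trace(cross))

# Usage with made-up numbers (no real image features involved).
flags_after_erasure = [False, False, True, False]
print(residual_rate(flags_after_erasure))                                   # 0.25
print(frechet_distance(np.zeros(1), np.eye(1), np.ones(1), 4 * np.eye(1)))  # 2.0
```

Using the symmetric form S1^(1/2) S2 S1^(1/2) keeps the matrix square root on a symmetric positive semi-definite matrix, which has the same trace of square root as (S1 S2)^(1/2).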

Figures

Figures reproduced from arXiv: 2604.16481 by Byung Hyun Lee, Hoigi Seo, Jaehyun Cho, Se Young Chun, Sungjin Lim.

Figure 2. … the concept embedding distribution empirically exhibits heavy-tailed behavior. Intuitively, since the embeddings are inherently “in-distribution”, genuine out-of-distribution samples that reflect true variability are scarce. The tMM naturally models heavier tails, enabling better modeling of variability within the concept. Remark 1. Concept distribution modeling. A concept in a T2I diffusion model c…
Figure 4. Qualitative rationale on NIR. We generated images with the prompt “a photo of Morgan Freeman” using the original text-embedding projection W_proj (left) and a corrupted weight W_cor (right) on SDv1.4 and SDv3.5-L. When sufficient noise is injected, the models fail to produce high-fidelity images; without the MoEraser module to restore the generation, the model becomes unusable, enhancing robustness to whi…
Figure 5. MoEraser architecture and training. (a) An MoE with GLU experts scales to heterogeneous domain concepts; training maps f_tar to f_map while leaving f_anc unchanged. (b) To make the module non-removable, we inject structured noise into the text embedding projector and fine-tune the safety module to reconstruct the original embedding, improving robustness to white-box attacks such as module removal. Remark 4. No…
Figure 6. Qualitative results on erasing 2,072 concepts from SDv1.4. Among baseline methods (MACE, UCE, CPE, SPEED, and SAFREE), most remove the target concept but often degrade image fidelity, and SAFREE struggles to erase concepts at large scale. For the preservation of remaining concepts, baseline methods typically alter the original composition or distort remaining concepts. ETC achieves precise removal of t…
Figure 7. Qualitative results on erasing 515 concepts from SDv3.5-L. SAFREE reproduces the original image and fails to remove the target concept. SPEED removes the target concept but degrades fidelity, and this degradation also affects the remaining concepts. In contrast, ETC achieves accurate concept erasure while preserving remaining concepts on SDv3.5-L, demonstrating its applicability.
Figure 8. Load heatmap of experts. We visualize the frequency ratio of selection of each expert for three domains, where each column represents an expert and each row corresponds to a domain. The relatively uniform load distribution across experts suggests that the router network effectively balances expert utilization. … all noise types caused similar degradation of the target concept, but structured noise better p…
Original abstract

Large-scale text-to-image (T2I) diffusion models deliver remarkable visual fidelity but pose safety risks due to their capacity to reproduce undesirable content, such as copyrighted material. Concept erasure has emerged as a mitigation strategy, yet existing approaches struggle to balance scalability, precision, and robustness, which restricts their applicability to erasing only a few hundred concepts. To address these limitations, we present Erasing Thousands of Concepts (ETC), a scalable framework capable of erasing thousands of concepts while preserving generation quality. Our method first models low-rank concept distributions via a Student's t-distribution Mixture Model (tMM). This enables pinpoint erasure of target concepts via affine optimal transport while preserving other concepts by anchoring the boundaries of target concept distributions, without pre-defined anchor concepts. We then train a Mixture-of-Experts (MoE)-based module, termed MoEraser, which removes target embeddings while preserving the anchor embeddings. By injecting noise into the text embedding projector and fine-tuning MoEraser for recovery, our framework achieves robustness to white-box attacks such as module removal. Extensive experiments on over 2,000 concepts across heterogeneous domains and diffusion models demonstrate state-of-the-art scalability and precision in large-scale concept erasure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Erasing Thousands of Concepts (ETC), a framework for scalable concept erasure in text-to-image diffusion models. It first fits low-rank concept distributions in CLIP text embeddings using a Student's t-distribution Mixture Model (tMM), then applies affine optimal transport to erase target concepts while anchoring distribution boundaries to preserve non-targets without requiring pre-defined anchors. A Mixture-of-Experts module (MoEraser) is trained to remove target embeddings while retaining anchors, with noise injection into the text embedding projector during fine-tuning to confer robustness against white-box attacks such as module removal. The authors claim state-of-the-art scalability and precision based on experiments involving over 2,000 concepts across heterogeneous domains and multiple diffusion models.

Significance. If the central claims hold, the work would be significant for enabling practical, large-scale safety interventions in deployed T2I systems by addressing the scalability bottleneck of prior erasure methods (limited to hundreds of concepts). The tMM-plus-affine-transport construction for anchor-free boundary preservation and the MoE-based recovery with attack robustness represent technically interesting modeling choices that could generalize beyond the reported setting. The scale of the claimed evaluation (2000+ concepts) would also provide a useful benchmark for the community if accompanied by reproducible metrics.

major comments (3)
  1. [Abstract and §3] tMM modeling: The central claim that the Student's t-distribution Mixture Model accurately captures low-rank structure in text embeddings to enable precise anchoring and erasure at 2000+ scale lacks any supporting quantitative evidence such as per-component likelihoods, Kolmogorov-Smirnov statistics, ablation on distribution family (t vs. Gaussian), or embedding dimensionality analysis. Without these, it is impossible to verify that the subsequent affine optimal transport step actually achieves selective removal while preserving anchors.
  2. [Abstract and §4] MoEraser and experiments: The abstract asserts SOTA scalability, precision, and robustness on >2000 concepts yet supplies no numerical results, error bars, ablation tables, or attack success rates. This renders the claims of preserved generation quality and white-box robustness unverifiable and load-bearing for the paper's contribution.
  3. [§3.2] Affine optimal transport: The assertion that affine optimal transport can erase targets while anchoring boundaries without pre-defined anchors is presented as a key innovation, but no derivation, closed-form solution, or proof of boundary preservation is referenced; if the transport map is learned rather than parameter-free, the 'anchor-free' claim requires explicit justification against baselines that do use anchors.
minor comments (2)
  1. [Abstract] Abstract: 'demerate' is a typographical error and should be 'demonstrate'.
  2. [§4] Notation: The distinction between 'target embeddings' and 'anchor embeddings' in the MoEraser description is introduced without a formal definition or diagram; a small schematic would improve clarity.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. We have addressed each major comment below and will incorporate revisions to strengthen the manuscript's clarity, evidence, and rigor.

Point-by-point responses
  1. Referee: [Abstract and §3] tMM modeling: The central claim that the Student's t-distribution Mixture Model accurately captures low-rank structure in text embeddings to enable precise anchoring and erasure at 2000+ scale lacks any supporting quantitative evidence such as per-component likelihoods, Kolmogorov-Smirnov statistics, ablation on distribution family (t vs. Gaussian), or embedding dimensionality analysis. Without these, it is impossible to verify that the subsequent affine optimal transport step actually achieves selective removal while preserving anchors.

    Authors: We agree that direct quantitative validation of the tMM would improve verifiability. In the revised manuscript we will add per-component log-likelihoods for the fitted mixtures, Kolmogorov-Smirnov goodness-of-fit statistics on the CLIP embeddings, an explicit ablation replacing the t-distribution with a Gaussian mixture model (reporting effects on erasure precision and anchor preservation), and an analysis of effective embedding dimensionality and rank. These additions will directly support the modeling choice before the affine transport step. revision: yes

  2. Referee: [Abstract and §4] MoEraser and experiments: The abstract asserts SOTA scalability, precision, and robustness on >2000 concepts yet supplies no numerical results, error bars, ablation tables, or attack success rates. This renders the claims of preserved generation quality and white-box robustness unverifiable and load-bearing for the paper's contribution.

    Authors: We acknowledge that the abstract and experimental reporting should be more explicit. We will revise the abstract to state the key quantitative outcomes (erasure success rates, FID scores for generation quality, and white-box attack success rates) and expand §4 with complete tables that include error bars (standard deviation across runs), full ablation studies on MoEraser components, and statistical comparisons. The existing experiments already cover >2000 concepts; the revision will make all supporting numbers and variability measures prominent and reproducible. revision: yes

  3. Referee: [§3.2] Affine optimal transport: The assertion that affine optimal transport can erase targets while anchoring boundaries without pre-defined anchors is presented as a key innovation, but no derivation, closed-form solution, or proof of boundary preservation is referenced; if the transport map is learned rather than parameter-free, the 'anchor-free' claim requires explicit justification against baselines that do use anchors.

    Authors: We thank the referee for this observation. The transport map is obtained in closed form from the parameters of the fitted tMM; we will add a self-contained derivation in the appendix that shows how the affine map is computed to shift only the target component while fixing the boundary points defined by the mixture. We will also include a direct comparison against anchor-based baselines to justify the anchor-free formulation. A fully general proof of boundary preservation under arbitrary distribution shifts lies beyond the scope of the current work. revision: partial

standing simulated objections not resolved
  • A complete, general proof of boundary preservation for the affine optimal transport under all possible distribution shifts.

Circularity Check

0 steps flagged

No circularity: ETC constructs new modeling, transport, and training steps from data without self-referential reduction.

full rationale

The derivation begins with fitting a tMM to low-rank text embeddings (a data-driven modeling step), applies affine optimal transport to define erasure targets while anchoring distribution boundaries (a geometric operation on the fitted model), and trains MoEraser via noise injection and recovery fine-tuning (an optimization procedure). None of these steps reduce by definition to their inputs or to a fitted parameter renamed as a prediction; the framework adds independent components rather than deriving results tautologically. No load-bearing self-citations or uniqueness theorems from prior author work appear in the abstract or description. The chain is self-contained as a constructive pipeline.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The approach rests on domain assumptions about concept distributions in embedding space and introduces new modeling entities without independent evidence or prior validation provided in the abstract.

axioms (2)
  • domain assumption Low-rank concept distributions in text embeddings of diffusion models can be accurately modeled by a Student's t-distribution Mixture Model
    Invoked as the first step for pinpoint erasure via optimal transport.
  • ad hoc to paper Affine optimal transport can erase target concepts while anchoring boundaries to preserve non-target concepts without predefined anchors
    Central mechanism claimed to enable scalability and precision.
invented entities (2)
  • MoEraser no independent evidence
    purpose: Mixture-of-Experts module that removes target embeddings while preserving anchor embeddings
    New component trained to perform selective erasure with robustness via noise injection.
  • tMM no independent evidence
    purpose: Student's t-distribution Mixture Model for modeling low-rank concept distributions
    New modeling choice to enable the erasure technique.

pith-pipeline@v0.9.0 · 5526 in / 1360 out tokens · 70595 ms · 2026-05-10T15:35:43.438655+00:00 · methodology


Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 4 internal anchors

  [1] Abubakar Abid, Maheen Farooqi, and James Zou. Persistent anti-muslim bias in large language models. AAAI/ACM on AI, Ethics, and Society, 2021.

  [2] Ibtihel Amara, Ahmed Imtiaz Humayun, Ivana Kajic, Zarana Parekh, Natalie Harris, Sarah Young, Chirag Nagpal, Najoung Kim, Junfeng He, Cristina Nader Vasconcelos, et al. Erasing more than intended? how concept erasure degrades the generation of non-target concepts. Proceedings of the IEEE/CVF International Conference on Computer Vision,

  [3] Jeffrey L Andrews, Paul D McNicholas, and Sanjeena Subedi. Model-based classification via mixtures of multivariate t-distributions. Computational Statistics & Data Analysis, 2011.

  [4] Ben Athiwaratkun and Andrew Wilson. Multimodal word distributions. ACL, 2017.

  [5] Ben Athiwaratkun, Andrew Wilson, and Animashree Anandkumar. Probabilistic fasttext for multi-sense word embeddings. ACL, 2018.

  [6] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-VL technical report...

  [7] Praneeth Bedapudi. Nudenet: Neural nets for nudity detection and censoring. https://nudenet.notai.tech/, 2022.

  [8] Abeba Birhane and Vinay Uday Prabhu. Large image datasets: A pyrrhic win for computer vision? WACV, 2021.

  [9] Anh Bui, Long Vuong, Khanh Doan, Trung Le, Paul Montague, Tamas Abraham, and Dinh Phung. Erasing undesirable concepts in diffusion models with adversarial preservation. NeurIPS, 2024.

  [10] Anh Tuan Bui, Thuy-Trang Vu, Long Tung Vuong, Trung Le, Paul Montague, Tamas Abraham, Junae Kim, and Dinh Phung. Fantastic targets for concept erasure in diffusion models and where to find them. ICLR, 2025.

  [11] Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, and Zhenguo Li. Pixart-σ: Weak-to-strong training of diffusion transformer for 4k text-to-image generation. ECCV,

  [12] Kenneth Ward Church. Word2vec. Natural Language Engineering, 2017.

  [13] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025.

  [14] Yann N Dauphin, Angela Fan, Michael Auli, and David Grangier. Language modeling with gated convolutional networks. ICML, 2017.

  [15] Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. Measuring and mitigating unintended bias in text classification. AAAI/ACM on AI, Ethics, and Society,

  [16] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. ICML, 2024.

  [17] Chongyu Fan, Jiancheng Liu, Yihua Zhang, Dennis Wei, Eric Wong, and Sijia Liu. Salun: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation. ICLR, 2024.

  [18] Rohit Gandikota, Joanna Materzynska, Jaden Fiotto-Kaufman, and David Bau. Erasing concepts from diffusion models. ICCV, 2023.

  [19] Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, and David Bau. Unified concept editing in diffusion models. WACV, 2024.

  [20] Chao Gong, Kai Chen, Zhipeng Wei, Jingjing Chen, and Yu-Gang Jiang. Reliable and efficient concept erasure of text-to-image diffusion models. ECCV, 2024.

  [21] Nick Hasty, Ihor Kroosh, Dmitry Voitekh, and Dmytro Korduban. Giphy celebrity detector. https://github.com/Giphy/celeb-detection-oss, 2024.

  [22] Alvin Heng and Harold Soh. Selective amnesia: A continual learning approach to forgetting in deep generative models. NeurIPS, 2023.

  [23] Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. Clipscore: A reference-free evaluation metric for image captioning. EMNLP, 2021.

  [24] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. NeurIPS, 2017.

  [25] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 2021.

  [26] Taihang Hu, Linxuan Li, Joost van de Weijer, Hongcheng Gao, Fahad Shahbaz Khan, Jian Yang, Ming-Ming Cheng, Kai Wang, and Yaxing Wang. Token merging for training-free semantic binding in text-to-image synthesis. NeurIPS,

  [27] Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai, Yung-Hsuan Lai, and Yu-Chiang Frank Wang. Receler: Reliable concept erasing of text-to-image diffusion models via lightweight erasers. ECCV, 2023.

  [28] Matthew Hutson. Robo-writers: the rise and risks of language-generating ai. Nature, 2021.

  [29] Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, et al. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024.

  [30] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 2017.

  [31] Nupur Kumari, Bingliang Zhang, Sheng-Yu Wang, Eli Shechtman, Richard Zhang, and Jun-Yan Zhu. Ablating concepts in text-to-image diffusion models. ICCV, 2023.

  [32] Gant Laborde. Nsfw detection machine learning model. https://github.com/GantMan/nsfw_model,

  [33] Black Forest Labs. Flux.1-dev. https://huggingface.co/black-forest-labs/FLUX.1-dev, 2024.

  [34] Byung Hyun Lee, Okchul Jung, Jonghyun Choi, and Se Young Chun. Online continual learning on hierarchical label expansion. ICCV, 2023.

  [35] Byung Hyun Lee, Min-hwan Oh, and Se Young Chun. Doubly perturbed task free continual learning. AAAI, 2024.

  [36] Byung Hyun Lee, Sungjin Lim, and Se Young Chun. Localized concept erasure for text-to-image diffusion models using training-free gated low-rank adaptation. CVPR, 2025.

  [37] Byung Hyun Lee, Sungjin Lim, Seunggyu Lee, Dong Un Kang, and Se Young Chun. Concept pinpoint eraser for text-to-image diffusion models via residual attention gate. ICLR,

  [39] Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. Gshard: Scaling giant models with conditional computation and automatic sharding. ICLR,

  [40] Ouxiang Li, Yuan Wang, Xinting Hu, Houcheng Jiang, Tao Liang, Yanbin Hao, Guojun Ma, and Fuli Feng. Speed: Scalable, precise, and efficient concept erasure for diffusion models. arXiv preprint arXiv:2503.07392, 2025.

  [41] Runtao Liu, Chen I Chieh, Jindong Gu, Jipeng Zhang, Renjie Pi, Qifeng Chen, Philip Torr, Ashkan Khakzar, and Fabio Pizzati. Safetydpo: Scalable safety alignment for text-to-image generation. arXiv preprint arXiv:2412.10493, 2024.

  [42] Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, and Adams Wai-Kin Kong. Mace: Mass concept erasure in diffusion models. CVPR, 2024.

  43. [43]

    One-dimensional adapter to rule them all: Concepts, diffusion models and erasing applications.CVPR, 2024

    Mengyao Lyu, Yuhong Yang, Haiwen Hong, Hui Chen, Xuan Jin, Yuan He, Hui Xue, Jungong Han, and Guiguang Ding. One-dimensional adapter to rule them all: Concepts, diffusion models and erasing applications.CVPR, 2024. 1, 2

  44. [44]

    Holistic unlearning benchmark: A multi-faceted eval- uation for text-to-image diffusion model unlearning.ICCV,

    Saemi Moon, Minjong Lee, Sangdon Park, and Dongwoo Kim. Holistic unlearning benchmark: A multi-faceted eval- uation for text-to-image diffusion model unlearning.ICCV,

  45. [45]

    Glide: Towards photorealis- tic image generation and editing with text-guided diffusion models.ICML, 2022

    Alexander Quinn Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob Mcgrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealis- tic image generation and editing with text-guided diffusion models.ICML, 2022. 1

  46. [46]

    Stable diffusion 1 vs 2 - what you need to know.https://www.assemblyai.com/blog/ stable- diffusion- 1- vs- 2- what- you- need- to-know/, 2022

    Ryan O’connor. Stable diffusion 1 vs 2 - what you need to know.https://www.assemblyai.com/blog/ stable- diffusion- 1- vs- 2- what- you- need- to-know/, 2022. 2

  47. [47]

    Dall-e 2 preview - risks and limitations, 2022

    OpenAI. Dall-e 2 preview - risks and limitations, 2022. 1, 2

  48. [48]

    Edit- ing implicit assumptions in text-to-image diffusion models

    Hadas Orgad, Bahjat Kawar, and Yonatan Belinkov. Edit- ing implicit assumptions in text-to-image diffusion models. ICCV, 2023. 2

  49. [49]

    Scalable diffusion models with transformers.ICCV, 2023

    William Peebles and Saining Xie. Scalable diffusion models with transformers.ICCV, 2023. 14

  50. [50]

    Robust mixture mod- elling using the t distribution.Statistics and computing,

    David Peel and Geoffrey J McLachlan. Robust mixture mod- elling using the t distribution.Statistics and computing,

  51. [51]

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. ICML, 2021. 16

  52. [52]

    Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125,

  53. [53]

    Javier Rando, Daniel Paleka, David Lindner, Lennart Heim, and Florian Tramer. Red-teaming the stable diffusion safety filter. NeurIPS ML Safety Workshop, 2022. 1, 2

  54. [54]

    Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, and Neil Houlsby. Scaling vision with sparse mixture of experts. NeurIPS, 2021. 1, 4

  55. [55]

    David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne. Experience replay for continual learning. NeurIPS, 2019. 2

  56. [56]

    Robin Rombach. Stable diffusion 2.0 release. https://stability.ai/news/stable-diffusion-v2-release, 2022. 2, 19

  57. [57]

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. CVPR, 2022. 1, 2, 5

  58. [58]

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. Stable diffusion v1 model card. https://huggingface.co/CompVis/stable-diffusion-v1-4, 2022. 2, 6, 8, 12, 13, 14, 19

  59. [59]

    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. NeurIPS, 2022. 1

  60. [60]

    Patrick Schramowski, Manuel Brack, Björn Deiseroth, and Kristian Kersting. Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models. CVPR, 2023. 1, 2, 6

  61. [61]

    Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. LAION-5B: An open large-scale dataset for training next generation image-text models. NeurIPS, 2022. 2

  62. [62]

    Hoigi Seo, Junseo Bang, Haechang Lee, Joohoon Lee, Byung Hyun Lee, and Se Young Chun. Geometrical properties of text token embeddings for strong semantic binding in text-to-image generation, 2025. 13

  63. [63]

    Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. ICLR, 2017. 1

  64. [64]

    Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Diffusion art or digital forgery? Investigating data replication in diffusion models. CVPR,

  65. [65]

    Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer, Vishal M Patel, and Karthik Nandakumar. Stereo: A two-stage framework for adversarially robust concept erasing from text-to-image diffusion models. CVPR, 2025. 2, 3

  66. [66]

    JD Sutherland, Michael Arbel, and Arthur Gretton. Demystifying MMD GANs. ICLR, 2018. 6, 16

  67. [67]

    Yu-Lin Tsai, Chia-yi Hsu, Chulin Xie, Chih-hsun Lin, Jia You Chen, Bo Li, Pin-Yu Chen, Chia-Mu Yu, and Chunying Huang. Ring-a-bell! How reliable are concept removal methods for diffusion models? ICLR, 2024. 6, 18, 19

  68. [68]

    Luke Vilnis and Andrew McCallum. Word representations via Gaussian embedding. ICLR, 2015. 2

  69. [69]

    Jaehong Yoon, Shoubin Yu, Vaidehi Patil, Huaxiu Yao, and Mohit Bansal. Safree: Training-free and adaptive guard for safe text-to-image and video generation. ICLR, 2025. 2, 5, 6, 16

  70. [70]

    Eric Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, and Humphrey Shi. Forget-me-not: Learning to forget in text-to-image diffusion models. arXiv preprint arXiv:2303.17591,

  71. [71]

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018. 5

  72. [72]

    Yimeng Zhang, Jinghan Jia, Xin Chen, Aochuan Chen, Yihua Zhang, Jiancheng Liu, Ke Ding, and Sijia Liu. To generate or not? Safety-driven unlearned diffusion models are still easy to generate unsafe images... for now. arXiv preprint arXiv:2310.11868, 2023. 18, 19

  73. [73]

    Yimeng Zhang, Xin Chen, Jinghan Jia, Yihua Zhang, Chongyu Fan, Jiancheng Liu, Mingyi Hong, Ke Ding, and Sijia Liu. Defensive unlearning with adversarial training for robust concept erasure in diffusion models. NeurIPS, 2024. 1, 6

Erasing Thousands of Concepts: Towards Scalable and Practical Concept Erasure for Text-to-Image Diffusion Model Supplementary Ma...