
arxiv: 2606.25548 · v1 · pith:N2BRS7YMnew · submitted 2026-06-24 · 💻 cs.CV · cs.LG

Concept Removal for Frontier Image Generative Models

Pith reviewed 2026-06-25 21:16 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords concept removal · diffusion models · image generative models · transcoder · bottleneck layer · model editing · concept suppression · adversarial robustness

The pith

Replacing the bottleneck layer with a trained transcoder lets image generative models selectively disable unwanted concepts while keeping output quality intact.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method that swaps the internal bottleneck layer in models such as SD3.5, Flux, and Infinity for a transcoder. The transcoder is trained both to match the original layer's behavior and to organize its activations into separate features tied to individual concepts. Once inserted, the transcoder acts as a built-in filter that can turn off specific concept signals without external add-ons. Because the change sits inside the model backbone, the removal stays in place even when users have full white-box access. Experiments show the approach outperforms prior methods on concept removal, preserves image quality, resists adversarial prompts, and supports removing multiple concepts one after another.
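
The mechanism described above can be sketched as follows. This is a minimal illustration, assuming the transcoder has the common encoder / ReLU / decoder shape; this summary does not give the paper's actual architecture. All names (`transcoder_forward`, `W_ENC`, `W_DEC`, `disabled`) and the toy weights are hypothetical.

```python
from typing import AbstractSet, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def transcoder_forward(
    x: Sequence[float],
    w_enc: Matrix,
    b_enc: Sequence[float],
    w_dec: Matrix,
    b_dec: Sequence[float],
    disabled: AbstractSet[int] = frozenset(),
) -> Vector:
    """Encode x into non-negative features, zero the disabled ones, decode."""
    pre = [a + b for a, b in zip(matvec(w_enc, x), b_enc)]
    features = [max(0.0, p) for p in pre]  # ReLU keeps features non-negative
    # Assumed removal mechanism: silence the features tied to a concept.
    features = [0.0 if i in disabled else f for i, f in enumerate(features)]
    return [a + b for a, b in zip(matvec(w_dec, features), b_dec)]


# Toy weights (hypothetical): 2-d input, 3 features, 2-d output.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_ENC = [0.0, 0.0, 0.0]
W_DEC = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
B_DEC = [0.0, 0.0]

full = transcoder_forward([1.0, 2.0], W_ENC, B_ENC, W_DEC, B_DEC)
# full == [2.5, 3.5]
filtered = transcoder_forward([1.0, 2.0], W_ENC, B_ENC, W_DEC, B_DEC, disabled={2})
# filtered == [1.0, 2.0]
```

In the toy example, disabling feature 2 removes only that feature's contribution to the output; whether real concept features separate this cleanly is exactly what the paper's experiments would need to show.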

Core claim

The central claim is that an in-place substitution of the bottleneck layer with a transcoder trained to replicate the original layer while structuring activations into distinct, selectively disableable features creates an integrated filter that removes target concepts from diffusion and autoregressive image models without degrading overall generation behavior or requiring external components.
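
One way to write the dual requirement in this claim is given below. It is a sketch under assumptions, not the paper's stated loss: the symbols $f$ (original bottleneck layer), $W_{\mathrm{enc}}, b_{\mathrm{enc}}, W_{\mathrm{dec}}, b_{\mathrm{dec}}$ (transcoder parameters), $\lambda$ (sparsity weight), and $S$ (indices of features tied to the removed concept) are all introduced here for illustration, and the sparsity penalty is the kind commonly used for transcoders and sparse autoencoders.

```latex
\begin{align}
  z &= \mathrm{ReLU}\!\left(W_{\mathrm{enc}}\, x + b_{\mathrm{enc}}\right),
  &
  \hat{y} &= W_{\mathrm{dec}}\, z + b_{\mathrm{dec}}, \\
  \mathcal{L}(x) &= \bigl\lVert \hat{y} - f(x) \bigr\rVert_2^2
                    + \lambda \lVert z \rVert_1, \\
  \tilde{z}_i &=
  \begin{cases}
    0   & i \in S, \\
    z_i & \text{otherwise},
  \end{cases}
  &
  \tilde{y} &= W_{\mathrm{dec}}\, \tilde{z} + b_{\mathrm{dec}}.
\end{align}
```

The first term of $\mathcal{L}$ expresses replication of the original layer, the second encourages a small set of active features per input, and $\tilde{y}$ is the layer output once the features in $S$ are disabled.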

What carries the argument

The transcoder that replaces the bottleneck layer and structures its activations into distinct, selectively disableable features corresponding to individual concepts.

If this is right

  • The method achieves state-of-the-art concept removal on modern diffusion and autoregressive models.
  • Generation quality remains comparable to the unmodified model.
  • The removal resists adversarial prompts that try to elicit the disabled concept.
  • Multiple distinct concepts can be removed sequentially without cumulative degradation.
  • The edit persists under white-box access because it modifies the model backbone directly.
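
The last two points can be illustrated with a small sketch of how a removal might be written into the weights themselves rather than applied as a runtime switch. This assumes a linear decoder whose columns correspond to features; the summary does not say whether the paper persists removals this way. The function name `bake_out_features` and the toy matrix are hypothetical.

```python
from typing import Iterable, List

Matrix = List[List[float]]


def bake_out_features(w_dec: Matrix, feature_ids: Iterable[int]) -> Matrix:
    """Return a copy of the decoder with the given feature columns zeroed."""
    drop = set(feature_ids)
    return [[0.0 if j in drop else w for j, w in enumerate(row)] for row in w_dec]


# Toy decoder (hypothetical): 2 outputs, 3 features.
w_dec = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

# Sequential removal: first the concept tied to feature 2, later feature 0.
after_first = bake_out_features(w_dec, [2])
after_second = bake_out_features(after_first, [0])

# In this sketch, removing one at a time matches removing both at once.
assert after_second == bake_out_features(w_dec, [0, 2])
```

Because the zeroed columns are part of the stored weights, there is no external flag for a user with white-box access to flip. This shows only that the bookkeeping composes; it says nothing about whether image quality holds up after many removals.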

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same transcoder replacement might extend to other generative domains such as video or audio if those models also contain analogous bottleneck layers.
  • The structured activation features could be inspected to study which internal representations correspond to specific visual concepts.
  • Sequential removal capability suggests the approach could support ongoing, iterative safety updates after initial deployment.
  • Because the change is internal rather than an added filter, it may reduce the attack surface compared with external concept-removal modules.

Load-bearing premise

A transcoder can be trained to match the original bottleneck layer exactly while also organizing its activations so that individual concepts can be turned off without side effects on the rest of the model.

What would settle it

After the transcoder is inserted and a concept is disabled, either image quality drops measurably on standard benchmarks or the removed concept still appears reliably in generations from ordinary prompts.
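
The test above can be phrased as a simple check. The sketch below assumes each generated image has already been labeled by some classifier and that quality is summarized by a single positive score where higher is better (a lower-is-better metric such as FID would need its sign flipped). The thresholds `max_concept_rate` and `max_quality_drop` are hypothetical placeholders, not values from the paper.

```python
from typing import Sequence


def concept_rate(labels: Sequence[str], target: str) -> float:
    """Fraction of generations whose predicted label equals the target concept."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(1 for label in labels if label == target) / len(labels)


def removal_is_refuted(
    labels_after: Sequence[str],
    target: str,
    quality_before: float,
    quality_after: float,
    max_concept_rate: float = 0.05,  # hypothetical tolerance
    max_quality_drop: float = 0.02,  # hypothetical tolerance (relative)
) -> bool:
    """True if the concept still appears too often or quality fell too far."""
    if quality_before <= 0:
        raise ValueError("quality_before must be positive")
    still_present = concept_rate(labels_after, target) > max_concept_rate
    relative_drop = (quality_before - quality_after) / quality_before
    return still_present or relative_drop > max_quality_drop
```

For example, `removal_is_refuted(["cat", "dog", "dog", "dog"], "cat", 0.80, 0.80)` returns `True` because the removed concept still appears in a quarter of the generations.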

Figures

Figures reproduced from arXiv: 2606.25548 by Adam Dziedzic, Aditya Kumar, Franziska Boenisch, Pierre Joly.

Figure 2: Overview of our BLOCK. We detail our transcoder-based concept removal framework.
    Adjacent text from Section 4.1, "Our BLOCK Framework": SOTA architectures, including both DMs and IARs, rely on one or more text encoders that generate embeddings to guide image generation. These embeddings are typically injected into the image-generative backbone via transformations such as projection layers or MLPs applied to pooled text features, depe…
Figure 3: Multi-Concept Style Removal of Our Method and SOTA Baselines.
Figure 4: Qualitative results for style removal across models and baselines. Prompts and seeds are listed below each image.
Figure 5: Qualitative results for object removal across models and baselines. Prompts and seeds are listed below each image.
Figure 6: Prompt used for LLaVA-based style classification in our evaluation. The model must select exactly one label, ensuring consistent evaluation across generated samples.
    Adjacent text, "Prompt for Object Classification with LLaVA": Classify the object depicted in this image. Choose exactly one option from the numbered list. Respond with only the number. Object categories: 1. Architecture 2. Bear 3. Bird 4. Butterfly 5. Cat 6. D…
Figure 7: Prompt used for object classification in our LLaVA-based evaluation.
Figure 8: Multi-Concept Removal of Our Method and SOTA Baselines for Objects.
Original abstract

Image generative models are trained on massive, largely uncurated internet-scale datasets that contain undesirable visual concepts. Efficiently removing such concepts from the model generations without degrading the quality of output images remains challenging. We introduce a novel concept removal method for frontier diffusion and image autoregressive models, such as SD3.5, Flux, and Infinity. Our intervention replaces the internal bottleneck layer present in all these modern models with a transcoder that is trained to replicate the original layer while structuring it into distinct activation features. This in-place substitution creates an integrated filter through which concept-specific signals can be selectively disabled while preserving the rest of the model's behavior. Since the intervention modifies the model backbone rather than attaching an external component, it remains persistent under white-box access. Empirically, the approach achieves state-of-the-art concept removal performance across modern diffusion and autoregressive models, maintains visual generation quality, provides robustness against adversarial prompts, and supports sequential removal of diverse concepts. This positions our method as a practical approach for concept removal in frontier image generative models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims a novel concept removal technique for frontier diffusion and autoregressive image models (SD3.5, Flux, Infinity) that replaces the internal bottleneck layer with a transcoder trained to replicate the original layer's function while factoring its activations into distinct, selectively disableable features; the substitution is asserted to enable persistent, in-place concept filtering without external components, achieving SOTA removal performance, preserved generation quality, adversarial robustness, and support for sequential multi-concept removal.

Significance. If the central empirical claims were substantiated, the work would offer a practically significant advance by providing an integrated, persistent intervention inside the model backbone rather than an add-on filter. The approach could address a real deployment need for frontier models. However, the manuscript as presented supplies no quantitative evidence, so significance cannot be assessed.

major comments (2)
  1. [Abstract] Abstract: the assertions of 'state-of-the-art concept removal performance', 'maintains visual generation quality', 'robustness against adversarial prompts', and 'supports sequential removal' are presented without any metrics, baselines, ablation studies, or experimental details, so the claims cannot be evaluated.
  2. [Abstract] Abstract (central claim paragraph): the dual requirement that the transcoder 'replicate the original layer' while 'structuring it into distinct activation features' that can be 'selectively disabled' while 'preserving the rest of the model's behavior' is asserted but unsupported; no reconstruction loss values, feature-disentanglement metrics, or trade-off ablations are supplied to address the tension between exact replication and clean per-concept disablement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for identifying the lack of quantitative support for the abstract claims. We agree these assertions require explicit metrics and will revise the manuscript to include them.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the assertions of 'state-of-the-art concept removal performance', 'maintains visual generation quality', 'robustness against adversarial prompts', and 'supports sequential removal' are presented without any metrics, baselines, ablation studies, or experimental details, so the claims cannot be evaluated.

    Authors: We agree the abstract overclaims without evidence. The revised version will incorporate specific quantitative results (e.g., removal accuracy vs. baselines, FID scores for quality, adversarial robustness rates, and sequential removal success) directly into the abstract, with pointers to the experimental sections. revision: yes

  2. Referee: [Abstract] Abstract (central claim paragraph): the dual requirement that the transcoder 'replicate the original layer' while 'structuring it into distinct activation features' that can be 'selectively disabled' while 'preserving the rest of the model's behavior' is asserted but unsupported; no reconstruction loss values, feature-disentanglement metrics, or trade-off ablations are supplied to address the tension between exact replication and clean per-concept disablement.

    Authors: We accept this criticism. The revision will add reported reconstruction losses, disentanglement metrics (such as feature correlation or activation independence scores), and ablation studies on the replication-vs.-removal trade-off to demonstrate that selective disablement occurs without side effects. revision: yes
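
The kinds of numbers promised in this response could be computed along the following lines. This is a sketch only: mean squared error stands in for "reconstruction loss" and Pearson correlation between two features' activations stands in for a "feature correlation" score. Neither choice is confirmed by the paper, and the function names are hypothetical.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between transcoder output and original layer output."""
    if not pred or len(pred) != len(target):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two features' activations over the same inputs."""
    if not a or len(a) != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant feature carries no linear relationship
    return cov / math.sqrt(var_a * var_b)
```

A low `mse` would support the replication half of the claim, and near-zero `pearson` values between a concept's features and all other features would be one piece of evidence for clean separation, though low linear correlation alone does not rule out side effects.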

Circularity Check

0 steps flagged

No circularity: empirical architectural substitution with no derivation reducing to fitted inputs or self-citations

Full rationale

The paper presents an empirical intervention: replacing a bottleneck layer with a trained transcoder that replicates the original while enabling selective feature disablement. No mathematical derivation chain, equations, or 'predictions' are claimed that reduce by construction to the training inputs. No self-citation load-bearing steps, uniqueness theorems, or ansatzes imported from prior author work are invoked in the provided text. The central claim rests on training and empirical validation rather than definitional equivalence or fitted-parameter renaming. This is a standard non-circular empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unverified assumption that the bottleneck layer admits a transcoder replacement that disentangles concept-specific signals while preserving all other behavior; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: The bottleneck layer in diffusion and autoregressive models can be replaced in-place by a transcoder without altering the model's overall generative capability beyond the targeted features.
    Invoked in the description of the intervention as the basis for creating an integrated filter.


