Concept Removal for Frontier Image Generative Models

Adam Dziedzic; Aditya Kumar; Franziska Boenisch; Pierre Joly

arxiv: 2606.25548 · v1 · pith:N2BRS7YMnew · submitted 2026-06-24 · 💻 cs.CV · cs.LG

Aditya Kumar, Pierre Joly, Adam Dziedzic, Franziska Boenisch

Pith reviewed 2026-06-25 21:16 UTC · model grok-4.3

keywords: concept removal, diffusion models, image generative models, transcoder, bottleneck layer, model editing, concept suppression, adversarial robustness

The pith

Replacing the bottleneck layer with a trained transcoder lets image generative models selectively disable unwanted concepts while keeping output quality intact.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method that swaps the internal bottleneck layer in models such as SD3.5, Flux, and Infinity for a transcoder. The transcoder is trained both to match the original layer's behavior and to organize its activations into separate features tied to individual concepts. Once inserted, the transcoder acts as a built-in filter that can turn off specific concept signals without external add-ons. Because the change sits inside the model backbone, the removal stays in place even when users have full white-box access. Experiments show the approach outperforms prior methods on concept removal, preserves image quality, resists adversarial prompts, and supports removing multiple concepts one after another.

Core claim

The central claim is that an in-place substitution of the bottleneck layer with a transcoder trained to replicate the original layer while structuring activations into distinct, selectively disableable features creates an integrated filter that removes target concepts from diffusion and autoregressive image models without degrading overall generation behavior or requiring external components.

What carries the argument

The transcoder that replaces the bottleneck layer and structures its activations into distinct, selectively disableable features corresponding to individual concepts.
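The mechanism described above can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes the transcoder has a simple encode, ReLU, decode shape, and the class name `Transcoder`, the weight values, and the `disabled` set are all hypothetical choices made for illustration.

```python
def matvec(W, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class Transcoder:
    """Toy stand-in for a bottleneck layer: encode -> ReLU features -> decode.

    Hypothetical structure; the source does not specify the architecture.
    """

    def __init__(self, W_enc, b_enc, W_dec, b_dec):
        self.W_enc, self.b_enc = W_enc, b_enc
        self.W_dec, self.b_dec = W_dec, b_dec
        self.disabled = set()  # indices of features that are switched off

    def features(self, x):
        pre = [p + b for p, b in zip(matvec(self.W_enc, x), self.b_enc)]
        acts = [max(0.0, p) for p in pre]
        # Disabled features contribute nothing downstream.
        return [0.0 if i in self.disabled else a for i, a in enumerate(acts)]

    def forward(self, x):
        f = self.features(x)
        return [o + b for o, b in zip(matvec(self.W_dec, f), self.b_dec)]


# Hand-picked weights: feature 2 fires only when x[0] + x[1] > 1.
tc = Transcoder(
    W_enc=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    b_enc=[0.0, 0.0, -1.0],
    W_dec=[[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]],
    b_dec=[0.0, 0.0],
)
x_concept = [2.0, 1.0]   # activates feature 2
x_other = [0.5, 0.2]     # does not activate feature 2

before = tc.forward(x_concept)
other_before = tc.forward(x_other)

tc.disabled.add(2)       # "turn off" the hypothetical concept feature

after = tc.forward(x_concept)
other_after = tc.forward(x_other)
```

In this toy setup, disabling feature 2 changes the output only for the input that activated it; the input that never triggered the feature passes through unchanged. Whether real concept features separate this cleanly is exactly what the paper would need to demonstrate.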

If this is right

The method achieves state-of-the-art concept removal on modern diffusion and autoregressive models.
Generation quality remains comparable to the unmodified model.
The removal resists adversarial prompts that try to elicit the disabled concept.
Multiple distinct concepts can be removed sequentially without cumulative degradation.
The edit persists under white-box access because it modifies the model backbone directly.
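One way to picture sequential removal and persistence together is shown below. This is a sketch under stated assumptions, not the paper's procedure: it assumes a linear decoder over features, and the helper names (`decode`, `mask_features`, `bake_in`), the concept labels, and the mapping from concepts to feature indices are hypothetical. It shows that masking a feature at run time gives the same output as zeroing that feature's decoder column, so a removal could in principle be written into the weights rather than kept as a removable switch.

```python
def decode(W_dec, feats):
    """Linear decoder: each output is a weighted sum of feature activations."""
    return [sum(w * f for w, f in zip(row, feats)) for row in W_dec]


def mask_features(feats, disabled):
    """Run-time masking: zero the activations of disabled features."""
    return [0.0 if i in disabled else f for i, f in enumerate(feats)]


def bake_in(W_dec, disabled):
    """Return a decoder copy with the columns of disabled features zeroed."""
    return [[0.0 if j in disabled else w for j, w in enumerate(row)]
            for row in W_dec]


W_original = [[1.0, 0.0, 0.5, 2.0],
              [0.0, 1.0, 0.5, -1.0]]
feats = [2.0, 1.0, 2.0, 0.5]

# Hypothetical removal requests arriving one after another.
requests = [("style A", {2}), ("object B", {3})]

disabled = set()
W_baked = W_original
for _concept, idxs in requests:
    disabled |= idxs                 # accumulate removed features
    W_baked = bake_in(W_baked, idxs) # write the removal into the weights

masked_out = decode(W_original, mask_features(feats, disabled))
baked_out = decode(W_baked, feats)
```

Here `masked_out` and `baked_out` agree, and each later request only zeroes additional columns without touching the earlier ones.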

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same transcoder replacement might extend to other generative domains such as video or audio if those models also contain analogous bottleneck layers.
The structured activation features could be inspected to study which internal representations correspond to specific visual concepts.
Sequential removal capability suggests the approach could support ongoing, iterative safety updates after initial deployment.
Because the change is internal rather than an added filter, it may reduce the attack surface compared with external concept-removal modules.

Load-bearing premise

A transcoder can be trained to match the original bottleneck layer exactly while also organizing its activations so that individual concepts can be turned off without side effects on the rest of the model.
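The tension in this premise is easier to see when written as an objective. The formula below is a common sparse-transcoder formulation, offered here as an assumption; the source does not state the loss the authors use. The symbols are illustrative: $f$ is the original bottleneck layer, $T_\theta$ the transcoder, $z_\theta$ its feature activations, and $\lambda$ a sparsity weight.

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{x}\Big[
      \underbrace{\lVert T_\theta(x) - f(x) \rVert_2^2}_{\text{replicate the original layer}}
      \;+\;
      \lambda \underbrace{\lVert z_\theta(x) \rVert_1}_{\text{encourage separable features}}
    \Big],
\qquad
z_\theta(x) = \mathrm{ReLU}(W_{\mathrm{enc}}\, x + b_{\mathrm{enc}}),
\quad
T_\theta(x) = W_{\mathrm{dec}}\, z_\theta(x) + b_{\mathrm{dec}}
```

Under this reading, a larger $\lambda$ pushes toward cleaner features at the cost of replication accuracy, which is the trade-off the premise depends on.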

What would settle it

After the transcoder is inserted and a concept is disabled, either image quality drops measurably on standard benchmarks or the removed concept still appears reliably in generations from ordinary prompts.
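The test described above can be phrased as a simple check. The sketch below is illustrative only: the function names, the thresholds `max_rate` and `max_quality_drop`, and the assumption that the quality score is "higher is better" are all hypothetical and not taken from the paper.

```python
def concept_rate(labels, target):
    """Fraction of generated images that a classifier labels as the target."""
    return sum(1 for label in labels if label == target) / len(labels)


def falsification_check(labels_after, target, quality_before, quality_after,
                        max_rate=0.05, max_quality_drop=0.02):
    """Flag the two failure modes: concept still appears, or quality dropped.

    Thresholds are placeholder assumptions, not values from the paper.
    Quality is assumed to be a higher-is-better score.
    """
    rate = concept_rate(labels_after, target)
    drop = quality_before - quality_after
    return {
        "concept_rate": rate,
        "concept_persists": rate > max_rate,
        "quality_degraded": drop > max_quality_drop,
    }


# Hypothetical classifier labels for four images generated after removal.
result = falsification_check(
    labels_after=["cat", "dog", "dog", "bird"],
    target="cat",
    quality_before=0.80,
    quality_after=0.79,
)
```

In this made-up example the target still appears in one of four images, so the check reports that the concept persists even though quality held steady.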

Figures

Figures reproduced from arXiv: 2606.25548 by Adam Dziedzic, Aditya Kumar, Franziska Boenisch, Pierre Joly.

**Figure 2.** Overview of our BLOCK. We detail our transcoder-based concept removal framework. (Section 4.1, Our BLOCK Framework:) SOTA architectures, including both DMs and IARs, rely on one or more text encoders that generate embeddings to guide image generation. These embeddings are typically injected into the image-generative backbone via transformations such as projection layers or MLPs applied to pooled text features, depe…

**Figure 3.** Multi-Concept Style Removal of Our Method and SOTA Baselines.

**Figure 4.** Qualitative results for style removal across models and baselines. Prompts and seeds are listed below each image.

**Figure 5.** Qualitative results for object removal across models and baselines. Prompts and seeds are listed below each image.

**Figure 6.** Prompt used for LLaVA-based style classification in our evaluation. The model must select exactly one label, ensuring consistent evaluation across generated samples.

**Figure 7.** Prompt used for object classification in our LLaVA-based evaluation. Prompt for Object Classification with LLaVA: "Classify the object depicted in this image. Choose exactly one option from the numbered list. Respond with only the number. Object categories: 1. Architecture 2. Bear 3. Bird 4. Butterfly 5. Cat 6. D…"

**Figure 8.** Multi-Concept Removal of Our Method and SOTA Baselines for Objects.

Original abstract

Image generative models are trained on massive, largely uncurated internet-scale datasets that contain undesirable visual concepts. Efficiently removing such concepts from the model generations without degrading the quality of output images remains challenging. We introduce a novel concept removal method for frontier diffusion and image autoregressive models, such as SD3.5, Flux, and Infinity. Our intervention replaces the internal bottleneck layer present in all these modern models with a transcoder that is trained to replicate the original layer while structuring it into distinct activation features. This in-place substitution creates an integrated filter through which concept-specific signals can be selectively disabled while preserving the rest of the model's behavior. Since the intervention modifies the model backbone rather than attaching an external component, it remains persistent under white-box access. Empirically, the approach achieves state-of-the-art concept removal performance across modern diffusion and autoregressive models, maintains visual generation quality, provides robustness against adversarial prompts, and supports sequential removal of diverse concepts. This positions our method as a practical approach for concept removal in frontier image generative models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The transcoder swap for internal concept removal is a distinct architectural move, but the abstract supplies no metrics or ablations so the performance claims cannot be checked.

The main point is that the authors replace the bottleneck layer inside models like SD3.5, Flux, and Infinity with a transcoder trained to copy the original layer while splitting activations into features that can be turned off per concept. This keeps the removal inside the model so it persists under white-box access, unlike external filters.

What stands out is the focus on persistence and sequential removal. Prior work often adds components on top or fine-tunes the whole thing; this changes the core architecture and claims it blocks concepts even against adversarial prompts without hurting generation quality. That addresses a real deployment issue for frontier models.

The soft spot is the missing evidence. The abstract asserts state-of-the-art removal, robustness, and quality preservation but shows no numbers, baselines, reconstruction errors, or ablation results on the replication-versus-disentanglement trade-off. The stress-test note is on target here: balancing exact layer replication with clean, side-effect-free feature separation is the load-bearing step, and nothing demonstrates it succeeds. Without those details the central guarantee stays unverified.

This is for people working on model editing and AI safety who need internal controls that survive access to the weights. A reader already running experiments on these models might want to see the full results section and any released code.

I would send it to peer review. The idea is different enough from cited prior work and the problem matters, even though the current write-up needs the empirical section to be evaluated properly.

Referee Report

2 major / 0 minor

Summary. The paper claims a novel concept removal technique for frontier diffusion and autoregressive image models (SD3.5, Flux, Infinity) that replaces the internal bottleneck layer with a transcoder trained to replicate the original layer's function while factoring its activations into distinct, selectively disableable features; the substitution is asserted to enable persistent, in-place concept filtering without external components, achieving SOTA removal performance, preserved generation quality, adversarial robustness, and support for sequential multi-concept removal.

Significance. If the central empirical claims were substantiated, the work would offer a practically significant advance by providing an integrated, persistent intervention inside the model backbone rather than an add-on filter. The approach could address a real deployment need for frontier models. However, the manuscript as presented supplies no quantitative evidence, so significance cannot be assessed.

major comments (2)

[Abstract] The assertions of 'state-of-the-art concept removal performance', 'maintains visual generation quality', 'robustness against adversarial prompts', and 'supports sequential removal' are presented without any metrics, baselines, ablation studies, or experimental details, so the claims cannot be evaluated.

[Abstract, central claim paragraph] The dual requirement that the transcoder 'replicate the original layer' while 'structuring it into distinct activation features' that can be 'selectively disabled' with 'no side effects on overall model behavior' is asserted but unsupported; no reconstruction loss values, feature-disentanglement metrics, or trade-off ablations are supplied to address the tension between exact replication and clean per-concept disablement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for identifying the lack of quantitative support for the abstract claims. We agree these assertions require explicit metrics and will revise the manuscript to include them.

Referee: [Abstract] The assertions of 'state-of-the-art concept removal performance', 'maintains visual generation quality', 'robustness against adversarial prompts', and 'supports sequential removal' are presented without any metrics, baselines, ablation studies, or experimental details, so the claims cannot be evaluated.

Authors: We agree the abstract overclaims without evidence. The revised version will incorporate specific quantitative results (e.g., removal accuracy vs. baselines, FID scores for quality, adversarial robustness rates, and sequential removal success) directly into the abstract, with pointers to the experimental sections.
Revision: yes

Referee: [Abstract, central claim paragraph] The dual requirement that the transcoder 'replicate the original layer' while 'structuring it into distinct activation features' that can be 'selectively disabled' with 'no side effects on overall model behavior' is asserted but unsupported; no reconstruction loss values, feature-disentanglement metrics, or trade-off ablations are supplied to address the tension between exact replication and clean per-concept disablement.

Authors: We accept this criticism. The revision will add reported reconstruction losses, disentanglement metrics (such as feature correlation or activation independence scores), and ablation studies on the replication-vs.-removal trade-off to demonstrate that selective disablement occurs without side effects.
Revision: yes
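As an illustration of what a "feature correlation" metric of the kind mentioned in the rebuttal could look like, the sketch below computes the largest absolute Pearson correlation between any two feature columns. This is one possible reading, not a metric defined in the paper; the function names and the sample activations are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def max_feature_correlation(activations):
    """Largest |correlation| between any two features.

    `activations` is a list of per-sample feature vectors. Values near 1
    suggest two features carry overlapping information; values near 0
    suggest they vary independently (in the linear sense only).
    """
    columns = list(zip(*activations))
    worst = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            worst = max(worst, abs(pearson(columns[i], columns[j])))
    return worst


# Hypothetical activations: feature 1 is exactly twice feature 0.
acts = [[1.0, 2.0, 0.0],
        [2.0, 4.0, 1.0],
        [3.0, 6.0, 0.0]]
score = max_feature_correlation(acts)
```

A low score would be necessary but not sufficient evidence of clean separation, since correlation captures only linear dependence between features.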

Circularity Check

0 steps flagged

No circularity: empirical architectural substitution with no derivation reducing to fitted inputs or self-citations

The paper presents an empirical intervention: replacing a bottleneck layer with a trained transcoder that replicates the original while enabling selective feature disablement. No mathematical derivation chain, equations, or 'predictions' are claimed that reduce by construction to the training inputs. No self-citation load-bearing steps, uniqueness theorems, or ansatzes imported from prior author work are invoked in the provided text. The central claim rests on training and empirical validation rather than definitional equivalence or fitted-parameter renaming. This is a standard non-circular empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unverified assumption that the bottleneck layer admits a transcoder replacement that disentangles concept-specific signals while preserving all other behavior; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: The bottleneck layer in diffusion and autoregressive models can be replaced in-place by a transcoder without altering the model's overall generative capability beyond the targeted features.
Invoked in the description of the intervention as the basis for creating an integrated filter.

[82]

The Twelfth International Conference on Learning Representations , year=

Get what you want, not what you don't: Image content suppression for text-to-image diffusion models , author=. The Twelfth International Conference on Learning Representations , year=
[83]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

One-dimensional adapter to rule them all: Concepts diffusion models and erasing applications , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
[84]

The Thirteenth International Conference on Learning Representations , year=

Precise Parameter Localization for Textual Generation in Diffusion Models , author=. The Thirteenth International Conference on Learning Representations , year=
[85]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Hpsv3: Towards wide-spectrum human preference score , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
[86]

Forty-first International Conference on Machine Learning , year=

Scaling rectified flow transformers for high-resolution image synthesis , author=. Forty-first International Conference on Machine Learning , year=
[87]

2023 , note=

StabilityAI , title=. 2023 , note=

2023

Showing first 80 references.
