arxiv: 2605.19739 · v1 · pith:F4UYJNAXnew · submitted 2026-05-19 · 💻 cs.CV

FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models

Pith reviewed 2026-05-20 05:23 UTC · model grok-4.3

classification 💻 cs.CV
keywords concept erasure · flow matching · reward optimization · GRPO · text-to-image · model safety · adversarial robustness

The pith

FlowErase-RL reframes concept erasure in flow matching models as a reward optimization problem using a dynamic dual-path reward system.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that concept erasure is achieved more effectively by optimizing rewards than by supervised fine-tuning or inference-time interventions. It introduces two reward paths: one penalizes generation of the target concept, and the other rewards fidelity on non-target content. The paths are balanced dynamically according to performance during training. The authors report stronger erasure on nudity, object, and artistic style tasks while keeping image quality high and resisting adversarial attempts to bypass the erasure. A sympathetic reader would care because this suggests a scalable way to control what AI image generators produce without requiring precisely aligned training data.

Core claim

The central claim is that reformulating concept erasure as a GRPO-based reward optimization problem, with a dynamic dual-path reward mechanism that balances Concept Erasure and Non-target Space rewards via a performance-driven switching strategy, enables state-of-the-art performance in suppressing target concepts while preserving generative quality and semantic alignment in flow matching models.
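
For readers unfamiliar with GRPO, the following is a minimal sketch of the group-relative advantage introduced with GRPO in DeepSeekMath [27]: rewards for a group of samples drawn from the same prompt are standardized within that group, so no separate value network is needed. The function name, the example rewards, the epsilon term, and the choice of the population standard deviation are illustrative assumptions, not details taken from FlowErase-RL.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same prompt.

    Assumption: population standard deviation plus a small eps for stability;
    real implementations may differ in both choices.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four images generated from one prompt.
rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Samples that score above the group mean receive a positive advantage and are reinforced; samples below it are discouraged. If every sample in a group scores the same, all advantages are zero and the group contributes no learning signal.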

What carries the argument

The dynamic dual-path reward mechanism that jointly optimizes a Concept Erasure reward to suppress target concepts and a Non-target Space reward to preserve generative fidelity, adaptively balanced by a performance-driven switching strategy.
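
One way such a switch could look is sketched below. The paper's actual switching rule, performance metric, and thresholds are not given on this page, so everything here (the class name, the exponential moving average of an erasure success rate, and the threshold and momentum values) is a hypothetical stand-in meant only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class DualPathSwitch:
    """Hypothetical performance-driven switch between CE and NS rewards."""

    threshold: float = 0.9      # assumed target erasure success rate
    momentum: float = 0.9       # assumed EMA smoothing factor
    erasure_rate: float = 0.0   # running estimate of erasure success

    def update(self, batch_erasure_rate: float) -> None:
        # Exponential moving average of the observed erasure success rate.
        self.erasure_rate = (
            self.momentum * self.erasure_rate
            + (1.0 - self.momentum) * batch_erasure_rate
        )

    def active_path(self) -> str:
        # Optimize erasure until it is good enough, then protect fidelity.
        return "CE" if self.erasure_rate < self.threshold else "NS"

    def reward(self, ce_reward: float, ns_reward: float) -> float:
        return ce_reward if self.active_path() == "CE" else ns_reward


# momentum=0.0 makes the running rate track the latest batch exactly.
switch = DualPathSwitch(threshold=0.5, momentum=0.0)
first_path = switch.active_path()      # no erasure measured yet
switch.update(1.0)                     # a batch where erasure fully succeeds
second_path = switch.active_path()
```

A hard switch like this is the simplest reading of "switching"; a fixed-weight sum of the two rewards is the static baseline the referee report asks the authors to compare against.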

If this is right

  • The method achieves state-of-the-art erasure performance on nudity, object, and artistic style tasks.
  • It maintains strong image quality and semantic alignment after erasure.
  • It shows robust resistance to adversarial attacks.
  • It scales effectively to multi-concept erasure scenarios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach may extend to other types of generative models that use similar flow or diffusion processes.
  • Reducing reliance on supervised data could make safety measures easier to implement across different concepts.
  • Testing the method on more complex or abstract concepts could reveal additional strengths or limits.

Load-bearing premise

The performance-driven switching strategy between the Concept Erasure and Non-target Space rewards can stably optimize the model without explicit supervision and without the two paths conflicting in ways that degrade either erasure or fidelity.

What would settle it

Observing training instability where the switching causes either poor erasure of targets or degraded quality in non-target images would challenge the central claim.

Figures

Figures reproduced from arXiv: 2605.19739 by Bin Chen, Ke Xu, Shuoyang Sun, Shu-Tao Xia, Xinhao Zhong, Yimin Zhou, Yi Sun, Zhiqi Zhang.

Figure 1: Overview of FlowErase-RL. (a) illustrates the framework of our approach. We employ …
Figure 2: Comparison of Nudity erasure results in I2P dataset and under attacks. 5 Experiments. 5.1 Experimental setup. Baselines. We compare our method against four SOTA approaches applied to flow matching models, including two training-based methods ESD [36], EraseAnything [38] and one training-free method DVE [35]. Evaluation metrics. We evaluate FlowErase-RL on three CE tasks: nudity erasure, artist style erasure,…
Figure 3: … clearly demonstrates the erasure results of our method for target object concept. Panel labels: Original, Modified; Chain Saw, Tench, English Springer, Garbage Truck.
Figure 4: Additional results of I2P dataset.
Figure 8: Additional object erasure results in Figure 9 and Figure 10. In Figure 11, we compare our …
Figure 5: Additional results of adversarial attacks, including MMA, RAB, P4D and UnlearnDiff.
Figure 6: Results of multiple concepts erasure. Our method can successfully erase target concepts in …
Figure 7: Additional results of erasing ’Van Gogh’.
Figure 8: Comparison of Van Gogh erasure results.
Figure 9: Additional erasure results of 5 object. For each concept, the images show both target …
Figure 10: Additional erasure results of the other 5 object. For each concept, the images show …
Figure 11: Comparison of images generated by different methods via MS-COCO dataset.
Figure 12: Comparison of FlowErase-RL on MS-COCO Dataset for all three types of concept erasure …

Original abstract

Recent advances in flow matching models have significantly improved text-to-image generation quality, but also introduce growing safety risks due to the generation of harmful or undesirable content. Existing concept erasure methods are either inference-time interventions with limited effectiveness or rely on supervised fine-tuning (SFT), which requires precisely aligned data and struggles with scalability and multi-concept settings. In this paper, we propose FlowErase-RL, the first GRPO-based framework for concept erasure in flow matching models. We reformulate concept erasure as a reward optimization problem and introduce a dynamic dual-path reward mechanism that jointly optimizes (i) a Concept Erasure (CE) reward to suppress target concepts and (ii) a Non-target Space (NS) reward to preserve generative fidelity. The two reward paths are adaptively balanced during training via a performance-driven switching strategy, enabling stable optimization without explicit supervision. Extensive experiments on nudity, object, and artistic style erasure demonstrate that our method achieves state-of-the-art erasure performance while maintaining strong image quality and semantic alignment. Moreover, it exhibits robust resistance to adversarial attacks and scales effectively to multi-concept scenarios. Our results establish a new paradigm for safe and controllable generation in flow matching models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces FlowErase-RL, a GRPO-based framework that reformulates concept erasure in flow matching models as a reward optimization problem. It proposes a dynamic dual-path reward mechanism that jointly optimizes a Concept Erasure (CE) reward to suppress target concepts and a Non-target Space (NS) reward to preserve generative fidelity, with the two paths adaptively balanced via a performance-driven switching strategy. The authors claim this yields state-of-the-art erasure performance on nudity, object, and artistic style tasks while maintaining image quality and semantic alignment, with additional robustness to adversarial attacks and scalability to multi-concept settings.

Significance. If the central empirical claims hold, the work offers a promising new paradigm for safe generation in flow matching models by replacing supervised fine-tuning with reinforcement learning and an adaptive dual-reward scheme. This could improve scalability and multi-concept handling compared to prior inference-time or SFT-based erasure methods. The paper is credited for the reformulation of erasure as GRPO reward optimization and for conducting experiments across multiple erasure categories.

major comments (2)
  1. [§3.2] §3.2 (Dynamic Dual-Path Reward Mechanism): The performance-driven switching strategy is asserted to enable stable optimization without explicit supervision and without the CE and NS paths conflicting, yet the manuscript supplies no formal convergence analysis, Lyapunov-style stability argument, or ablation that isolates the switching logic from fixed-weight baselines. In flow matching models with coupled continuous trajectories, even modest misalignment could accumulate across timesteps, so the absence of such analysis makes it difficult to confirm that observed gains derive from the adaptive mechanism rather than hyperparameter choices.
  2. [§4.3] §4.3 (Experimental Results): The claim of state-of-the-art erasure performance with maintained fidelity rests on the switching strategy, but without an ablation comparing adaptive switching to static reward weighting or reporting per-timestep reward conflict metrics, it remains unclear whether the dual-path mechanism is load-bearing for the reported improvements or whether simpler baselines would suffice.
minor comments (2)
  1. The abstract and introduction would benefit from a concise summary table of key metrics (e.g., erasure success rate, FID, CLIP score) against the strongest baselines to allow readers to assess the SOTA claim at a glance.
  2. [§3.2] Notation for the switching threshold and performance metric used to trigger path selection should be defined explicitly in the method section rather than left implicit in the algorithm description.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our work. We respond to each major comment below and indicate the revisions we plan to make to the manuscript.

  1. Referee: [§3.2] §3.2 (Dynamic Dual-Path Reward Mechanism): The performance-driven switching strategy is asserted to enable stable optimization without explicit supervision and without the CE and NS paths conflicting, yet the manuscript supplies no formal convergence analysis, Lyapunov-style stability argument, or ablation that isolates the switching logic from fixed-weight baselines. In flow matching models with coupled continuous trajectories, even modest misalignment could accumulate across timesteps, so the absence of such analysis makes it difficult to confirm that observed gains derive from the adaptive mechanism rather than hyperparameter choices.

    Authors: We appreciate this observation regarding the theoretical underpinnings of our dynamic dual-path reward mechanism. A formal convergence analysis or Lyapunov-style stability argument for performance-driven switching over coupled continuous trajectories is a significant undertaking that we have not pursued in the current work, because our focus has been on empirical validation and practical effectiveness. We will, however, add the requested ablation to the revised manuscript, comparing the adaptive strategy against fixed-weight baselines so that the switching logic is isolated. We believe this will help confirm that the gains stem from the adaptive balancing rather than from specific hyperparameter selections.
    Revision: partial

  2. Referee: [§4.3] §4.3 (Experimental Results): The claim of state-of-the-art erasure performance with maintained fidelity rests on the switching strategy, but without an ablation comparing adaptive switching to static reward weighting or reporting per-timestep reward conflict metrics, it remains unclear whether the dual-path mechanism is load-bearing for the reported improvements or whether simpler baselines would suffice.

    Authors: We agree that additional experimental evidence would strengthen the case for the dual-path mechanism. In the revised version of the manuscript, we will include an ablation study comparing the adaptive switching strategy to static reward weighting approaches. Furthermore, we will report per-timestep reward conflict metrics to illustrate how the performance-driven switching mitigates potential conflicts between the CE and NS rewards, thereby supporting that the mechanism is indeed load-bearing for the observed state-of-the-art results.
    Revision: yes

standing simulated objections not resolved
  • Formal convergence analysis or Lyapunov-style stability argument for the performance-driven switching strategy.

Circularity Check

0 steps flagged

No significant circularity in the proposed RL reformulation or empirical claims

full rationale

The paper introduces FlowErase-RL as a GRPO-based reward optimization framework with a dynamic dual-path (CE and NS) mechanism balanced by a performance-driven switching strategy. All central claims of SOTA erasure performance, fidelity preservation, and robustness are supported by extensive experiments across nudity, object, and style tasks rather than by any self-referential equations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the results to the method's own inputs by construction. The derivation chain is therefore self-contained and externally validated through empirical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the effectiveness of GRPO applied to flow matching and on the stability of the adaptive reward balancing; these are domain assumptions rather than derived results. No free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption: GRPO can be applied to optimize concept erasure in flow matching models without requiring precisely aligned supervised data.
    The method is built directly on this premise to avoid SFT limitations.

pith-pipeline@v0.9.0 · 5765 in / 1302 out tokens · 52233 ms · 2026-05-20T05:23:45.725393+00:00 · methodology

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 10 internal anchors

  1. [1] Dan Milmo. AI-created Child Sexual Abuse Images ‘Threaten to Overwhelm Internet’. The Guardian, 10 2023.

  2. [2] Prafulla Dhariwal and Alex Nichol. Diffusion models beat GANs on image synthesis. ArXiv, abs/2105.05233, 2021.

  3. [3] Jonathan Ho. Classifier-free diffusion guidance. ArXiv, abs/2207.12598, 2022.

  4. [4] Jonathan Ho, Ajay Jain, and P. Abbeel. Denoising diffusion probabilistic models. ArXiv, abs/2006.11239, 2020.

  5. [5] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. In International Conference on Machine Learning, 2021.

  6. [6] Robin Rombach, A. Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10674–10685, 2021.

  7. [7] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, Seyedeh Sara Mahdavi, Raphael Gontijo Lopes, Tim Salimans, Jonathan Ho, David J. Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding. ArXiv, abs/2205.11487, 2022.

  8. [8] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, June 2022.

  9. [9] Black Forest Labs. Flux, 2024.

  10. [10] Harry H. Jiang, Lauren Brown, Jessica Cheng, Mehtab Khan, Abhishek Gupta, Deja Workman, Alex Hanna, Johnathan Flowers, and Timnit Gebru. AI art and its impact on artists. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’23, page 363–374, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400702310. doi: ...

  11. [11] Kevin Roose. An AI-generated picture won an art prize. Artists aren’t happy. The New York Times, 9 2022.

  12. [12] Riddhi Setty. AI art generators hit with copyright suit over artists’ images. The Guardian, 1 2023.

  13. [13] Ed Power. Another Body: My AI Porn Nightmare – a disturbing digital detective story. The Washington Post, 1 2024.

  14. [14] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. Stable diffusion 2.0, 2022.

  15. [15] Patrick Schramowski, Manuel Brack, Bjorn Deiseroth, and Kristian Kersting. Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22522–22531, 2022.

  16. [16] Ryan O’Connor. Stable Diffusion 1 vs 2 - What You Need to Know. Blog post, 2022. Accessed: 2025-01-01.

  17. [17] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. ArXiv, abs/2010.02502, 2020.

  18. [18] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. ArXiv, abs/2210.02747, 2022.

  19. [19] Yuan Wang, Ouxiang Li, Tingting Mu, Yanbin Hao, Kuien Liu, Xiang Wang, and Xiangnan He. Precise, fast, and low-cost concept erasure in value space: Orthogonal complement matters. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 28759–28768, 2024.

  20. [20] Ruchika Chavhan, Da Li, and Timothy M. Hospedales. Conceptprune: Concept editing in diffusion models via skilled neuron pruning. ArXiv, abs/2405.19237, 2024.

  21. [21] Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, and David Bau. Unified concept editing in diffusion models. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 5099–5108, 2023.

  22. [22] Ruidong Chen, Honglin Guo, Lanjun Wang, Chenyu Zhang, Wei zhi Nie, and Anan Liu. Trce: Towards reliable malicious concept erasure in text-to-image diffusion models. ArXiv, abs/2503.07389, 2025.

  23. [23] Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, and Adams Wai-Kin Kong. Mace: Mass concept erasure in diffusion models. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6430–6440, 2024.

  24. [24] Bartosz Cywiński and Kamil Deja. Saeuron: Interpretable concept unlearning in diffusion models with sparse autoencoders. ArXiv, abs/2501.18052, 2025.

  25. [25] Chang Soo Kim, Kyle Min, and Yezhou Yang. R.a.c.e.: Robust adversarial concept erasure for secure text-to-image diffusion model. ArXiv, abs/2405.16341, 2024.

  26. [26] Xinfeng Li, Yuchen Yang, Jiangyi Deng, Chen Yan, Yanjiao Chen, Xiaoyu Ji, and Wenyuan Xu. Safegen: Mitigating sexually explicit content generation in text-to-image models. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 2024.

  27. [27] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Jun-Mei Song, Mingchuan Zhang, Y. K. Li, Yu Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. ArXiv, abs/2402.03300, 2024.

  28. [28] Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via online RL. ArXiv, abs/2505.05470, 2025.

  29. [29] Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. ArXiv, abs/2209.03003, 2022.

  30. [30] Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. Flux.1 kontext: Flow matching for in-context image ...

  31. [31] Patrick Esser, Sumith Kulal, A. Blattmann, Rahim Entezari, Jonas Muller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach. Scaling rectified flow transformers for high-resolution image synthesis. In International Conference on Machine Lea...

  32. [32] Javier Rando, Daniel Paleka, David Lindner, Lennard Heim, and Florian Tramèr. Red-teaming the stable diffusion safety filter. ArXiv, abs/2210.04610, 2022.

  33. [33] OpenAI. DALL·E 3 System Card. Technical report, OpenAI, 9 2023.

  34. [34] Yi Sun, Xinhao Zhong, Hongyang Li, Yimin Zhou, Junhao Li, Bin Chen, and Xuan Wang. Acterase: A training-free paradigm for precise concept erasure via activation redirection. 2026.

  35. [35] Zhiqi Zhang, Xinhao Zhong, Yi Sun, Shuoyang Sun, Bin Chen, Shutao Xia, and Xuan Wang. Differential vector erasure: Unified training-free concept erasure for flow matching models. ArXiv, abs/2602.01089, 2026.

  36. [36] Rohit Gandikota, Joanna Materzynska, Jaden Fiotto-Kaufman, and David Bau. Erasing concepts from diffusion models. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 2426–2436, 2023.

  37. [37] Eric Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, and Humphrey Shi. Forget-me-not: Learning to forget in text-to-image diffusion models. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1755–1764, 2023.

  38. [38] Daiheng Gao, Shilin Lu, Wenbo Zhou, Jiaming Chu, Jie Zhang, Mengxi Jia, Bang Zhang, Zhaoxin Fan, and Weiming Zhang. Eraseanything: Enabling concept erasure in rectified flow transformers. In Forty-second International Conference on Machine Learning, 2025.

  39. [39] Lakshay Chhabra. Nsfw-detection-dl: Deep learning based nsfw content detection. https://github.com/lakshaychhabra/NSFW-Detection-DL, 2020.

  40. [40] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 2021.

  41. [41] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Günter Klambauer, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. ArXiv, abs/1706.08500, 2017.

  42. [42] Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, 2014.

  43. [43] Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Nan Xu, and Qiang Xu. Mma-diffusion: Multimodal attack on diffusion models. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7737–7746, 2023.

  44. [44] Yu-Lin Tsai, Chia-Yi Hsu, Chulin Xie, Chih-Hsun Lin, Jia-You Chen, Bo Li, Pin-Yu Chen, Chia-Mu Yu, and Chun ying Huang. Ring-a-bell! how reliable are concept removal methods for diffusion models? ArXiv, abs/2310.10012, 2023.

  45. [45] Zhi-Yi Chin, Chieh-Ming Jiang, Ching-Chun Huang, Pin-Yu Chen, and Wei-Chen Chiu. Prompting4debugging: Red-teaming text-to-image diffusion models by finding problematic prompts. ArXiv, abs/2309.06135, 2023.

  46. [46] Yimeng Zhang, Jinghan Jia, Xin Chen, Aochuan Chen, Yihua Zhang, Jiancheng Liu, Ke Ding, and Sijia Liu. To generate or not? safety-driven unlearned diffusion models are still easy to generate unsafe images ... for now. In European Conference on Computer Vision, 2023.

  47. [47] Jia Deng, Wei Dong, R. Socher, Li Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. Proc of IEEE Computer Vision & Pattern Recognition, pages 248–255, 2009.

  48. [48] Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2015.

  49. [49] Runtao Liu, Ashkan Khakzar, Jindong Gu, Qifeng Chen, Philip H. S. Torr, and Fabio Pizzati. Latent guard: a safety framework for text-to-image generation. In European Conference on Computer Vision, 2024.

  50. [50] Josh Achiam, Steven Adler, Sandhini Agarwal, et al. GPT-4 technical report, 2023.
