FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models
Pith reviewed 2026-05-20 05:23 UTC · model grok-4.3
The pith
FlowErase-RL reframes concept erasure in flow matching models as a reward optimization problem using a dynamic dual-path reward system.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that recasting concept erasure as a reward optimization problem solved with GRPO (Group Relative Policy Optimization) yields state-of-the-art suppression of target concepts in flow matching models while preserving generative quality and semantic alignment. The reward is a dynamic dual-path mechanism that balances a Concept Erasure (CE) reward and a Non-target Space (NS) reward through a performance-driven switching strategy.
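For readers unfamiliar with GRPO (introduced with DeepSeekMath), the sketch below shows the group-relative advantage that replaces a learned critic. Several images are sampled for the same prompt, each one is scored by a reward, and each reward is normalized against its group's mean and standard deviation. This is a minimal illustration of the general technique, not FlowErase-RL's training code. The function name `group_relative_advantages` and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages for one group of samples drawn from the same prompt.

    The group mean serves as the baseline, so no value network is needed.
    The spread is the population standard deviation over the group (an
    assumption; some implementations use the sample standard deviation).
    `rewards` must be non-empty.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every sample in the group scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, rewards of `[1.0, 0.0, 0.5, 0.5]` give advantages of roughly `[1.41, -1.41, 0.0, 0.0]`. Samples that score above their group mean are reinforced and samples below it are suppressed. A group whose rewards are all identical contributes no learning signal.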
What carries the argument
The dynamic dual-path reward mechanism that jointly optimizes a Concept Erasure reward to suppress target concepts and a Non-target Space reward to preserve generative fidelity, adaptively balanced by a performance-driven switching strategy.
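The exact switching rule is not specified in this review, and the referee notes that its threshold and trigger metric are left implicit. The code below is therefore a hypothetical sketch of one way a performance-driven switch between the two paths could work. It sits beside the fixed-weight combination that the referee asks to see as an ablation baseline. The class `PathSwitcher`, the moving-average trigger, and the threshold `tau` are illustrative assumptions, not the authors' design.

```python
from collections import deque


class PathSwitcher:
    """Hypothetical performance-driven switch between the CE and NS reward paths.

    Assumed rule (illustrative only): keep a moving average of recent
    batch-mean CE rewards. While it is below `tau`, erasure is not yet good
    enough and the CE path is optimized. Otherwise the NS path is optimized
    to recover fidelity on non-target content.
    """

    def __init__(self, tau: float = 0.8, window: int = 4) -> None:
        self.tau = tau
        self.history: deque = deque(maxlen=window)  # oldest entries drop out automatically

    def select(self, ce_reward_mean: float) -> str:
        """Record the latest batch-mean CE reward and return the active path."""
        self.history.append(ce_reward_mean)
        avg = sum(self.history) / len(self.history)
        return "NS" if avg >= self.tau else "CE"


def path_reward(path: str, ce: float, ns: float) -> float:
    """Reward passed to the GRPO update under the selected path."""
    return ce if path == "CE" else ns


def fixed_weight_reward(ce: float, ns: float, w: float = 0.5) -> float:
    """Static-weighting baseline: a constant convex combination of both rewards."""
    return w * ce + (1.0 - w) * ns
```

With `tau=0.8` and `window=2`, feeding batch means 0.5, 0.9, 0.9, 0.5 selects CE, CE, NS, CE. The moving window damps the switch so that a single noisy batch cannot flip the path on its own. An ablation of the kind the referee requests would replace `path_reward` with `fixed_weight_reward` and keep everything else unchanged.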
If this is right
- The method achieves state-of-the-art erasure performance on nudity, object, and artistic style tasks.
- It maintains strong image quality and semantic alignment after erasure.
- It shows robust resistance to adversarial attacks.
- It scales effectively to multi-concept erasure scenarios.
Where Pith is reading between the lines
- This approach may extend to other types of generative models that use similar flow or diffusion processes.
- Reducing reliance on supervised data could make safety measures easier to implement across different concepts.
- Testing the method on more complex or abstract concepts could reveal additional strengths or limits.
Load-bearing premise
The performance-driven switching strategy between the Concept Erasure and Non-target Space rewards can stably optimize the model without explicit supervision and without the two paths conflicting in ways that degrade either erasure or fidelity.
What would settle it
Training instability in which the switching produces either incomplete erasure of target concepts or degraded quality on non-target images would challenge the central claim.
Original abstract
Recent advances in flow matching models have significantly improved text-to-image generation quality, but also introduce growing safety risks due to the generation of harmful or undesirable content. Existing concept erasure methods are either inference-time interventions with limited effectiveness or rely on supervised fine-tuning (SFT), which requires precisely aligned data and struggles with scalability and multi-concept settings. In this paper, we propose FlowErase-RL, the first GRPO-based framework for concept erasure in flow matching models. We reformulate concept erasure as a reward optimization problem and introduce a dynamic dual-path reward mechanism that jointly optimizes (i) a Concept Erasure (CE) reward to suppress target concepts and (ii) a Non-target Space (NS) reward to preserve generative fidelity. The two reward paths are adaptively balanced during training via a performance-driven switching strategy, enabling stable optimization without explicit supervision. Extensive experiments on nudity, object, and artistic style erasure demonstrate that our method achieves state-of-the-art erasure performance while maintaining strong image quality and semantic alignment. Moreover, it exhibits robust resistance to adversarial attacks and scales effectively to multi-concept scenarios. Our results establish a new paradigm for safe and controllable generation in flow matching models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces FlowErase-RL, a GRPO-based framework that reformulates concept erasure in flow matching models as a reward optimization problem. It proposes a dynamic dual-path reward mechanism that jointly optimizes a Concept Erasure (CE) reward to suppress target concepts and a Non-target Space (NS) reward to preserve generative fidelity, with the two paths adaptively balanced via a performance-driven switching strategy. The authors claim this yields state-of-the-art erasure performance on nudity, object, and artistic style tasks while maintaining image quality and semantic alignment, with additional robustness to adversarial attacks and scalability to multi-concept settings.
Significance. If the central empirical claims hold, the work offers a promising new paradigm for safe generation in flow matching models by replacing supervised fine-tuning with reinforcement learning and an adaptive dual-reward scheme. This could improve scalability and multi-concept handling relative to prior inference-time or SFT-based erasure methods. The paper deserves credit for reformulating erasure as GRPO reward optimization and for evaluating the method across multiple erasure categories.
major comments (2)
- [§3.2] Dynamic Dual-Path Reward Mechanism: The performance-driven switching strategy is asserted to enable stable optimization without explicit supervision and without conflict between the CE and NS paths. However, the manuscript supplies no formal convergence analysis, no Lyapunov-style stability argument, and no ablation that isolates the switching logic from fixed-weight baselines. In flow matching models, where sampling follows coupled continuous trajectories, even modest misalignment between the two rewards could accumulate across timesteps. Without such analysis, it is difficult to confirm that the observed gains come from the adaptive mechanism rather than from hyperparameter choices.
- [§4.3] Experimental Results: The claim of state-of-the-art erasure with maintained fidelity rests on the switching strategy. Without an ablation comparing adaptive switching to static reward weighting, and without per-timestep reward-conflict metrics, it remains unclear whether the dual-path mechanism is load-bearing for the reported improvements or whether simpler baselines would suffice.
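The referee does not define "per-timestep reward conflict metrics". One common choice comes from multi-task gradient-surgery methods such as PCGrad rather than from this paper: the cosine similarity between the gradients of the two objectives. A negative value means the CE and NS rewards pull the parameters in opposing directions at that timestep. The sketch assumes the per-timestep gradients have already been flattened into plain lists of floats, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two flattened gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def per_timestep_conflict(
    ce_grads: Sequence[Sequence[float]],
    ns_grads: Sequence[Sequence[float]],
) -> List[float]:
    """Cosine similarity between CE- and NS-reward gradients at each sampling timestep."""
    return [cosine_similarity(g_ce, g_ns) for g_ce, g_ns in zip(ce_grads, ns_grads)]


def conflict_rate(similarities: Sequence[float]) -> float:
    """Fraction of timesteps whose two gradients conflict (negative cosine similarity)."""
    if not similarities:
        return 0.0
    return sum(1 for s in similarities if s < 0.0) / len(similarities)
```

Reporting `conflict_rate` for adaptive switching and for a fixed-weight baseline on the same prompts would show directly whether the switching reduces conflict between the two paths, which is what the rebuttal promises to demonstrate.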
minor comments (2)
- The abstract and introduction would benefit from a concise summary table of key metrics (e.g., erasure success rate, FID, CLIP score) against the strongest baselines to allow readers to assess the SOTA claim at a glance.
- [§3.2] Notation for the switching threshold and performance metric used to trigger path selection should be defined explicitly in the method section rather than left implicit in the algorithm description.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work. We respond to each major comment below and indicate the revisions we plan to make to the manuscript.
- Referee: [§3.2] Dynamic Dual-Path Reward Mechanism: The performance-driven switching strategy is asserted to enable stable optimization without explicit supervision and without the CE and NS paths conflicting, yet the manuscript supplies no formal convergence analysis, Lyapunov-style stability argument, or ablation that isolates the switching logic from fixed-weight baselines. In flow matching models with coupled continuous trajectories, even modest misalignment could accumulate across timesteps, so the absence of such analysis makes it difficult to confirm that observed gains derive from the adaptive mechanism rather than hyperparameter choices.
Authors: We appreciate this observation regarding the theoretical underpinnings of our dynamic dual-path reward mechanism. Providing a formal convergence analysis or a Lyapunov-style stability argument for the performance-driven switching in the setting of coupled continuous trajectories is a significant undertaking that we have not pursued in the current work, as our focus has been on empirical validation and practical effectiveness. To directly address the request for an ablation isolating the switching logic, we will add such an analysis in the revised manuscript, comparing the adaptive strategy against fixed-weight baselines. We believe this will help confirm that the gains stem from the adaptive balancing rather than specific hyperparameter selections. revision: partial
- Referee: [§4.3] Experimental Results: The claim of state-of-the-art erasure performance with maintained fidelity rests on the switching strategy, but without an ablation comparing adaptive switching to static reward weighting or reporting per-timestep reward conflict metrics, it remains unclear whether the dual-path mechanism is load-bearing for the reported improvements or whether simpler baselines would suffice.
Authors: We agree that additional experimental evidence would strengthen the case for the dual-path mechanism. In the revised version of the manuscript, we will include an ablation study comparing the adaptive switching strategy to static reward weighting approaches. Furthermore, we will report per-timestep reward conflict metrics to illustrate how the performance-driven switching mitigates potential conflicts between the CE and NS rewards, thereby supporting that the mechanism is indeed load-bearing for the observed state-of-the-art results. revision: yes
Unresolved after rebuttal
- Formal convergence analysis or Lyapunov-style stability argument for the performance-driven switching strategy.
Circularity Check
No significant circularity in the proposed RL reformulation or empirical claims
Full rationale
The paper introduces FlowErase-RL as a GRPO-based reward optimization framework with a dynamic dual-path (CE and NS) mechanism balanced by a performance-driven switching strategy. All central claims of SOTA erasure performance, fidelity preservation, and robustness are supported by extensive experiments across nudity, object, and style tasks rather than by any self-referential equations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the results to the method's own inputs by construction. The derivation chain is therefore self-contained and externally validated through empirical benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: GRPO can be applied to optimize concept erasure in flow matching models without requiring precisely aligned supervised data.