Machine Unlearning for Masked Diffusion Language Models

Georu Lee; Hoki Kim; Jinseong Park; Seungwon Jeong; Woojin Lee

arxiv: 2605.18253 · v1 · pith:QECAGJSUnew · submitted 2026-05-18 · 💻 cs.CL · cs.AI

Machine Unlearning for Masked Diffusion Language Models

Georu Lee , Seungwon Jeong , Hoki Kim , Jinseong Park , Woojin Lee This is my paper

Pith reviewed 2026-05-20 10:21 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords machine unlearningmasked diffusion language modelsforward KL divergencediffusion modelslanguage modelsprivacy preservationfine-tuning reversal

0 comments

The pith

Masked diffusion unlearning removes targeted knowledge by minimizing forward KL divergence to a prompt-masked unconditional anchor at masked positions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Masked Diffusion Unlearning (MDU) to erase specific information from masked diffusion language models. These models shift predictions during fine-tuning by recovering responses from masked states conditioned on prompts. MDU reverses that shift by minimizing forward KL divergence back to a prompt-masked unconditional anchor at each masked response position, using temperature scaling to manage the privacy-utility balance. If the approach holds, it supplies a tailored unlearning method for the parallel denoising process of MDLMs instead of borrowing techniques from autoregressive models.

Core claim

MDU minimizes a forward KL divergence from the prompt-conditional prediction to a prompt-masked unconditional anchor at every masked response position, with a temperature scaling parameter to control the privacy-utility trade-off, and empirical results on standard benchmarks and MDLM backbones show high unlearning performance compared to existing LLM unlearning methods.

What carries the argument

Minimizing forward KL divergence from prompt-conditional predictions to a prompt-masked unconditional anchor at masked response positions, with temperature scaling to adjust the privacy-utility trade-off.

If this is right

MDU achieves higher unlearning performance than existing LLM unlearning methods on standard benchmarks.
The framework applies directly to MDLM backbones such as LLaDA and Dream.
Temperature scaling provides explicit control over the trade-off between forgetting targeted data and retaining overall model performance.
Unlearning occurs by reversing the diffusion fine-tuning shift at masked positions without requiring changes to the core generative process.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same divergence-minimization idea could be tested on diffusion models used for non-language data such as images or audio sequences.
Repeated unlearning sessions on the same model might accumulate effects that gradually reduce overall generation quality, which could be checked in follow-up experiments.
This mechanism might connect to unlearning needs in other parallel generative architectures where conditioning shifts predictions away from an unconditional baseline.

Load-bearing premise

That minimizing the forward KL divergence to the prompt-masked unconditional anchor at masked positions will remove targeted knowledge without substantially harming the model's general capabilities or introducing new unintended behaviors.

What would settle it

Measuring whether the model still generates the specific unlearned content when given the original prompts after MDU is applied.

Figures

Figures reproduced from arXiv: 2605.18253 by Georu Lee, Hoki Kim, Jinseong Park, Seungwon Jeong, Woojin Lee.

**Figure 3.** Figure 3: Token-level conditional-anchor KL analysis along a [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Convergence diagnostic on forget queries. At each epoch, we measure token-level KL [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Token-Level Conditional-Anchor KL Trajectories Example 1 [PITH_FULL_IMAGE:figures/full_fig_p019_5.png] view at source ↗

**Figure 6.** Figure 6: Token-Level Conditional-Anchor KL Trajectories Example 2 [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗

**Figure 7.** Figure 7: Token-Level Conditional-Anchor KL Trajectories Example 3 [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗

**Figure 8.** Figure 8: Token-Level Conditional-Anchor KL Trajectories Example 4 [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗

read the original abstract

Recent masked diffusion language models (MDLMs), such as LLaDA and Dream, have achieved performance comparable to autoregressive large language models. Unlike autoregressive models, which generate text sequentially, MDLMs generate text by iteratively denoising masked positions in parallel. During fine-tuning, MDLMs learn to recover responses from masked response states conditioned on a prompt, thereby shifting their predictions from a prompt-masked unconditional distribution toward a prompt-conditional distribution. Despite this distinct generative and fine-tuning mechanism, machine unlearning for MDLMs remains largely unexplored. In this paper, we propose Masked Diffusion Unlearning (MDU), the first unlearning framework for MDLMs, by revisiting the process of learning specific knowledge in terms of diffusion. Specifically, MDU minimizes a forward KL divergence from the prompt-conditional prediction to a prompt-masked unconditional anchor at every masked response position, with a temperature scaling parameter to control the privacy-utility trade-off. Our empirical results on standard benchmarks and MDLM backbones show that MDU achieves high unlearning performance compared to existing LLM unlearning methods. Code is available at https://github.com/leegeoru/MDU.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MDU is the first unlearning framework for masked diffusion language models, using KL minimization to an unconditional anchor during denoising, with empirical claims that need more detail to assess fully.

read the letter

The punchline is that this paper presents Masked Diffusion Unlearning (MDU) as a direct way to handle forgetting in MDLMs by minimizing forward KL divergence to a prompt-masked unconditional distribution at masked positions, controlled by temperature scaling. What the paper does well is tie the unlearning objective tightly to the model's generative process. MDLMs denoise in parallel and shift their distribution during fine-tuning from unconditional to conditional on prompts. Reversing that shift makes conceptual sense and avoids just bolting on AR-style unlearning methods. The empirical section compares against adapted baselines on standard benchmarks and claims better performance, and they provide code for others to inspect. The soft spots are around the strength of the evidence. The abstract does not include specific numbers, details on statistical significance, or thorough checks on whether general capabilities are preserved or if new issues arise from the unlearning. The assumption that the anchor serves as an effective forgetting target without leakage in the iterative process is reasonable but would benefit from more validation in the full results. This work is for people studying machine unlearning in diffusion-based or parallel generation models, or those concerned with privacy in large language models that are not purely autoregressive. A reader interested in extending unlearning techniques to new architectures would find the mechanistic approach useful. It deserves a serious referee because it addresses an unexplored area with a method grounded in the training dynamics. I recommend sending it for peer review, with suggestions to expand on the experimental details and ablations.

Referee Report

2 major / 2 minor

Summary. The paper proposes Masked Diffusion Unlearning (MDU) as the first unlearning method for masked diffusion language models (MDLMs). It reverses the fine-tuning shift by minimizing forward KL divergence from the prompt-conditional distribution to a prompt-masked unconditional anchor at masked response positions, controlled by a temperature scaling parameter. Experiments on standard benchmarks with MDLM backbones (e.g., LLaDA, Dream) report that MDU achieves higher unlearning performance than adapted existing LLM unlearning baselines.

Significance. If the empirical results hold under rigorous controls, this work is significant for addressing machine unlearning in non-autoregressive diffusion-based LLMs, a gap left by prior methods focused on autoregressive models. The mechanistic grounding in the diffusion denoising process and the public code release are strengths that support reproducibility and potential adoption.

major comments (2)

[§4 Experiments] §4 Experiments: the central claim of superior unlearning performance relative to LLM baselines is only partially supported because the manuscript provides no details on exact metrics (e.g., forget rate, retain accuracy), statistical significance testing, number of runs, or how autoregressive unlearning methods were adapted to the parallel denoising setting of MDLMs.
[§3.2 Method] §3.2 Method: the assumption that the prompt-masked unconditional anchor serves as a faithful forgetting target without residual leakage through the iterative denoising steps is load-bearing for the efficacy claim, yet the paper does not provide ablation or analysis showing that the KL minimization fully propagates the unlearning signal across denoising timesteps.

minor comments (2)

[Abstract] The abstract and introduction refer to 'standard benchmarks' without naming them (e.g., TOXICITY, TRUTHFULQA, or specific unlearning suites); explicit listing would improve clarity.
[§3.2] The temperature scaling parameter is described as controlling the privacy-utility trade-off, but no sensitivity analysis or default selection procedure is reported.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to incorporate additional details and analysis where appropriate.

read point-by-point responses

Referee: [§4 Experiments] §4 Experiments: the central claim of superior unlearning performance relative to LLM baselines is only partially supported because the manuscript provides no details on exact metrics (e.g., forget rate, retain accuracy), statistical significance testing, number of runs, or how autoregressive unlearning methods were adapted to the parallel denoising setting of MDLMs.

Authors: We agree that these experimental details should be explicit. In the revised manuscript we will add: precise metric definitions (forget rate as the drop in accuracy on the forget set relative to the original model; retain accuracy as performance on the retain set); results reported as means and standard deviations over 5 independent runs with paired t-test p-values for significance; and a new paragraph in §4.1 explaining the adaptation procedure, in which autoregressive baselines are applied independently at each denoising timestep while respecting the parallel mask prediction structure of MDLMs. revision: yes
Referee: [§3.2 Method] §3.2 Method: the assumption that the prompt-masked unconditional anchor serves as a faithful forgetting target without residual leakage through the iterative denoising steps is load-bearing for the efficacy claim, yet the paper does not provide ablation or analysis showing that the KL minimization fully propagates the unlearning signal across denoising timesteps.

Authors: The concern is well-taken. Because the forward KL term is minimized at every masked position and every timestep, the unlearning signal is enforced throughout the chain of denoising steps; changes at early timesteps necessarily influence later conditional predictions. Nevertheless, we will add a targeted ablation in the revision that varies the timesteps at which MDU is applied and reports the resulting forget/retain metrics, thereby providing direct evidence that the effect propagates without substantial residual leakage. revision: partial

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper defines Masked Diffusion Unlearning (MDU) directly as minimization of forward KL divergence from the prompt-conditional distribution to the prompt-masked unconditional anchor at masked positions, with an explicit temperature scaling parameter. This construction is a mechanistic reversal of the described MDLM fine-tuning shift and is evaluated on external standard benchmarks against adapted LLM baselines. No equations reduce the claimed unlearning performance to a fitted quantity by construction, no self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results is smuggled in. The derivation chain is therefore self-contained against the stated inputs and external evaluation.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 0 invented entities

The method relies on one tunable temperature scaling parameter to control the privacy-utility trade-off and assumes the prompt-masked unconditional distribution is a suitable unlearning target; no new entities are postulated.

free parameters (1)

temperature scaling parameter
Controls the strength of the KL divergence term and is chosen to balance forgetting against retained utility.

pith-pipeline@v0.9.0 · 5743 in / 1091 out tokens · 31024 ms · 2026-05-20T10:21:51.111123+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

MDU minimizes a forward KL divergence from the prompt-conditional prediction to a prompt-masked unconditional anchor at every masked response position, with a temperature scaling parameter

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 7 internal anchors

[1]

Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024

Subham S Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, and V olodymyr Kuleshov. Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024

work page 2024
[2]

Large Language Diffusion Models

Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models.arXiv preprint arXiv:2502.09992, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

Dream 7B: Diffusion Large Language Models

Jiacheng Ye, Zhihui Xie, Lin Zheng, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, and Lingpeng Kong. Dream 7b: Diffusion large language models.arXiv preprint arXiv:2508.15487, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[4]

Towards making systems forget with machine unlearning

Yinzhi Cao and Junfeng Yang. Towards making systems forget with machine unlearning. In 2015 IEEE symposium on security and privacy, pages 463–480. IEEE, 2015

work page 2015
[5]

Eternal sunshine of the spotless net: Selective forgetting in deep networks

Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9304–9312, 2020

work page 2020
[6]

Large language model unlearning.Advances in Neural Information Processing Systems, 37:105425–105475, 2024

Yuanshun Yao and Xiaojun Xu. Large language model unlearning.Advances in Neural Information Processing Systems, 37:105425–105475, 2024

work page 2024
[7]

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning

Ruiqi Zhang, Licong Lin, Yu Bai, and Song Mei. Negative preference optimization: From catastrophic collapse to effective unlearning.arXiv preprint arXiv:2404.05868, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[8]

2025 , journal =

Yuanhe Zhang, Fangzhou Xie, Zhenhong Zhou, Zherui Li, Hao Chen, Kun Wang, and Yufei Guo. Jailbreaking large language diffusion models: Revealing hidden safety flaws in diffusion-based text generation.arXiv preprint arXiv:2507.19227, 2025

work page arXiv 2025
[9]

2025 , journal =

Zherui Li, Zheng Nie, Zhenhong Zhou, Yue Liu, Yitong Zhang, Yu Cheng, Qingsong Wen, Kun Wang, Yufei Guo, and Jiaheng Zhang. Diffuguard: How intrinsic safety is lost and found in diffusion large language models.arXiv preprint arXiv:2509.24296, 2025

work page arXiv 2025
[10]

2025 , journal =

Wonje Jeung, Sangyeon Yoon, Yoonjun Cho, Dongjae Jeon, Sangwoo Shin, Hyesoo Hong, and Albert No. A2d: Any-order, any-step safety alignment for diffusion language models.arXiv preprint arXiv:2509.23286, 2025

work page arXiv 2025
[11]

TOFU: A Task of Fictitious Unlearning for LLMs

Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C Lipton, and J Zico Kolter. Tofu: A task of fictitious unlearning for llms.arXiv preprint arXiv:2401.06121, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[12]

Rwku: Benchmarking real-world knowledge unlearning for large language models.Advances in Neural Information Processing Systems, 37:98213–98263, 2024

Pengfei Cao, Chenhao Wang, Zhitao He, Hongbang Yuan, Jiachun Li, Yubo Chen, Kang Liu, Jun Zhao, et al. Rwku: Benchmarking real-world knowledge unlearning for large language models.Advances in Neural Information Processing Systems, 37:98213–98263, 2024

work page 2024
[13]

Deep unsuper- vised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational conference on machine learning, pages 2256–2265. pmlr, 2015

work page 2015
[14]

Argmax flows and multinomial diffusion: Learning categorical distributions.Advances in neural infor- mation processing systems, 34:12454–12465, 2021

Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Learning categorical distributions.Advances in neural infor- mation processing systems, 34:12454–12465, 2021

work page 2021
[15]

Structured denoising diffusion models in discrete state-spaces.Advances in neural information processing systems, 34:17981–17993, 2021

Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne Van Den Berg. Structured denoising diffusion models in discrete state-spaces.Advances in neural information processing systems, 34:17981–17993, 2021

work page 2021
[16]

A continuous time framework for discrete denoising models.Advances in Neural Information Processing Systems, 35:28266–28279, 2022

Andrew Campbell, Joe Benton, Valentin De Bortoli, Thomas Rainforth, George Deligiannidis, and Arnaud Doucet. A continuous time framework for discrete denoising models.Advances in Neural Information Processing Systems, 35:28266–28279, 2022

work page 2022
[17]

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution.arXiv preprint arXiv:2310.16834, 2023. 10

work page internal anchor Pith review Pith/arXiv arXiv 2023
[18]

Simplified and generalized masked diffusion for discrete data.Advances in neural information processing systems, 37:103131–103167, 2024

Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, and Michalis Titsias. Simplified and generalized masked diffusion for discrete data.Advances in neural information processing systems, 37:103131–103167, 2024

work page 2024
[19]

Kakade, and Sitan Chen

Jaeyeon Kim, Kulin Shah, Vasilis Kontonis, Sham Kakade, and Sitan Chen. Train for the worst, plan for the best: Understanding token ordering in masked diffusions.arXiv preprint arXiv:2502.06768, 2025

work page arXiv 2025
[20]

Learning fair representa- tions

Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representa- tions. InInternational conference on machine learning, pages 325–333. PMLR, 2013

work page 2013
[21]

Learning to unlearn: Instance-wise unlearning for pre-trained classifiers

Sungmin Cha, Sungjun Cho, Dasol Hwang, Honglak Lee, Taesup Moon, and Moontae Lee. Learning to unlearn: Instance-wise unlearning for pre-trained classifiers. InProceedings of the AAAI conference on artificial intelligence, volume 38, pages 11186–11194, 2024

work page 2024
[22]

Towards un- bounded machine unlearning.Advances in neural information processing systems, 36:1957– 1987, 2023

Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, and Eleni Triantafillou. Towards un- bounded machine unlearning.Advances in neural information processing systems, 36:1957– 1987, 2023

work page 1957
[23]

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

Chongyu Fan, Jiancheng Liu, Yihua Zhang, Eric Wong, Dennis Wei, and Sijia Liu. Salun: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation.arXiv preprint arXiv:2310.12508, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[24]

The wmdp benchmark: measuring and reducing malicious use with unlearning

Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D Li, Ann-Kathrin Dombrowski, Shashwat Goel, Gabriel Mukobi, et al. The wmdp benchmark: measuring and reducing malicious use with unlearning. InProceedings of the 41st International Conference on Machine Learning, pages 28525–28550, 2024

work page 2024
[25]

Erasing concepts from diffusion models

Rohit Gandikota, Joanna Materzynska, Jaden Fiotto-Kaufman, and David Bau. Erasing concepts from diffusion models. InProceedings of the IEEE/CVF international conference on computer vision, pages 2426–2436, 2023

work page 2023
[26]

Defensive unlearning with adversarial training for robust concept erasure in diffusion models.Advances in neural information processing systems, 37: 36748–36776, 2024

Yimeng Zhang, Xin Chen, Jinghan Jia, Yihua Zhang, Chongyu Fan, Jiancheng Liu, Mingyi Hong, Ke Ding, and Sijia Liu. Defensive unlearning with adversarial training for robust concept erasure in diffusion models.Advances in neural information processing systems, 37: 36748–36776, 2024

work page 2024
[27]

Stereo: A two-stage framework for adversarially robust concept erasing from text-to-image diffusion models

Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer, Vishal M Patel, and Karthik Nandaku- mar. Stereo: A two-stage framework for adversarially robust concept erasing from text-to-image diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23765–23774, 2025

work page 2025
[28]

Null-text inversion for editing real images using guided diffusion models

Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6038–6047, 2023

work page 2023
[29]

Ddat: Diffusion policies enforcing dynamically admissible robot trajectories

Jean-Baptiste Bouvier, Kanghyun Ryu, Kartik Nagpal, Qiayuan Liao, Koushil Sreenath, and Negar Mehr. Ddat: Diffusion policies enforcing dynamically admissible robot trajectories. arXiv preprint arXiv:2502.15043, 2025

work page arXiv 2025
[30]

Contrastive flow matching

George Stoica, Vivek Ramanujan, Xiang Fan, Ali Farhadi, Ranjay Krishna, and Judy Hoffman. Contrastive flow matching. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1185–1194, 2025

work page 2025
[31]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022

work page 2022
[32]

Classifier-Free Diffusion Guidance

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance.arXiv preprint arXiv:2207.12598, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[33]

Scaling up masked diffusion models on text

Shen Nie, Fengqi Zhu, Chao Du, Tianyu Pang, Qian Liu, Guangtao Zeng, Min Lin, and Chongxuan Li. Scaling up masked diffusion models on text. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum? id=WNvvwK0tut. 11

work page 2025
[34]

Detecting, explaining, and mitigating memorization in diffusion models

Yuxin Wen, Yuchen Liu, Chen Chen, and Lingjuan Lyu. Detecting, explaining, and mitigating memorization in diffusion models. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[35]

Understanding and mitigating memorization in generative models via sharpness of probability landscapes.Proceedings of Machine Learning Research, 267:27091–27112, 2025

Dongjae Jeon, Dueun Kim, and Albert No. Understanding and mitigating memorization in generative models via sharpness of probability landscapes.Proceedings of Machine Learning Research, 267:27091–27112, 2025

work page 2025
[36]

Classifier-free guidance inside the attraction basin may cause memorization

Anubhav Jain, Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir Memon, Julian Togelius, and Yuki Mitsufuji. Classifier-free guidance inside the attraction basin may cause memorization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12871–12879, 2025

work page 2025
[37]

A closer look at machine unlearning for large language models.arXiv preprint arXiv:2410.08109, 2024

Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, and Min Lin. A closer look at machine unlearning for large language models.arXiv preprint arXiv:2410.08109, 2024

work page arXiv 2024
[38]

Diffusion language model knows the answer before it decodes

Pengxiang Li, Yefan Zhou, Dilxat Muhtar, Lu Yin, Shilin Yan, Li Shen, Yi Liang, Soroush V osoughi, and Shiwei Liu. Diffusion language model knows the answer before it decodes. In The Fourteenth International Conference on Learning Representations, 2026. URL https: //openreview.net/forum?id=g88nt4ieTG

work page 2026
[39]

Knowledge unlearning for mitigating privacy risks in language models

Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, and Minjoon Seo. Knowledge unlearning for mitigating privacy risks in language models. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14389–14408, 2023

work page 2023
[40]

Simplicity prevails: Rethinking negative preference optimization for llm unlearning.arXiv preprint arXiv:2410.07163, 2024

Chongyu Fan, Jiancheng Liu, Licong Lin, Jinghan Jia, Ruiqi Zhang, Song Mei, and Sijia Liu. Simplicity prevails: Rethinking negative preference optimization for llm unlearning.arXiv preprint arXiv:2410.07163, 2024

work page arXiv 2024
[41]

P., Zhou, Z., Shin, S., Han, B., and Weinberger, K

Qizhou Wang, Jin Peng Zhou, Zhanke Zhou, Saebyeol Shin, Bo Han, and Kilian Q Weinberger. Rethinking llm unlearning objectives: A gradient perspective and go beyond.arXiv preprint arXiv:2502.19301, 2025

work page arXiv 2025
[42]

AR⇒ MDLM

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in neural information processing systems, 36:53728–53741, 2023. 12 A MDU Algorithm Algorithm 1MDU optimization step Require: Trainable MDLM θ with init parameters θ0...

work page 2023

[1] [1]

Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024

Subham S Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, and V olodymyr Kuleshov. Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024

work page 2024

[2] [2]

Large Language Diffusion Models

Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models.arXiv preprint arXiv:2502.09992, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[3] [3]

Dream 7B: Diffusion Large Language Models

Jiacheng Ye, Zhihui Xie, Lin Zheng, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, and Lingpeng Kong. Dream 7b: Diffusion large language models.arXiv preprint arXiv:2508.15487, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[4] [4]

Towards making systems forget with machine unlearning

Yinzhi Cao and Junfeng Yang. Towards making systems forget with machine unlearning. In 2015 IEEE symposium on security and privacy, pages 463–480. IEEE, 2015

work page 2015

[5] [5]

Eternal sunshine of the spotless net: Selective forgetting in deep networks

Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9304–9312, 2020

work page 2020

[6] [6]

Large language model unlearning.Advances in Neural Information Processing Systems, 37:105425–105475, 2024

Yuanshun Yao and Xiaojun Xu. Large language model unlearning.Advances in Neural Information Processing Systems, 37:105425–105475, 2024

work page 2024

[7] [7]

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning

Ruiqi Zhang, Licong Lin, Yu Bai, and Song Mei. Negative preference optimization: From catastrophic collapse to effective unlearning.arXiv preprint arXiv:2404.05868, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[8] [8]

2025 , journal =

Yuanhe Zhang, Fangzhou Xie, Zhenhong Zhou, Zherui Li, Hao Chen, Kun Wang, and Yufei Guo. Jailbreaking large language diffusion models: Revealing hidden safety flaws in diffusion-based text generation.arXiv preprint arXiv:2507.19227, 2025

work page arXiv 2025

[9] [9]

2025 , journal =

Zherui Li, Zheng Nie, Zhenhong Zhou, Yue Liu, Yitong Zhang, Yu Cheng, Qingsong Wen, Kun Wang, Yufei Guo, and Jiaheng Zhang. Diffuguard: How intrinsic safety is lost and found in diffusion large language models.arXiv preprint arXiv:2509.24296, 2025

work page arXiv 2025

[10] [10]

2025 , journal =

Wonje Jeung, Sangyeon Yoon, Yoonjun Cho, Dongjae Jeon, Sangwoo Shin, Hyesoo Hong, and Albert No. A2d: Any-order, any-step safety alignment for diffusion language models.arXiv preprint arXiv:2509.23286, 2025

work page arXiv 2025

[11] [11]

TOFU: A Task of Fictitious Unlearning for LLMs

Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C Lipton, and J Zico Kolter. Tofu: A task of fictitious unlearning for llms.arXiv preprint arXiv:2401.06121, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[12] [12]

Rwku: Benchmarking real-world knowledge unlearning for large language models.Advances in Neural Information Processing Systems, 37:98213–98263, 2024

Pengfei Cao, Chenhao Wang, Zhitao He, Hongbang Yuan, Jiachun Li, Yubo Chen, Kang Liu, Jun Zhao, et al. Rwku: Benchmarking real-world knowledge unlearning for large language models.Advances in Neural Information Processing Systems, 37:98213–98263, 2024

work page 2024

[13] [13]

Deep unsuper- vised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational conference on machine learning, pages 2256–2265. pmlr, 2015

work page 2015

[14] [14]

Argmax flows and multinomial diffusion: Learning categorical distributions.Advances in neural infor- mation processing systems, 34:12454–12465, 2021

Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Learning categorical distributions.Advances in neural infor- mation processing systems, 34:12454–12465, 2021

work page 2021

[15] [15]

Structured denoising diffusion models in discrete state-spaces.Advances in neural information processing systems, 34:17981–17993, 2021

Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne Van Den Berg. Structured denoising diffusion models in discrete state-spaces.Advances in neural information processing systems, 34:17981–17993, 2021

work page 2021

[16] [16]

A continuous time framework for discrete denoising models.Advances in Neural Information Processing Systems, 35:28266–28279, 2022

Andrew Campbell, Joe Benton, Valentin De Bortoli, Thomas Rainforth, George Deligiannidis, and Arnaud Doucet. A continuous time framework for discrete denoising models.Advances in Neural Information Processing Systems, 35:28266–28279, 2022

work page 2022

[17] [17]

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution.arXiv preprint arXiv:2310.16834, 2023. 10

work page internal anchor Pith review Pith/arXiv arXiv 2023

[18] [18]

Simplified and generalized masked diffusion for discrete data.Advances in neural information processing systems, 37:103131–103167, 2024

Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, and Michalis Titsias. Simplified and generalized masked diffusion for discrete data.Advances in neural information processing systems, 37:103131–103167, 2024

work page 2024

[19] [19]

Kakade, and Sitan Chen

Jaeyeon Kim, Kulin Shah, Vasilis Kontonis, Sham Kakade, and Sitan Chen. Train for the worst, plan for the best: Understanding token ordering in masked diffusions.arXiv preprint arXiv:2502.06768, 2025

work page arXiv 2025

[20] [20]

Learning fair representa- tions

Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representa- tions. InInternational conference on machine learning, pages 325–333. PMLR, 2013

work page 2013

[21] [21]

Learning to unlearn: Instance-wise unlearning for pre-trained classifiers

Sungmin Cha, Sungjun Cho, Dasol Hwang, Honglak Lee, Taesup Moon, and Moontae Lee. Learning to unlearn: Instance-wise unlearning for pre-trained classifiers. InProceedings of the AAAI conference on artificial intelligence, volume 38, pages 11186–11194, 2024

work page 2024

[22] [22]

Towards un- bounded machine unlearning.Advances in neural information processing systems, 36:1957– 1987, 2023

Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, and Eleni Triantafillou. Towards un- bounded machine unlearning.Advances in neural information processing systems, 36:1957– 1987, 2023

work page 1957

[23] [23]

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

Chongyu Fan, Jiancheng Liu, Yihua Zhang, Eric Wong, Dennis Wei, and Sijia Liu. Salun: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation.arXiv preprint arXiv:2310.12508, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[24] [24]

The wmdp benchmark: measuring and reducing malicious use with unlearning

Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D Li, Ann-Kathrin Dombrowski, Shashwat Goel, Gabriel Mukobi, et al. The wmdp benchmark: measuring and reducing malicious use with unlearning. InProceedings of the 41st International Conference on Machine Learning, pages 28525–28550, 2024

work page 2024

[25] [25]

Erasing concepts from diffusion models

Rohit Gandikota, Joanna Materzynska, Jaden Fiotto-Kaufman, and David Bau. Erasing concepts from diffusion models. InProceedings of the IEEE/CVF international conference on computer vision, pages 2426–2436, 2023

work page 2023

[26] [26]

Defensive unlearning with adversarial training for robust concept erasure in diffusion models.Advances in neural information processing systems, 37: 36748–36776, 2024

Yimeng Zhang, Xin Chen, Jinghan Jia, Yihua Zhang, Chongyu Fan, Jiancheng Liu, Mingyi Hong, Ke Ding, and Sijia Liu. Defensive unlearning with adversarial training for robust concept erasure in diffusion models.Advances in neural information processing systems, 37: 36748–36776, 2024

work page 2024

[27] [27]

Stereo: A two-stage framework for adversarially robust concept erasing from text-to-image diffusion models

Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer, Vishal M Patel, and Karthik Nandaku- mar. Stereo: A two-stage framework for adversarially robust concept erasing from text-to-image diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23765–23774, 2025

work page 2025

[28] [28]

Null-text inversion for editing real images using guided diffusion models

Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6038–6047, 2023

work page 2023

[29] [29]

Ddat: Diffusion policies enforcing dynamically admissible robot trajectories

Jean-Baptiste Bouvier, Kanghyun Ryu, Kartik Nagpal, Qiayuan Liao, Koushil Sreenath, and Negar Mehr. Ddat: Diffusion policies enforcing dynamically admissible robot trajectories. arXiv preprint arXiv:2502.15043, 2025

work page arXiv 2025

[30] [30]

Contrastive flow matching

George Stoica, Vivek Ramanujan, Xiang Fan, Ali Farhadi, Ranjay Krishna, and Judy Hoffman. Contrastive flow matching. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1185–1194, 2025

work page 2025

[31] [31]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022

work page 2022

[32] [32]

Classifier-Free Diffusion Guidance

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance.arXiv preprint arXiv:2207.12598, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[33] [33]

Scaling up masked diffusion models on text

Shen Nie, Fengqi Zhu, Chao Du, Tianyu Pang, Qian Liu, Guangtao Zeng, Min Lin, and Chongxuan Li. Scaling up masked diffusion models on text. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum? id=WNvvwK0tut. 11

work page 2025

[34] [34]

Detecting, explaining, and mitigating memorization in diffusion models

Yuxin Wen, Yuchen Liu, Chen Chen, and Lingjuan Lyu. Detecting, explaining, and mitigating memorization in diffusion models. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024

[35] [35]

Understanding and mitigating memorization in generative models via sharpness of probability landscapes.Proceedings of Machine Learning Research, 267:27091–27112, 2025

Dongjae Jeon, Dueun Kim, and Albert No. Understanding and mitigating memorization in generative models via sharpness of probability landscapes.Proceedings of Machine Learning Research, 267:27091–27112, 2025

work page 2025

[36] [36]

Classifier-free guidance inside the attraction basin may cause memorization

Anubhav Jain, Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir Memon, Julian Togelius, and Yuki Mitsufuji. Classifier-free guidance inside the attraction basin may cause memorization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12871–12879, 2025

work page 2025

[37] [37]

A closer look at machine unlearning for large language models.arXiv preprint arXiv:2410.08109, 2024

Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, and Min Lin. A closer look at machine unlearning for large language models.arXiv preprint arXiv:2410.08109, 2024

work page arXiv 2024

[38] [38]

Diffusion language model knows the answer before it decodes

Pengxiang Li, Yefan Zhou, Dilxat Muhtar, Lu Yin, Shilin Yan, Li Shen, Yi Liang, Soroush V osoughi, and Shiwei Liu. Diffusion language model knows the answer before it decodes. In The Fourteenth International Conference on Learning Representations, 2026. URL https: //openreview.net/forum?id=g88nt4ieTG

work page 2026

[39] [39]

Knowledge unlearning for mitigating privacy risks in language models

Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, and Minjoon Seo. Knowledge unlearning for mitigating privacy risks in language models. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14389–14408, 2023

work page 2023

[40] [40]

Simplicity prevails: Rethinking negative preference optimization for llm unlearning.arXiv preprint arXiv:2410.07163, 2024

Chongyu Fan, Jiancheng Liu, Licong Lin, Jinghan Jia, Ruiqi Zhang, Song Mei, and Sijia Liu. Simplicity prevails: Rethinking negative preference optimization for llm unlearning.arXiv preprint arXiv:2410.07163, 2024

work page arXiv 2024

[41] [41]

P., Zhou, Z., Shin, S., Han, B., and Weinberger, K

Qizhou Wang, Jin Peng Zhou, Zhanke Zhou, Saebyeol Shin, Bo Han, and Kilian Q Weinberger. Rethinking llm unlearning objectives: A gradient perspective and go beyond.arXiv preprint arXiv:2502.19301, 2025

work page arXiv 2025

[42] [42]

AR⇒ MDLM

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in neural information processing systems, 36:53728–53741, 2023. 12 A MDU Algorithm Algorithm 1MDU optimization step Require: Trainable MDLM θ with init parameters θ0...

work page 2023