Training data attribution in diffusion models via mirrored unlearning and noise-consistent skew

Dipam Goswami; Fabio Morreale; Joan Serrà; Wei-Hsiang Liao; Yuki Mitsufuji

arXiv: 2605.17938 · v1 · pith:OFNTJFKInew · submitted 2026-05-18 · cs.LG · cs.AI · stat.ML

Joan Serrà, Dipam Goswami, Fabio Morreale, Wei-Hsiang Liao, Yuki Mitsufuji

Pith reviewed 2026-05-20 13:15 UTC · model grok-4.3

keywords: training data attribution · diffusion models · mirrored unlearning · noise-consistent skew · generative model interpretability · data influence · unlearning · model comparison

The pith

Mirrored unlearning and noise-consistent skew provide a reliable method for training data attribution in diffusion models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes MUCS, a technique for training data attribution in diffusion models that fine-tunes a second model using bounded mirrored gradient ascent and measures the normalized skew relative to the original model with consistent noise samples. The goal is to determine which training instances most influenced a given generated output. Sympathetic readers would value this because current TDA methods are not reliable enough for practical use in understanding or controlling generative models. MUCS is shown to outperform existing methods substantially across three datasets. The work also investigates the impact of design choices and explores overlaps in influential instances as well as ensembling strategies.

Core claim

The paper claims that performing bounded mirrored gradient ascent to create a fine-tuned model and then computing the normalized skew of this model against the original using consistent noise samples identifies the most influential training data for diffusion model generations more effectively than prior approaches.

What carries the argument

The central mechanism is mirrored unlearning through bounded gradient ascent on a duplicate model combined with normalized skew measurement on consistent noise samples to isolate training data influence.
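The general unlearn-then-compare idea can be sketched on a toy problem. This is not the authors' MUCS implementation: this page does not define "bounded", "mirrored", or "normalized skew", so each is replaced by a labeled stand-in. A linear model with squared loss stands in for the diffusion model, "bounded" is read as stopping the ascent once the query loss reaches a cap, and the scores are simply loss changes rescaled to sum to one. All names (`bounded_ascent`, `attribution_scores`, `loss_cap`) are hypothetical.

```python
# Toy stand-in, NOT the MUCS algorithm from the paper.
# Assumptions: linear model + squared loss instead of a diffusion model;
# "bounded" = stop ascent at a loss cap; score normalization is a placeholder.

def predict(w, phi):
    return sum(wi * pi for wi, pi in zip(w, phi))

def loss(w, phi, y):
    return (predict(w, phi) - y) ** 2

def bounded_ascent(w, phi_q, y_q, lr=0.1, loss_cap=0.25, max_steps=100):
    """Gradient ASCENT on the query loss, stopped once the loss reaches loss_cap."""
    w = list(w)  # work on a copy so the original model stays untouched
    for _ in range(max_steps):
        if loss(w, phi_q, y_q) >= loss_cap:
            break
        residual = predict(w, phi_q) - y_q
        grad = [2.0 * residual * p for p in phi_q]
        w = [wi + lr * gi for wi, gi in zip(w, grad)]  # "+" means ascent
    return w

def attribution_scores(w_orig, w_tuned, train):
    """Per-sample loss change between the two models, rescaled to sum to one."""
    deltas = [loss(w_tuned, phi, y) - loss(w_orig, phi, y) for phi, y in train]
    total = sum(abs(d) for d in deltas)
    return [d / total for d in deltas] if total > 0 else deltas

w_orig = [1.0, -1.0]
train_phi = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
train = [(phi, predict(w_orig, phi)) for phi in train_phi]  # fitted exactly
phi_q = (1.0, 0.05)                    # query close to training sample 0
y_q = predict(w_orig, phi_q) + 0.1     # slightly off, so the gradient is nonzero

w_tuned = bounded_ascent(w_orig, phi_q, y_q)
scores = attribution_scores(w_orig, w_tuned, train)
ranking = sorted(range(len(train)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this toy setting the training sample whose features align most with the query receives the largest score, and the nearly orthogonal one receives the smallest. Whether the same holds for a real diffusion model is exactly what the paper's experiments are meant to test.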

If this is right

More reliable attribution supports interpretability and downstream tasks like removing unwanted data influences from trained models.
Systematic outperformance on multiple datasets suggests the method captures true influence signals effectively.
Studying overlaps of influential instances across generated items reveals patterns in how training data affects outputs.
Ensembling TDA approaches offers a path to even greater robustness.
Insights from the unlearning component may apply to general machine unlearning scenarios in generative models.
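Overlap and ensembling can both be made concrete with a few lines of code. The sketch below is one possible reading, not the paper's procedure: overlap is measured as the Jaccard index of top-k sets, and ensembling as rank averaging across methods. Both choices, the function names, and the score values are assumptions made for illustration.

```python
def top_k(scores, k):
    """Indices of the k highest-scored training instances."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def jaccard(a, b):
    """Overlap of two index sets; 1.0 means identical, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def ranks(scores):
    """Rank of each instance, where 0 is the most influential."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order):
        result[index] = position
    return result

def rank_average(score_lists):
    """Ensemble several methods by averaging ranks (lower is more influential)."""
    all_ranks = [ranks(s) for s in score_lists]
    return [sum(column) / len(column) for column in zip(*all_ranks)]

# Placeholder scores for four training instances (made-up numbers).
item_a = [0.9, 0.1, 0.8, 0.3]   # attribution scores for generated item A
item_b = [0.7, 0.2, 0.1, 0.6]   # attribution scores for generated item B
overlap = jaccard(top_k(item_a, 2), top_k(item_b, 2))

method_1 = [0.9, 0.1, 0.8, 0.3]  # two methods scoring the same generated item
method_2 = [5.0, 1.0, 2.0, 4.0]  # on a different scale
ensemble = rank_average([method_1, method_2])
```

Rank averaging is used here because scores from different attribution methods are generally not on a common scale; averaging raw scores would let the method with the largest numeric range dominate.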

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This could extend to attributing influences in other stochastic generative processes beyond diffusion.
Consistent noise might be useful for comparing models in other noisy training regimes to reduce variance in comparisons.
The large margins indicate that addressing both the unlearning direction and noise consistency tackles core limitations in previous TDA methods.
The method could also be used to audit training data contributions in deployed AI systems.
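The variance argument for consistent noise is the classic common-random-numbers idea, and it can be checked on a toy loss. In the sketch below, a one-parameter "denoiser" stands in for a diffusion model, and the two parameter values (0.50 and 0.52) stand in for the original and fine-tuned models. All names and constants are assumptions for illustration and do not come from the paper.

```python
import random
import statistics

def toy_loss(w, x, sigma, eps):
    """Squared error of a one-parameter noise predictor on the noised input."""
    x_t = x + sigma * eps
    return (w * x_t - eps) ** 2

def loss_gap(w_orig, w_tuned, x, sigma, n, rng, shared):
    """Mean loss difference between two models over n noise draws."""
    gaps = []
    for _ in range(n):
        eps_orig = rng.gauss(0.0, 1.0)
        # Shared: both models see the same noise. Independent: fresh draw.
        eps_tuned = eps_orig if shared else rng.gauss(0.0, 1.0)
        gaps.append(toy_loss(w_tuned, x, sigma, eps_tuned)
                    - toy_loss(w_orig, x, sigma, eps_orig))
    return statistics.fmean(gaps)

def spread(shared, trials=200, seed=0):
    """Standard deviation of the gap estimate across repeated trials."""
    rng = random.Random(seed)
    estimates = [loss_gap(0.50, 0.52, x=1.0, sigma=1.0, n=32, rng=rng, shared=shared)
                 for _ in range(trials)]
    return statistics.pstdev(estimates)

shared_spread = spread(shared=True)
indep_spread = spread(shared=False)
print(shared_spread, indep_spread)
```

With shared noise, most of the randomness cancels in the difference, so the estimated gap fluctuates far less than with independent draws. The size of the effect in a real diffusion model depends on how similar the two models are.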

Load-bearing premise

The assumption that fine-tuning via bounded mirrored gradient ascent followed by normalized skew measurement on consistent noise samples reliably identifies influential training instances rather than capturing unrelated model differences.

What would settle it

Remove the highest-attributed training samples from the dataset, retrain the diffusion model, and check whether the corresponding generations change significantly. If they do not, the method's attributions are falsified.
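The shape of such a removal test can be sketched with a toy generator. Here a kernel-weighted average of 1-D training points stands in for the diffusion model, and "retraining" is just rebuilding the average from the remaining points. The attribution scores are set equal to the true kernel weights, so the outcome holds by construction; in a real test they would come from the attribution method under evaluation. All names are hypothetical.

```python
import math

def generate(train, z, bandwidth=1.0):
    """Toy 'generator': kernel-weighted average of training points around seed z."""
    weights = [math.exp(-((z - x) ** 2) / (2.0 * bandwidth ** 2)) for x in train]
    return sum(w * x for w, x in zip(weights, train)) / sum(weights)

def removal_shift(train, scores, z, k, highest=True):
    """Drop the k highest- (or lowest-) scored points, rebuild, return |output change|."""
    order = sorted(range(len(train)), key=lambda i: scores[i], reverse=highest)
    dropped = set(order[:k])
    kept = [x for i, x in enumerate(train) if i not in dropped]
    return abs(generate(kept, z) - generate(train, z))

train = [float(i) for i in range(10)]
z = 2.3
# Placeholder scores: the true kernel weights (an oracle, for illustration only).
scores = [math.exp(-((z - x) ** 2) / 2.0) for x in train]

shift_top = removal_shift(train, scores, z, k=2, highest=True)
shift_bottom = removal_shift(train, scores, z, k=2, highest=False)
print(shift_top, shift_bottom)
```

Removing the lowest-scored points serves as the control. A useful attribution method should produce a clearly larger shift when its top-ranked samples are removed than when low-ranked or random samples are removed.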

Figures

Figures reproduced from arXiv: 2605.17938 by Dipam Goswami, Fabio Morreale, Joan Serrà, Wei-Hsiang Liao, Yuki Mitsufuji.

**Figure 2.** Examples of the distributions of similarities between …

**Figure 3.** Reference (first row) and regenerated (other rows) images for the considered approaches on …

**Figure 4.** Reference (first row) and regenerated (other rows) images for the considered approaches on …

**Figure 5.** Reference (first row) and regenerated (other rows) images for the considered approaches on …

Original abstract

Training data attribution (TDA) should enable generative model interpretability and foster a variety of related downstream tasks. Nonetheless, current TDA approaches lack reliability and robustness, preventing their adoption in real-world setups. In this paper, we take a decisive step towards more reliable and robust TDA for diffusion models. We propose to perform TDA with mirrored unlearning and noise-consistent skew (MUCS). The idea is to fine-tune a second model with bounded mirrored gradient ascent, and to measure the normalized skew of this model with respect to the original one using consistent noise samples. We show that, while being conceptually simple and generic, MUCS systematically outperforms existing methods on three different datasets by a large margin. We additionally study the effect that core design choices have on final performance, and analyze novel aspects regarding the overlap of influential instances across generated items and the potential of ensembling TDA approaches. We believe that our findings may have broader implications for more general unlearning setups, as well as for tasks requiring the comparison of diffusion losses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MUCS gives a workable new recipe for training data attribution in diffusion models, but the gains may still be tied more to fine-tuning quirks than to clean isolation of data influence.

The paper's core contribution is MUCS: fine-tune a copy of the diffusion model with bounded mirrored gradient ascent, then score training examples by the normalized skew in loss on the exact same noise samples. This is presented as a straightforward, generic way to do TDA that beats prior methods by a wide margin on three datasets while also checking overlap of influential points and the value of ensembling different attribution scores. Those extra analyses are useful and show the authors thinking beyond just beating baselines. The approach is conceptually clean and the reported improvements look substantial enough to warrant attention from people building unlearning or auditing tools for generative models.

That said, the central assumption still feels under-tested. Mirrored ascent plus skew on fixed noise could be picking up generic optimization differences or sensitivity to the particular ascent trajectory rather than the specific contribution of each training example. The abstract mentions ablations on design choices, but without tighter controls that directly tie the skew difference back to the removed instance, the large-margin wins could partly reflect better capture of unlearning dynamics instead of better attribution.

The work is aimed at researchers who need practical TDA for diffusion models and downstream tasks like data auditing or fairness checks. It is coherent on its own terms and shows honest engagement with the problem, so it deserves a serious referee even if the experiments will need strengthening on the causal link. I would send it out for review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes MUCS for training data attribution in diffusion models: a second model is fine-tuned via bounded mirrored gradient ascent, after which the normalized skew relative to the original model is measured on identical noise samples. The central claim is that this procedure is conceptually simple yet systematically outperforms prior TDA methods by a large margin on three datasets; the authors further examine design-choice ablations, overlap of influential instances across generated items, and ensembling potential, with suggested implications for unlearning and diffusion-loss comparison.

Significance. A validated method that reliably isolates training-instance influence in diffusion models would advance interpretability and downstream tasks such as targeted unlearning. The reported large-margin gains across datasets and the analysis of overlap/ensembling are potentially useful if the skew statistic is shown to be causally linked to specific data removal rather than generic fine-tuning divergence.

major comments (2)

[Abstract and method description] The load-bearing assumption that bounded mirrored gradient ascent plus normalized skew on consistent noise isolates data influence (rather than measuring unrelated optimization artifacts) is not adequately tested. No controls or counterexamples are described that would demonstrate the mirroring operation inverts only the contribution of a removed training example; residual asymmetry in the ascent or sensitivity to the particular noise trajectory could produce high scores for non-influential points. This directly undermines the claim of systematic outperformance.
[§4] §4 (empirical evaluation): the reported large-margin gains on three datasets and the ablations on design choices lack statistical significance tests, exact baseline reproduction details, and causal validation experiments (e.g., synthetic data where ground-truth influence is known). Without these, it is unclear whether the skew difference is tied to the removed instance or to the unlearning dynamics themselves.

minor comments (2)

[Method] Clarify the precise mathematical definition of 'normalized skew' and the sampling procedure for 'consistent noise samples' at the first appearance in the method section.
[Related work] Add a short paragraph contrasting MUCS with recent TDA approaches for generative models that also use gradient or loss-based signals.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their insightful comments on our manuscript. We provide point-by-point responses to the major comments below and outline the revisions we plan to make.

Referee: [Abstract and method description] The load-bearing assumption that bounded mirrored gradient ascent plus normalized skew on consistent noise isolates data influence (rather than measuring unrelated optimization artifacts) is not adequately tested. No controls or counterexamples are described that would demonstrate the mirroring operation inverts only the contribution of a removed training example; residual asymmetry in the ascent or sensitivity to the particular noise trajectory could produce high scores for non-influential points. This directly undermines the claim of systematic outperformance.

Authors: The mirroring in MUCS is specifically constructed to reverse the effect of including the training instance in the optimization, using bounded ascent to prevent divergence. The consistent noise samples ensure that the measured skew reflects differences in how the model processes the same input trajectory, which should be attributable to the unlearning of that instance. While we did not present explicit counterexamples in the initial submission, the systematic outperformance and design ablations suggest the effect is not merely an artifact. We will add control experiments, such as testing on held-out non-training data, to the revised manuscript to further validate the isolation of influence.
revision: yes
Referee: [§4] §4 (empirical evaluation): the reported large-margin gains on three datasets and the ablations on design choices lack statistical significance tests, exact baseline reproduction details, and causal validation experiments (e.g., synthetic data where ground-truth influence is known). Without these, it is unclear whether the skew difference is tied to the removed instance or to the unlearning dynamics themselves.

Authors: We agree that including statistical significance tests will strengthen the empirical claims, and we will incorporate them (e.g., paired t-tests across multiple runs) in the revision. We will also provide more detailed information on baseline implementations in the supplementary material to facilitate exact reproduction. For causal validation, experiments with synthetic data and known ground-truth influences would be valuable but present significant challenges in the context of diffusion models, where influence is inherently probabilistic and high-dimensional. Our multi-dataset evaluation and overlap analysis provide supporting evidence for the method's validity.
revision: partial
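A paired t-test of the kind mentioned in the response can be sketched with the standard library alone. The per-run scores below are made-up placeholders, not results from the paper, and `paired_t` is a hypothetical helper name. The function returns the t statistic and the degrees of freedom; the p-value would then be read from a t-distribution table or computed with a statistics package.

```python
import math
import statistics

def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for matched per-run scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least two runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1

# Placeholder metric values for five runs of two methods (made-up numbers).
method_a = [0.71, 0.68, 0.74, 0.70, 0.69]
method_b = [0.65, 0.66, 0.69, 0.64, 0.66]
t_stat, dof = paired_t(method_a, method_b)
print(t_stat, dof)
```

Pairing matters here for the same reason consistent noise matters in the method: when both methods are evaluated on the same runs, run-to-run variation cancels in the differences, which makes the test more sensitive than an unpaired comparison.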

standing simulated objections (not resolved)

Causal validation experiments with synthetic data where ground-truth influence is known remain outstanding, because such controlled synthetic settings are difficult to construct for complex diffusion models.

Circularity Check

0 steps flagged

No circularity: MUCS is an empirical TDA proposal evaluated on external datasets without self-referential reduction

The paper proposes MUCS as a practical method: fine-tune a second model via bounded mirrored gradient ascent, then compute the normalized skew on identical noise samples. The central claims rest on empirical outperformance versus baselines across three datasets plus ablations, not on any derivation that reduces the skew metric or attribution score to a fitted quantity defined by the method itself. There is no load-bearing self-cited uniqueness theorem, no ansatz smuggled in via prior work, and no renaming of known results as new organization. The procedure is presented as conceptually simple and generic; performance is measured against independent existing methods. This is a standard empirical contribution whose validity can be checked externally, yielding no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so concrete free parameters, axioms, or invented entities cannot be extracted. The method introduces concepts of 'mirrored unlearning' and 'noise-consistent skew' whose precise definitions and any associated hyperparameters remain unspecified.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear
Paper passage: "fine-tune a second model with bounded mirrored gradient ascent, and to measure the normalized skew of this model with respect to the original one using consistent noise samples"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear
Paper passage: "we employ gradient ascent as a regularization to fine-tuning with training data"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Hammoudeh and D

Z. Hammoudeh and D. Lowd. Training data influence analysis and estimation: a survey. Machine Learning, 113:2351–2403, 2024

work page 2024

[2] [2]

J. Deng, Y . Hu, P. Hu, T.-W. Li, S. Liu, J. T. Wang, D. Ley, Q. Dai, B. Huang, J. Huang, C. Jiao, H. A. Just, Y . Pan, J. Shen, Y . Tu, W. Wang, X. Wang, S. Zhang, S. Zhang, R. Jia, H. Lakkaraju, H. Peng, W. Tang, C. Xiong, J. Zhao, H. Tong, H. Zhao, and Jiaqi W Ma. A survey of data attribution: methods, applications, and evaluation in the era of generat...

work page 2025

[3] [3]

Georgiev, J

K. Georgiev, J. Vendrow, H. Salman, S. M. Park, and A. Madry. The journey, not the destination: how data guides diffusion models. InProc. of the ICML Workshop on Challenges in Deployable Generative AI, 2023

work page 2023

[4] [4]

Zheng, T

X. Zheng, T. Pang, C. Du, J. Jiang, and M. Lin. Intriguing properties of data attribution on diffusion models. InProc. of the Int. Conf. on Learning Representations (ICLR), 2024

work page 2024

[5] [5]

J. Lin, L. Tao, M. Dong, and C. Xu. Diffusion attribution score: evaluating training data influence in diffusion models. InProc. of the Int. Conf. on Learning Representations (ICLR), 2024

work page 2024

[6] [6]

W. Sun, H. Liu, N. Kandpal, C. Raffel, and Y . Yang. Enhancing training data attribution with representational optimization. InAdvances in Neural Information Processing Systems (NeurIPS), page in press. 2025

work page 2025

[7] [7]

M. Ko, F. Kang, W. Shi, M. Jin, Z. Yu, and R. Jia. The mirrored influence hypothesis: efficient data influence estimation by harnessing forward passes. InProc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 26286–26295, 2024

work page 2024

[8] [8]

S.-Y . Wang, A. Hertzmann, A. A. Efros, J.-Y . Zhu, and R. Zhang. Data attribution for text-to- image models by unlearning synthesized images. InAdvances in Neural Information Processing Systems (NeurIPS), volume 37, pages 4235–4266. 2024

[9] J. Deng and J. Ma. Computational copyright: towards a royalty model for music generative AI. In ICLR Workshop on Navigating and Addressing Problems for Foundation Models (DPFM), 2024

[10] F. Morreale, W. Hutiri, J. Serrà, A. Xiang, and Y. Mitsufuji. Attribution-by-design: ensuring inference-time provenance in generative music systems. ArXiv: 2510.08062, 2025

[11] W. Kim, H. Wi, S. Park, T. Kim, S. Keum, K. Kim, T. Kim, J. Jung, T. Kim, G. Guerrero, M. Le Goff, J. Po, D. Moon, J. Nam, and J. Lee. From generation to attribution: music AI agent architectures for the post-streaming era. In Proc. of the AI for Music Workshop at NeurIPS25 (AI4Music), 2025

[12] C.-H. Lai, Y. Song, D. Kim, Y. Mitsufuji, and S. Ermon. The principles of diffusion models. ArXiv: 2510.21890, 2025

[13] Y. Zhao, C. Du, X. Zheng, T. Pang, and M. Lin. Nonparametric data attribution for diffusion models. ArXiv: 2510.14269, 2025

[14] W. Choi, J. Koo, K. Cheuk, J. Serrà, M. A. Martínez-Ramírez, Y. Ikemiya, N. Murata, Y. Takida, W.-H. Liao, and Y. Mitsufuji. Large-scale training data attribution for music generative models via unlearning. In Advances in Neural Information Processing Systems (NeurIPS), Creative AI Track, in press, 2025

[15] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (NeurIPS), pages 6840–6851. 2020

[16] T. Karras, M. Aittala, T. Aila, and S. Laine. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems (NeurIPS), pages 26565–26577. 2022

[17] N. Kumari, B. Zhang, R. Zhang, E. Shechtman, and J.-Y. Zhu. Multi-concept customization of text-to-image diffusion. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 1931–1941, 2023

[18] N. Kumari, B. Zhang, S.-Y. Wang, E. Shechtman, R. Zhang, and J.-Y. Zhu. Ablating concepts in text-to-image diffusion models. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 22691–22702, 2023

[19] Q. Bertrand, A. Gagneux, M. Massias, and R. Emonet. On the closed-form of flow matching: generalization does not arise from target stochasticity. In Advances in Neural Information Processing Systems (NeurIPS), in press, 2025

[20] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017

[21] P. W. Koh and P. Liang. Understanding black-box predictions via influence functions. In Proc. of the Int. Conf. on Machine Learning (ICML), pages 1885–1894, 2017

[22] S. M. Park, K. Georgiev, A. Ilyas, G. Leclerc, and A. Madry. TRAK: attributing model behavior at scale. In Proc. of the Int. Conf. on Machine Learning (ICML), pages 27074–27113, 2023

[23] Z. Dai and D. K. Gifford. Ablation based counterfactuals. ArXiv: 2406.07908, 2024

[24] J. Brokman, O. Hofman, R. Vainshtein, A. Giloni, T. Shimizu, I. Singh, O. Rachmil, A. Zolfi, A. Shabtai, Y. Unno, and H. Kojima. MONTRAGE: monitoring training for attribution of generative diffusion models. In Proc. of the European Conf. on Computer Vision (ECCV), pages 1–17, 2024

[25] B. Mlodozeniec, I. Reid, S. Power, D. Krueger, M. Erdogdu, R. E. Turner, and R. Grosse. Distributional training data attribution. ArXiv: 2506.12965, 2025

[26] S.-Y. Wang, A. Hertzmann, A. A. Efros, R. Zhang, and J.-Y. Zhu. Fast data attribution for text-to-image models. ArXiv: 2511.10721, 2025

[27] A. Alberti, K. Hasanaliyev, M. Shah, and S. Ermon. Data unlearning in diffusion models. In Proc. of the Int. Conf. on Learning Representations (ICLR), 2025

[28] A. Heng and H. Soh. Selective amnesia: a continual learning approach to forgetting in deep generative models. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, pages 17170–17194. 2023

[29] A. Golatkar, A. Achille, and S. Soatto. Eternal sunshine of the spotless net: selective forgetting in deep networks. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 9301–9309, 2020

[30] R. Gandikota, J. Materzyńska, J. Fiotto-Kaufman, and D. Bau. Erasing concepts from diffusion models. In Proc. of the IEEE/CVF Int. Conf. on Computer Vision (ICCV), pages 2426–2436, 2023

[31] J. Wu, T. Le, M. Hayat, and M. Harandi. Erasing undesirable influence in diffusion models. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 28263–28273, 2025

[32] Y. Zhang, X. Chen, J. Jia, Y. Zhang, C. Fan, J. Liu, M. Hong, K. Ding, and S. Liu. Defensive unlearning with adversarial training for robust concept erasure in diffusion models. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, pages 36748–36776. 2024

[33] Q. Shi, C. Jin, J. Zhang, and Y. Gu. ReTrack: data unlearning in diffusion models through redirecting the denoising trajectory. ArXiv: 2509.13007, 2025

[34] J. Ho and T. Salimans. Classifier-free diffusion guidance. In NeurIPS Workshop on Deep Generative Models and Downstream Applications (DGMs and Applications), 2021

[35] B. Liu and P. Stone. Continual learning and private unlearning. In Proc. of the Conf. on Lifelong Learning Agents (CoLLAs), pages 243–254, 2022

[36] R. Zhang, L. Lin, Y. Bai, and S. Mei. Negative preference optimization: from catastrophic collapse to effective unlearning. In Proc. of the Conf. on Language Modeling (COLM), 2024

[37] A. K. Tarun, V. S. Chundawat, M. Mandal, and M. Kankanhalli. Fast yet effective machine unlearning. IEEE Transactions on Neural Networks and Learning Systems, 35(9):13046–13055, 2024

[38] M. Kurmanji, P. Triantafillou, J. Hayes, and E. Triantafillou. Towards unbounded machine unlearning. In Advances in Neural Information Processing Systems (NeurIPS), pages 1957–1987. 2023

[39] A. K. Veldanda, S.-X. Zhang, A. Das, S. Chakraborty, S. Rawls, S. Sahu, and M. Naphade. LLM surgery: efficient knowledge unlearning and editing in large language models. ArXiv: 2409.13054, 2024

[40] J. Ren, Z. Dai, X. Tang, H. Liu, J. Zeng, Z. Li, R. Goutam, S. Wang, Y. Xing, Q. He, and H. Liu. A general framework to enhance fine-tuning-based LLM unlearning. Findings of the Association for Computational Linguistics (ACL), pages 18464–18476, 2025

[41] A. Krizhevsky. Learning multiple layers of features from tiny images. Technical Report, 2009

[42] P. Liao, X. Li, X. Liu, and K. Keutzer. The ArtBench dataset: benchmarking generative models with artworks. ArXiv: 2206.11404, 2022

[43] H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 1473–1482, 2015

[44] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision. In Proc. of the Int. Conf. on Machine Learning (ICML), pages 8748–8763, 2021

[45] W. Peebles and S. Xie. Scalable diffusion models with transformers. In Proc. of the IEEE Int. Conf. on Computer Vision (ICCV), pages 4195–4205, 2023

[46] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Syst...

[47] I. Loshchilov and F. Hutter. Decoupled weight decay regularization. In Proc. of the Int. Conf. on Learning Representations (ICLR), 2019

[48] M. Oquab, T. Darcet, T. Moutakanni, H. Y. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, M. Assran, N. Ballas, W. Galuba, R. Howes, P.-Y. Huang, S.-W. Li, I. Misra, M. Rabbat, V. Sharma, G. Synnaeve, H. Xu, H. Jegou, J. Mairal, P. Labatut, A. Joulin, and P. Bojanowski. DINOv2: learning robust visual features without su...

[49] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004

[50] E. Pizzi, S. D. Roy, S. N. Ravindra, P. Goyal, and M. Douze. A self-supervised descriptor for image copy detection. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 14532–14542, 2022

[51] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 586–595, 2018

[52] A. P. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7):1145–1159, 1997

[53] S. J. Mason and N. E. Graham. Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: statistical significance and interpretation. Quarterly Journal of the Royal Meteorological Society, 128(584):2145–2166, 2002

[54] M. Radovanović, A. Nanopoulos, and M. Ivanović. Hubs in space: popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research, 11:2487–2531, 2010

[55] J. Pearl. Causality. Cambridge University Press, 2nd edition, 2013

[56] J. R. Epifano, R. P. Ramachandran, A. J. Masino, and G. Rasool. Revisiting the fragility of influence functions. Neural Networks, 162:581–588, 2023

[57] Y. Hu, P. Hu, H. Zhao, and J. W. Ma. Most influential subset selection: challenges, promises, and beyond. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, pages 119778–119810. 2024

[58] K. K and A. Søgaard. Revisiting methods for finding influential examples. ArXiv: 2111.04683, 2021

[59] T. Darcet, M. Oquab, J. Mairal, and P. Bojanowski. Vision transformers need registers. In Proc. of the Int. Conf. on Learning Representations (ICLR), 2023

A Supplementary Methodology

A.1 Pseudo-Code

Below we provide a Python-style pseudo-code of MUCS. Variables follow the notation in the main text (Sec. 2). The full code is released at [Link will be ava...
