DMin: Scalable Training Data Influence Estimation for Diffusion Models
Pith reviewed 2026-05-23 07:07 UTC · model grok-4.3
The pith
DMin enables influence estimation for billion-parameter diffusion models by compressing gradients.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DMin is the first method capable of influence estimation for diffusion models with billions of parameters. Leveraging efficient gradient compression, DMin reduces storage requirements from hundreds of TBs to mere MBs or even KBs, and retrieves the top-k most influential training samples in under 1 second, all while maintaining performance.
What carries the argument
Efficient gradient compression that approximates the vectors required for per-sample influence scores without storing full gradients.
Load-bearing premise
The compression step keeps the relative ordering and numerical accuracy of influence scores close to what full gradients would give.
What would settle it
On a small diffusion model where full gradients fit in memory, compare the exact top-k influential samples against the top-k produced by DMin and measure the overlap or rank correlation.
Figures
read the original abstract
Identifying the training data samples that most influence a generated image is a critical task in understanding diffusion models (DMs), yet existing influence estimation methods are constrained to small-scale or LoRA-tuned models due to computational limitations. To address this challenge, we propose DMin (Diffusion Model influence), a scalable framework for estimating the influence of each training data sample on a given generated image. To the best of our knowledge, it is the first method capable of influence estimation for DMs with billions of parameters. Leveraging efficient gradient compression, DMin reduces storage requirements from hundreds of TBs to mere MBs or even KBs, and retrieves the top-k most influential training samples in under 1 second, all while maintaining performance. Our empirical results demonstrate DMin is both effective in identifying influential training samples and efficient in terms of computational and storage requirements.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DMin, a scalable framework for estimating the influence of each training data sample on a generated image from diffusion models. It claims to be the first method applicable to DMs with billions of parameters by using efficient gradient compression, which reduces storage from hundreds of TBs to MBs/KBs and allows top-k retrieval in under 1 second while maintaining performance. The abstract states that empirical results demonstrate both effectiveness in identifying influential samples and efficiency in computation/storage.
Significance. If the gradient compression is shown to preserve influence rankings without distortion, the work would enable influence estimation at scales previously impossible, supporting interpretability, data auditing, and debugging for large generative models.
major comments (2)
- [Abstract] Abstract: the central claim that gradient compression 'maintains performance' lacks any supporting quantitative evidence (e.g., Kendall-tau correlation, top-k overlap, or score distortion metrics) comparing compressed vs. uncompressed gradients. This is load-bearing because influence estimation relies on gradient similarities, and any systematic change in ranking would invalidate the scalability argument.
- [Abstract] Abstract: no details are supplied on the diffusion models tested (parameter counts, architectures), evaluation metrics for influence accuracy, baselines, datasets, or validation protocols, preventing assessment of whether the empirical results support the claims of effectiveness.
minor comments (1)
- [Abstract] Abstract: the storage reduction claim ('hundreds of TBs to mere MBs or even KBs') is stated without reference to a specific model size, number of training samples, or compression ratio achieved.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on the abstract. We address each point below and will revise the manuscript to strengthen the presentation of our claims.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that gradient compression 'maintains performance' lacks any supporting quantitative evidence (e.g., Kendall-tau correlation, top-k overlap, or score distortion metrics) comparing compressed vs. uncompressed gradients. This is load-bearing because influence estimation relies on gradient similarities, and any systematic change in ranking would invalidate the scalability argument.
Authors: We agree that the abstract would benefit from explicit quantitative support for the 'maintains performance' claim. The full manuscript reports these metrics (including Kendall-tau correlations exceeding 0.9 and top-k overlap rates) in the experimental evaluation comparing compressed and uncompressed gradients. We will revise the abstract to include a concise reference to these results. revision: yes
-
Referee: [Abstract] Abstract: no details are supplied on the diffusion models tested (parameter counts, architectures), evaluation metrics for influence accuracy, baselines, datasets, or validation protocols, preventing assessment of whether the empirical results support the claims of effectiveness.
Authors: The abstract is intentionally brief, but the manuscript provides these details in Sections 3 and 4 (e.g., billion-parameter models such as Stable Diffusion variants, LAION datasets, influence accuracy via retrieval metrics, comparison to prior influence methods, and validation protocols). We will revise the abstract to incorporate high-level information on the models, datasets, and evaluation setup. revision: yes
Circularity Check
No significant circularity; claims rest on empirical validation of compression
full rationale
The paper introduces DMin as a new scalable framework relying on gradient compression for influence estimation in billion-parameter diffusion models. No derivation chain, equations, or results in the abstract or described content reduce a claimed prediction or first-principles outcome to its own inputs by construction. There are no self-definitional steps, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness theorems imported from the authors, or ansatzes smuggled via prior work. The scalability claims (storage reduction, retrieval speed, maintained performance) are presented as empirical outcomes rather than tautological. The method is self-contained against external benchmarks for influence ranking, yielding a normal non-finding.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Second- order stochastic optimization for machine learning in linear time
Naman Agarwal, Brian Bullins, and Elad Hazan. Second- order stochastic optimization for machine learning in linear time. J. Mach. Learn. Res., 18:116:1–116:40, 2017. 5, 8
work page 2017
-
[2]
Influ- ence functions in deep learning are fragile
Samyadeep Basu, Phillip Pope, and Soheil Feizi. Influ- ence functions in deep learning are fragile. In 9th Interna- tional Conference on Learning Representations, ICLR , Vir- tual Event, Austria, 2021. OpenReview.net. 2, 5
work page 2021
-
[3]
Introducing our multimodal models, 2023
Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, and Sa ˘gnak Tas ¸ırlar. Introducing our multimodal models, 2023. 5
work page 2023
-
[4]
Dominique Brunet, Edward R. Vrscay, and Zhou Wang. On the mathematical properties of the structural similarity index. IEEE Trans. Image Process., 21(4):1488–1499, 2012. 5
work page 2012
-
[5]
Diffu- siondet: Diffusion model for object detection
Shoufa Chen, Peize Sun, Yibing Song, and Ping Luo. Diffu- siondet: Diffusion model for object detection. In IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023 , pages 19773–19786. IEEE, 2023. 1
work page 2023
-
[6]
Anshuman Chhabra, Peizhao Li, Prasant Mohapatra, and Hongfu Liu. ”what data benefits my classifier?” enhancing model performance and interpretability through influence- based data selection. In The Twelfth International Confer- ence on Learning Representations, ICLR 2024, Vienna, Aus- tria, May 7-11, 2024. OpenReview.net, 2024. 8
work page 2024
-
[7]
Sang Keun Choe, Hwijeen Ahn, Juhan Bae, Kewen Zhao, Minsoo Kang, Youngseog Chung, Adithya Pratapa, Willie Neiswanger, Emma Strubell, Teruko Mitamura, Jeff G. Schneider, Eduard H. Hovy, Roger B. Grosse, and Eric P. Xing. What is your data worth to gpt? llm-scale data valua- tion with influence functions. CoRR, abs/2405.13954, 2024. 1
-
[8]
Diffusion models in vision: A sur- vey
Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A sur- vey. IEEE Trans. Pattern Anal. Mach. Intell., 45(9):10850– 10869, 2023. 1
work page 2023
-
[9]
Epifano, Ravi Prakash Ramachandran, Aaron J
Jacob R. Epifano, Ravi Prakash Ramachandran, Aaron J. Masino, and Ghulam Rasool. Revisiting the fragility of in- fluence functions. Neural Networks, 162:581–588, 2023. 2, 5
work page 2023
-
[10]
Scaling rec- tified flow transformers for high-resolution image synthesis
Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas M ¨uller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, and Robin Rombach. Scaling rec- tified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learn- ing, ICML, Vie...
work page 2024
-
[11]
Kristian Georgiev, Joshua Vendrow, Hadi Salman, Sung Min Park, and Aleksander Madry. The journey, not the des- tination: How data guides diffusion models. CoRR, abs/2312.06205, 2023. 1, 2, 6, 8
-
[12]
Amirata Ghorbani, Abubakar Abid, and James Y . Zou. Inter- pretation of neural networks is fragile. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI , pages 3681–3688, Honolulu, Hawaii, 2019. 2, 5
work page 2019
-
[13]
Roger B. Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamile Luko- siute, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, and Samuel R. Bowman. Studying large lan- guage model generalization with influence functions. CoRR, abs/2308.03296, 2023. 1, 2
-
[14]
Training data influence analysis and estimation: a survey
Zayd Hammoudeh and Daniel Lowd. Training data influence analysis and estimation: a survey. Mach. Learn. , 113(5): 2351–2403, 2024. 8
work page 2024
-
[15]
Denoising dif- fusion probabilistic models
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising dif- fusion probabilistic models. In Advances in Neural Infor- mation Processing Systems 33: Annual Conference on Neu- ral Information Processing Systems 2020, NeurIPS , virtual,
work page 2020
-
[16]
Understanding black-box predictions via influence functions
Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning, ICML, pages 1885–1894, Sydney, NSW, Australia, 2017. PMLR. 1, 8
work page 2017
-
[17]
Resolv- ing training biases via influence-based data relabeling
Shuming Kong, Yanyan Shen, and Linpeng Huang. Resolv- ing training biases via influence-based data relabeling. In The Tenth International Conference on Learning Represen- tations, ICLR 2022, Virtual Event, April 25-29, 2022. Open- Review.net, 2022. 1
work page 2022
-
[18]
Datainf: Efficiently estimating data influence in lora-tuned llms and diffusion models
Yongchan Kwon, Eric Wu, Kevin Wu, and James Zou. Datainf: Efficiently estimating data influence in lora-tuned llms and diffusion models. InThe Twelfth International Con- ference on Learning Representations, ICLR, Vienna, Austria,
-
[19]
Llava-med: Training a large language- and-vision assistant for biomedicine in one day
Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, and Jianfeng Gao. Llava-med: Training a large language- and-vision assistant for biomedicine in one day. In Ad- vances in Neural Information Processing Systems 36: An- nual Conference on Neural Information Processing Systems 2023, NeurIPS, New Or...
work page 2023
-
[20]
Junnan Li, Dongxu Li, Caiming Xiong, and Steven C. H. Hoi. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation. InIn- ternational Conference on Machine Learning, ICML , pages 12888–12900, Baltimore, Maryland, 2022. 5
work page 2022
-
[21]
OPORP: one permutation + one random projection
Ping Li and Xiaoyun Li. OPORP: one permutation + one random projection. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD , pages 1303–1315, Long Beach, CA, 2023. ACM. 4
work page 2023
-
[22]
Token-wise influential training data retrieval for large lan- guage models
Huawei Lin, Jikai Long, Zhaozhuo Xu, and Weijie Zhao. Token-wise influential training data retrieval for large lan- guage models. In Proceedings of the 62nd Annual Meet- ing of the Association for Computational Linguistics, ACL , pages 841–860, Bangkok, Thailand, 2024. 2, 4, 5
work page 2024
-
[23]
arXiv preprint arXiv:2208.11970 , year=
Calvin Luo. Understanding diffusion models: A unified per- spective. CoRR, abs/2208.11970, 2022. 1
-
[24]
Deeper understanding of black-box predictions via generalized influence functions
Hyeonsu Lyu, Jonggyu Jang, Sehyun Ryu, and Hyun Jong Yang. Deeper understanding of black-box predictions via generalized influence functions. CoRR, abs/2312.05586,
-
[25]
Yury A. Malkov and Dmitry A. Yashunin. Efficient and ro- bust approximate nearest neighbor search using hierarchical 9 navigable small world graphs. IEEE Trans. Pattern Anal. Mach. Intell., 42(4):824–836, 2020. 6
work page 2020
-
[26]
Influ- ence functions for scalable data attribution in diffusion mod- els
Bruno Mlodozeniec, Runa Eschenhagen, Juhan Bae, Alexander Immer, David Krueger, and Richard Turner. Influ- ence functions for scalable data attribution in diffusion mod- els. CoRR, abs/2410.13850, 2024. 1, 2
-
[27]
In- triguing properties of compression on multilingual models
Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Se- bastian Gehrmann, Sara Hooker, and Julia Kreutzer. In- triguing properties of compression on multilingual models. In Proceedings of the 2022 Conference on Empirical Meth- ods in Natural Language Processing, EMNLP, pages 9092– 9110, Abu Dhabi, United Arab Emirates, 2022. 1, 2, 6, 8
work page 2022
-
[28]
TRAK: attributing model behavior at scale
Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, and Aleksander Madry. TRAK: attributing model behavior at scale. In International Conference on Machine Learning, ICML , pages 27074–27113, Honolulu, Hawaii,
-
[29]
Estimating training data influence by tracing gradient descent
Garima Pruthi, Frederick Liu, Satyen Kale, and Mukund Sundararajan. Estimating training data influence by tracing gradient descent. In Advances in Neural Information Pro- cessing Systems 33: Annual Conference on Neural Informa- tion Processing Systems 2020, NeurIPS 2020, December 6- 12, 2020, virtual, 2020. 8
work page 2020
-
[30]
Learning transferable visual models from natural language supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, ICML, pages 8748–8763, Virt...
work page 2021
-
[31]
High-resolution image syn- thesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj¨orn Ommer. High-resolution image syn- thesis with latent diffusion models. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR , pages 10674–10685, New Orleans, LA, 2022. 2
work page 2022
-
[32]
Scaling up influence functions
Andrea Schioppa, Polina Zablotskaia, David Vilar, and Artem Sokolov. Scaling up influence functions. In Thirty- Sixth AAAI Conference on Artificial Intelligence, AAAI , pages 8179–8186, 2022. 8
work page 2022
-
[33]
LAION-5B: an open large-scale dataset for training next generation image-text models
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Worts- man, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. LAION-5B: an open large-scale dataset for training next generation image-text model...
work page 2022
-
[34]
WIT: wikipedia-based image text dataset for multimodal multilingual machine learning
Krishna Srinivasan, Karthik Raman, Jiecao Chen, Michael Bendersky, and Marc Najork. WIT: wikipedia-based image text dataset for multimodal multilingual machine learning. In SIGIR ’21: The 44th International ACM SIGIR Confer- ence on Research and Development in Information Retrieval, pages 2443–2449, Virtual Event, Canada, 2021
work page 2021
-
[35]
Wang, Evan Montoya, David Munechika, Haoyang Yang, Benjamin Hoover, and Duen Horng Chau
Zijie J. Wang, Evan Montoya, David Munechika, Haoyang Yang, Benjamin Hoover, and Duen Horng Chau. Diffu- siondb: A large-scale prompt gallery dataset for text-to- image generative models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL, pages 893–911, Toronto, Canada, 2023. 1
work page 2023
-
[36]
Diffusion models: A comprehensive survey of methods and applications
Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Run- sheng Xu, Yue Zhao, Wentao Zhang, Bin Cui, and Ming- Hsuan Yang. Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv., 56(4):105:1– 105:39, 2024. 1
work page 2024
-
[37]
Revisit, extend, and enhance hessian-free influence functions.CoRR, abs/2405.17490, 2024
Ziao Yang, Han Yue, Jian Chen, and Hongfu Liu. Revisit, extend, and enhance hessian-free influence functions.CoRR, abs/2405.17490, 2024. 8
-
[38]
Adding conditional control to text-to-image diffusion models
Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In IEEE/CVF International Conference on Computer Vision, ICCV, pages 3813–3824, Paris, France, 2023. 1 10 DMin: Scalable Training Data Influence Estimation for Diffusion Models Supplementary Material
work page 2023
-
[39]
In this section, we report the detailed setting and environ- ments for our experiments
Experimental Settings. In this section, we report the detailed setting and environ- ments for our experiments. Implementation Details. We provide an open-source PyTorch implementation with multiprocessing support. We leverage Huggingface, Accelerate, Transformers, Diffusers and Peft in our implementation. Experimental Environments. Our experiments are con...
-
[40]
Ablation Study To better understand the impact of key parameters on the performance of the HNSW implementation, we conducted an ablation study by varying the graph-related parameters M and ef, as well as the construction parameter efconstruction. Table 6 summarizes the average detection rates across three subsets: Flowers, Lego Sets, and Magic Cards, unde...
-
[41]
Examples for other methods are omitted as they are nearly identical
Supplemental Visualization for Conditional Diffusion Models We provide additional visualizations for unconditional models on the MNIST dataset in Figure 5 and for condi- tional models in Figure 6. Examples for other methods are omitted as they are nearly identical. 3 A bold, digital portrait, partial woman's face, framed by a large green leaf, mysterious,...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.