Score-based Membership Inference on Diffusion Models
Pith reviewed 2026-05-18 12:25 UTC · model grok-4.3
The pith
The norm of a diffusion model's single denoiser output reveals whether an input was part of its training set.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show that the expected denoiser output points toward a kernel-weighted local mean of nearby training samples, such that its norm encodes proximity to the training set and thereby reveals membership.
What carries the argument
The norm of the expected denoiser output acting as an indicator of proximity via kernel-weighted averaging of training samples.
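The kernel-weighted reading can be made concrete with the standard closed form for an MSE-optimal noise predictor. The block below is a sketch under stated assumptions, not the paper's own derivation: it assumes a forward process $x_t = \alpha_t x_0 + \sigma_t \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$ and a data distribution equal to the empirical distribution over training samples $\{x_i\}_{i=1}^N$. The symbols $\alpha_t$, $\sigma_t$, $w_i$ and $h_t$ are notation chosen here for illustration.

```latex
% Assumed forward process: x_t = \alpha_t x_0 + \sigma_t \epsilon, \epsilon ~ N(0, I)
% Assumed data distribution: uniform over the training set {x_i}_{i=1}^N
\begin{align}
  \epsilon^{*}(x_t, t)
    &= \mathbb{E}[\epsilon \mid x_t]
     = \frac{x_t - \alpha_t \sum_{i=1}^{N} w_i(x_t)\, x_i}{\sigma_t}, \\
  w_i(x_t)
    &= \frac{\exp\!\left(-\lVert x_t - \alpha_t x_i \rVert^2 / (2\sigma_t^2)\right)}
            {\sum_{j=1}^{N} \exp\!\left(-\lVert x_t - \alpha_t x_j \rVert^2 / (2\sigma_t^2)\right)}, \\
  \nabla_{x_t} \log p_t(x_t)
    &= -\frac{\epsilon^{*}(x_t, t)}{\sigma_t}.
\end{align}
% The weights w_i form a Gaussian-kernel (Nadaraya-Watson style) average with
% effective bandwidth h_t = \sigma_t / \alpha_t, since
% \lVert x_t - \alpha_t x_i \rVert^2 / (2\sigma_t^2)
%   = \lVert x_t/\alpha_t - x_i \rVert^2 / (2 h_t^2).
```

Under these assumptions the numerator is the displacement between the query and a scaled local mean of training samples, so its norm is small when the query sits on top of a training point that dominates the weights. A trained network only approximates this idealised predictor.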
If this is right
- A single forward pass through the denoiser is sufficient for membership inference.
- The proposed SimA attack outperforms multi-query baselines at lower computational cost, and its Monte Carlo variant (SimA-MC) reports state-of-the-art TPR@1%FPR.
- The method extends to latent diffusion models without modification.
- Complex multi-query reconstruction methods can be replaced by this simpler statistic.
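The single-query idea in the list above can be sketched on toy data. This is a minimal illustration, not the paper's implementation: `empirical_denoiser` is a hypothetical stand-in for a trained network (the closed-form optimal noise predictor for an empirical training set), the query is fed in directly at one fixed noise level, and `alpha`, `sigma`, and the Gaussian toy data are all assumptions.

```python
import numpy as np

def empirical_denoiser(x, train, alpha=1.0, sigma=0.5):
    """MSE-optimal noise prediction when the data distribution is the
    empirical training set (an idealised stand-in for a trained network)."""
    diffs = x[None, :] - alpha * train                   # shape (N, d)
    logits = -np.sum(diffs ** 2, axis=1) / (2.0 * sigma ** 2)
    logits -= logits.max()                               # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    local_mean = weights @ train                         # kernel-weighted mean
    return (x - alpha * local_mean) / sigma

def norm_statistic(x, train, alpha=1.0, sigma=0.5):
    """Single-query statistic: norm of one denoiser output at the query."""
    return float(np.linalg.norm(empirical_denoiser(x, train, alpha, sigma)))

rng = np.random.default_rng(0)
train = rng.standard_normal((200, 16))      # hypothetical training set
held_out = rng.standard_normal((200, 16))   # hypothetical non-members

member_norms = np.array([norm_statistic(x, train) for x in train])
nonmember_norms = np.array([norm_statistic(x, train) for x in held_out])

print("mean norm, members:    ", member_norms.mean())
print("mean norm, non-members:", nonmember_norms.mean())
```

In this idealised setting a training point dominates its own kernel weights, so its norm is close to zero while held-out points keep a visible offset. A real denoiser generalises, so the gap should be expected to be much smaller and to depend on the chosen noise level.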
Where Pith is reading between the lines
- This suggests diffusion models encode local training data structure in their score predictions in a detectable way.
- Attacks based on this principle could be adapted to other generative models that use denoising or score estimation.
- It may be possible to use this for other privacy audits like data extraction attacks with similar efficiency.
Load-bearing premise
The model's denoiser output at a point is directed toward an average of similar training examples.
What would settle it
If the norms of denoiser outputs for training samples and non-training samples show no statistical difference, the membership inference claim would be falsified.
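One way to operationalise this check is an AUC over the two groups of norms, where a value near 0.5 would mean the statistic carries no membership signal. The sketch below assumes that a smaller norm is treated as more member-like; the function name `auc_smaller_is_member` and the toy numbers are hypothetical and not taken from the paper.

```python
import numpy as np

def auc_smaller_is_member(member_norms, nonmember_norms):
    """Probability that a random member has a smaller norm than a random
    non-member, counting ties as one half (Mann-Whitney style AUC)."""
    m = np.asarray(member_norms, dtype=float)[:, None]
    n = np.asarray(nonmember_norms, dtype=float)[None, :]
    wins = np.sum(m < n)
    ties = np.sum(m == n)
    return float((wins + 0.5 * ties) / (m.size * n.size))

# Synthetic numbers for illustration only.
separated = auc_smaller_is_member([0.1, 0.2, 0.3], [0.4, 0.5, 0.6])
no_signal = auc_smaller_is_member([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(separated, no_signal)
```

The pairwise comparison costs memory proportional to the product of the two group sizes, which is fine for audits with a few thousand samples per group; a rank-based implementation would be the natural choice for larger sets.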
Abstract
Membership inference attacks (MIAs) against Diffusion Models (DMs) raise pressing privacy concerns by revealing whether a sample was part of the training set. While existing methods typically rely on measuring reconstruction error across multiple denoising steps as a test statistic, they often incur significant computational overhead. In this work, we present a simple yet successful attack statistic using only the predicted noise vectors from the DM's denoiser, or equivalently, the score. Specifically, we show that the expected denoiser output points toward a kernel-weighted local mean of nearby training samples, such that its norm encodes proximity to the training set and thereby reveals membership. Building on this observation, we propose SimA, a single-query attack that provides a principled, efficient alternative to existing multi-query methods. SimA consistently achieves superior performance across variants of DMs and the Latent Diffusion Models (LDMs) on eight different datasets. Its Monte Carlo variant (SimA-MC) exhibits state-of-the-art performance across all experiments, significantly outperforming baseline methods in terms of TPR@1%FPR. These results demonstrate that complex reconstruction trajectories are unnecessary for effective membership inference, establishing SimA as a highly efficient benchmark for auditing privacy in DMs and LDMs.
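The headline metric in the abstract, TPR@1%FPR, can be computed from per-sample scores as sketched below. This is an illustrative implementation under assumptions: scores are oriented so that higher means more member-like (for a norm statistic one might pass the negated norm), and the helper name `tpr_at_fpr` and the toy scores are hypothetical.

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, fpr=0.01):
    """True-positive rate at a threshold that admits at most a fraction
    `fpr` of non-members. Higher score means 'more likely a member'."""
    m = np.asarray(member_scores, dtype=float)
    n = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]  # descending
    k = int(np.floor(fpr * n.size + 1e-9))   # allowed false positives
    threshold = n[min(k, n.size - 1)]        # (k+1)-th largest non-member score
    return float(np.mean(m > threshold))

# Toy example: scores are negated norms, so small norms rank highest.
nonmember_scores = -np.arange(1, 101, dtype=float)   # norms 1..100
member_scores = -np.array([0.5, 1.5, 3.0, 50.0])
tpr = tpr_at_fpr(member_scores, nonmember_scores, fpr=0.01)
print(tpr)
```

Using a strict inequality against the (k+1)-th largest non-member score keeps the realised false-positive rate at or below the target even when scores tie.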
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces SimA, a single-query membership inference attack against diffusion models (and latent diffusion models) that uses the norm of the denoiser output (equivalently, the score) as the test statistic. The central theoretical claim is that the expected denoiser output at a query point points toward a kernel-weighted local mean of nearby training samples, so that its norm encodes proximity to the training set: a training sample lies close to its own local mean and therefore tends to yield a smaller norm. The authors report that SimA and its Monte Carlo variant (SimA-MC) achieve superior TPR@1%FPR compared with multi-query reconstruction baselines across eight datasets and multiple DM/LDM variants.
Significance. If the claimed link between denoiser behavior and local training proximity holds, the work supplies a computationally lightweight, theoretically motivated benchmark for privacy auditing of generative models. It demonstrates that complex multi-step trajectories are not required for strong empirical performance, and its grounding in the score-matching objective is a point in its favor. The consistent gains across model families constitute a practical contribution even if the exact kernel interpretation requires refinement.
major comments (2)
- [§3] §3 (theoretical derivation): The step equating the expected denoiser output to a kernel-weighted local mean of training points is invoked to justify the single-query norm statistic, yet the manuscript provides no explicit error bounds, convergence conditions, or restrictions on the noise schedule and architecture under which the approximation holds. This assumption is load-bearing for the central claim that the norm directly encodes membership.
- [§4.2] §4.2 (attack definition): The transition from the expectation argument to the practical SimA statistic appears to rely on an implicit regime in which local memorized neighbors dominate global generalization; without a supporting lemma or finite-sample analysis, it is unclear how sensitive the attack is to overparameterization or capacity choices.
minor comments (2)
- [Table 1] Table 1 and Figure 2: the reported standard deviations across runs are small, but the manuscript should state whether the same random seeds and exact hyper-parameters were used for all baselines to allow direct comparison.
- [Notation] Notation: the kernel weighting function is introduced without an explicit definition of its bandwidth parameter; adding a short paragraph clarifying its selection would improve readability.
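On the bandwidth question, one plausible reading is that the noise level itself sets the kernel width. The sketch below assumes a Gaussian kernel with effective bandwidth `sigma / alpha` (a standard consequence of a forward process x_t = alpha * x_0 + sigma * noise, not a definition taken from the manuscript) and shows how the number of training points that effectively contribute to the local mean grows with the noise level. All names and the toy data are hypothetical.

```python
import numpy as np

def kernel_weights(x, train, alpha, sigma):
    """Normalised Gaussian kernel weights with bandwidth h = sigma / alpha."""
    h = sigma / alpha
    sq_dist = np.sum((x[None, :] / alpha - train) ** 2, axis=1)
    logits = -sq_dist / (2.0 * h ** 2)
    logits -= logits.max()                   # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def effective_neighbours(w):
    """Inverse participation ratio: roughly how many points carry the weight."""
    return float(1.0 / np.sum(w ** 2))

rng = np.random.default_rng(0)
train = rng.standard_normal((50, 8))         # hypothetical training set
query = train[0]                             # query at a training point

ess_small = effective_neighbours(kernel_weights(query, train, alpha=1.0, sigma=0.1))
ess_large = effective_neighbours(kernel_weights(query, train, alpha=1.0, sigma=10.0))
print(ess_small, ess_large)
```

At a small noise level the query's own training point takes essentially all the weight, while at a large noise level the average spreads over most of the training set. That trade-off suggests the choice of timestep matters for how local the membership signal is.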
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address the major comments point by point below, clarifying the theoretical foundations while acknowledging areas where additional discussion and experiments will strengthen the manuscript.
- Referee: [§3] §3 (theoretical derivation): The step equating the expected denoiser output to a kernel-weighted local mean of training points is invoked to justify the single-query norm statistic, yet the manuscript provides no explicit error bounds, convergence conditions, or restrictions on the noise schedule and architecture under which the approximation holds. This assumption is load-bearing for the central claim that the norm directly encodes membership.
Authors: We appreciate the referee highlighting the need for greater rigor in the theoretical derivation. Section 3 builds on the score-matching objective, under which the trained denoiser approximates the score of the data distribution; this score can be interpreted as inducing a kernel-weighted local average when the noise schedule acts as a bandwidth parameter. We acknowledge that the current manuscript does not supply explicit error bounds or convergence rates. In the revised version we will expand §3 with a dedicated paragraph that states the key assumptions (sufficiently trained model, query points near the data manifold, and noise levels where local structure dominates) and explicitly notes the lack of finite-sample guarantees as a limitation and avenue for future work. This addition will better contextualize why the norm serves as a membership signal without overstating the approximation.
Revision: yes
- Referee: [§4.2] §4.2 (attack definition): The transition from the expectation argument to the practical SimA statistic appears to rely on an implicit regime in which local memorized neighbors dominate global generalization; without a supporting lemma or finite-sample analysis, it is unclear how sensitive the attack is to overparameterization or capacity choices.
Authors: We agree that the practical SimA statistic implicitly assumes a regime in which overparameterized models memorize local training structures. Our experiments across eight datasets and multiple DM/LDM architectures already demonstrate strong empirical performance under these conditions. In the revision we will add a clarifying paragraph in §4.2 that explicitly states this assumption and links it to known memorization behavior in generative models. We will also report new experiments that vary model capacity (smaller versus larger U-Net backbones) to illustrate sensitivity to overparameterization. While a full supporting lemma lies beyond the present scope, these changes will make the transition from theory to practice more transparent.
Revision: partial
Circularity Check
Derivation of expected denoiser output as kernel-weighted local mean is self-contained from diffusion objective
The paper presents the central observation—that the expected denoiser output points toward a kernel-weighted local mean of nearby training samples—as a direct consequence of the score-matching training objective in diffusion models. No load-bearing step reduces by construction to a fitted parameter, self-citation chain, or renamed input; the attack statistic follows from this theoretical expectation without circular redefinition. The derivation is grounded in the model's training process rather than empirical fitting or prior author work invoked as uniqueness, and empirical results on eight datasets serve as external validation rather than the sole justification.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the trained denoiser output on a point near the data manifold approximates a kernel-weighted local mean of training samples.