Optimal Guarantees for Auditing R\'enyi Differentially Private Machine Learning
Pith reviewed 2026-05-22 07:44 UTC · model grok-4.3
The pith
Auditing Rényi differential privacy with Donsker-Varadhan estimators achieves information-theoretically optimal sample complexity up to logarithmic factors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Class-restricted Donsker-Varadhan estimators, when applied to Rényi divergence estimation via hypothesis testing, deliver non-asymptotic confidence intervals for RDP auditing that separate statistical error from algorithmic leakage; these estimators further satisfy matching minimax lower bounds, proving that the derived sample-complexity guarantees are information-theoretically optimal up to logarithmic factors and thereby supplying the first optimal guarantees for black-box RDP auditing.
What carries the argument
Class-restricted Donsker-Varadhan variational estimator, which recasts Rényi divergence estimation as a variational optimization problem inside a hypothesis test to produce tight, non-asymptotic bounds.
If this is right
- Auditors obtain rigorous, non-asymptotic guarantees on how tightly a claimed RDP parameter matches the true privacy loss.
- Empirical lower bounds on privacy leakage improve markedly over prior black-box methods, especially at small and moderate Rényi orders.
- The same framework applies directly to DP-SGD training pipelines on standard image datasets without requiring internal model access.
- Sample requirements for reliable auditing scale near-optimally with the desired and the privacy parameter.
Where Pith is reading between the lines
- Regulatory or certification bodies could adopt the method as a standardized test for claimed Rényi privacy levels in deployed models.
- The variational hypothesis-testing approach may extend to auditing other privacy definitions such as zero-concentrated or concentrated DP.
- If the black-box separation continues to hold on more complex architectures, the need for white-box or model-specific auditing techniques would diminish.
Load-bearing premise
The Donsker-Varadhan estimator can be realized in a fully black-box manner on any model and data distribution without further restrictions that would couple statistical error to privacy leakage.
What would settle it
An explicit construction or numerical counter-example in which the number of samples needed to certify a given RDP parameter exceeds the paper's upper bound by more than a logarithmic factor, or in which the estimator cannot separate estimation variance from true privacy violation on a simple mechanism such as the Gaussian.
Figures
read the original abstract
We study black-box auditing for machine learning algorithms that claim R \ 'enyi differential privacy (RDP) guarantees. We introduce an auditing framework, based on hypothesis testing, that directly estimates R\'enyi divergence between neighboring executions using the Donsker-Varadhan (DV) variational estimator. Our analysis yields explicit and non-asymptotic confidence intervals for RDP auditing via class-restricted DV estimators, separating statistical estimation error from algorithmic privacy leakage. We prove matching minimax lower bounds showing that, up to logarithmic factors, our sample-complexity guarantees are information-theoretically optimal, thereby establishing the first optimal guarantees for auditing RDP via DV estimators. Empirically, we instantiate our framework for auditing DP-SGD in a fully black-box setting. Across MNIST and CIFAR-10, and over a wide range of privacy regimes, our auditors produce a strong overall improvement on empirical RDP lower bounds compared to prior state-of-the-art black-box methods especially at small and moderate R\'enyi orders where accurate auditing is most challenging.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a black-box auditing framework for Rényi differentially private ML algorithms based on hypothesis testing and Donsker-Varadhan variational estimators to directly estimate Rényi divergence between neighboring executions. It derives explicit non-asymptotic confidence intervals separating statistical estimation error from algorithmic privacy leakage, proves matching minimax lower bounds (up to logarithmic factors) establishing information-theoretic optimality for the auditing procedure, and reports empirical improvements over prior black-box methods when auditing DP-SGD on MNIST and CIFAR-10 across privacy regimes.
Significance. If the central claims hold, the work delivers the first optimal sample-complexity guarantees for auditing RDP via class-restricted DV estimators. Credit is due for the clean separation of estimation error from leakage, the information-theoretic lower-bound arguments, the explicit non-asymptotic intervals, and the reproducible empirical evaluation on standard datasets that shows gains especially at small and moderate Rényi orders.
major comments (1)
- [Theorem 4.3] Theorem 4.3 (minimax lower bound): the lower-bound construction relies on a hypothesis-testing reduction whose test-function class is not shown to coincide with the restricted class F used for the class-restricted DV estimator in the upper bound (Theorem 3.1 and §3.2). Without an explicit argument that the packing or variational lower bound applies to the same F (e.g., bounded neural networks), the claim that the sample-complexity guarantees are information-theoretically optimal for the black-box auditing procedure does not follow.
minor comments (3)
- [§3.1] §3.1: the definition of the function class F for the restricted DV estimator is introduced only informally; an explicit statement of the norm or Lipschitz constraint on F in the main text would improve readability.
- [Figure 3] Figure 3 caption: the plotted confidence intervals should state the exact Rényi order α and the number of independent runs used to compute the empirical coverage.
- [Appendix B.2] Appendix B.2: the covering-number bound used to control the log factor in the sample complexity is stated without a reference to the precise entropy integral or chaining argument employed.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for identifying this important point on the alignment between the function classes in the upper and lower bounds. We address the comment below.
read point-by-point responses
-
Referee: [Theorem 4.3] Theorem 4.3 (minimax lower bound): the lower-bound construction relies on a hypothesis-testing reduction whose test-function class is not shown to coincide with the restricted class F used for the class-restricted DV estimator in the upper bound (Theorem 3.1 and §3.2). Without an explicit argument that the packing or variational lower bound applies to the same F (e.g., bounded neural networks), the claim that the sample-complexity guarantees are information-theoretically optimal for the black-box auditing procedure does not follow.
Authors: We agree that an explicit connection is required to establish that the minimax lower bound applies to the same restricted class F used by the class-restricted DV estimator. The current manuscript derives the lower bound via a general hypothesis-testing reduction for Rényi divergence estimation. In the revision we will add a short lemma in Section 4 showing that the test functions arising from the packing construction belong to (or can be approximated by elements of) F, under the standard assumption that F consists of bounded neural networks or Lipschitz functions with sufficient capacity. This addition will make the information-theoretic optimality claim precise for the black-box auditing procedure with the restricted class, up to the logarithmic factors already stated. revision: yes
Circularity Check
No significant circularity in optimality claims or derivation chain
full rationale
The paper derives its sample-complexity upper bounds and matching minimax lower bounds via separate information-theoretic arguments and variational representations of Rényi divergence. The lower bounds are presented as independent of the specific class-restricted DV estimator instantiations used for the upper bounds and confidence intervals; no equations or steps reduce the claimed optimality to a fitted parameter, self-definition, or self-citation chain by construction. The framework explicitly separates statistical estimation error from algorithmic leakage, and the analysis is self-contained against external benchmarks rather than relying on renaming or smuggling of prior ansatzes.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Rényi divergence admits a variational representation via the Donsker-Varadhan formula that can be restricted to suitable function classes.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 3.2 ... εn(δ) = Cα,M (√(d log(K/η) + log(1/δ))/n + η) ... class-restricted DV Rényi estimators
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.leanalpha_pin_under_high_calibration unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 3.3 ... minimax lower bound ... n ≤ c d / ε² ... packing argument via Gilbert-Varshamov
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Deep learning with differential privacy
Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. InProceedings of the 2016 ACM SIGSAC conference on computer and communications security, pages 308–318, 2016
work page 2016
-
[2]
Daniel Alabi. The existence of error-correcting codes implies privacy lower bounds.IEEE BITS the Information Theory Magazine, pages 1–12, 2026
work page 2026
-
[3]
Privacy and security in distributed data markets
Daniel Alabi, Sainyam Galhotra, Shagufta Mehnaz, Zeyu Song, and Eugene Wu. Privacy and security in distributed data markets. InCompanion of the 2025 International Conference on Management of Data, SIGMOD/PODS ’25, 2025
work page 2025
-
[4]
Venkat Anantharam. A variational characterization of rényi divergences.IEEE Transactions on Information Theory, 64(11):6979–6989, 2018
work page 2018
-
[5]
what do you want from theory alone?
Meenatchi Sundaram Muthu Selva Annamalai, Georgi Ganev, and Emiliano De Cristofaro. "what do you want from theory alone?" experimenting with tight auditing of differentially private synthetic data generation. InProceedings of the 33rd USENIX Conference on Security Symposium, SEC ’24, USA, 2024. USENIX Association
work page 2024
-
[6]
Mutual information neural estimation
Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and Devon Hjelm. Mutual information neural estimation. InProceedings of the 35th International Conference on Machine Learning (ICML). PMLR, 2018
work page 2018
-
[7]
Benjamin Bichsel, Timon Gehr, Dana Drachsler-Cohen, Petar Tsankov, and Martin T. Vechev. Dp-finder: Finding differential privacy violations by sampling and optimization. InProceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October 15-19, 2018, pages 508–524. ACM, 2018
work page 2018
-
[8]
Jeremiah Birrell, Paul Dupuis, Markos A Katsoulakis, Luc Rey-Bellet, and Jie Wang. Varia- tional representations and neural network estimation of rényi divergences.SIAM Journal on Mathematics of Data Science, 3(4):1093–1116, 2021
work page 2021
-
[9]
Function-space regularized rényi divergences
Jeremiah Birrell, Yannis Pantazis, Paul Dupuis, Luc Rey-Bellet, and Markos Katsoulakis. Function-space regularized rényi divergences. InThe Eleventh International Conference on Learning Representations, 2023
work page 2023
-
[10]
Thomas M. Cover and Joy A. Thomas.Elements of Information Theory 2nd Edition (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, July 2006
work page 2006
-
[11]
Detecting violations of differential privacy
Zeyu Ding, Yuxin Wang, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. Detecting violations of differential privacy. InProceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October 15-19, 2018, pages 475–489. ACM, 2018
work page 2018
-
[12]
Jinshuo Dong, Aaron Roth, and Weijie J Su. Gaussian differential privacy.arXiv preprint arXiv:1905.02383, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1905
-
[13]
Monroe Donsker and S. R. Srinivasa Varadhan. Asymptotic evaluation of certain markov process expectations for large time, IV.Communications on Pure and Applied Mathematics, 36(2):183–212, 1983
work page 1983
-
[14]
Our data, ourselves: Privacy via distributed noise generation
Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. InEUROCRYPT, 2006
work page 2006
-
[15]
Cynthia Dwork, Nitin Kohli, and Deirdre Mulligan. Differential privacy in practice: Expose your epsilons!Journal of Privacy and Confidentiality, 9, 10 2019
work page 2019
-
[16]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. InTCC, 2006. 11
work page 2006
-
[17]
Infoshape: Task-based neural data shaping via mutual information
Homa Esfahanizadeh, William Wu, Manya Ghobadi, Regina Barzilay, and Muriel Médard. Infoshape: Task-based neural data shaping via mutual information. InICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023
work page 2023
-
[18]
Detecting violations of differential privacy for quantum algorithms
Ji Guan, Wang Fang, Mingyu Huang, and Mingsheng Ying. Detecting violations of differential privacy for quantum algorithms. InProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, CCS 2023, Copenhagen, Denmark, November 26-30, 2023, pages 2277–2291. ACM, 2023
work page 2023
-
[19]
Matthew Jagielski, Jonathan Ullman, and Alina Oprea. Auditing differentially private machine learning: How private is private sgd?Advances in Neural Information Processing Systems, 33:22205–22216, 2020
work page 2020
-
[20]
The composition theorem for differential privacy
Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. InInternational conference on machine learning, pages 1376–1385. PMLR, 2015
work page 2015
-
[21]
Peter Kairouz, Sewoong Oh, and Pramod Viswanath. Extremal mechanisms for local differential privacy.Journal of Machine Learning Research, 17(17):1–51, 2016
work page 2016
-
[22]
Benjamin D Kim, Vipindev Adat Vasudevan, Rafael GL D’Oliveira, Alejandro Cohen, Thomas Stahlbuhk, and Muriel Médard. Cryptanalysis via machine learning based information theoretic metrics.arXiv preprint arXiv:2501.15076, 2025
-
[23]
Crypto-mine: Cryptanalysis via mutual information neural estimation
Benjamin D Kim, Vipindev Adat Vasudevan, Jongchan Woo, Alejandro Cohen, Rafael GL D’Oliveira, Thomas Stahlbuhk, and Muriel Médard. Crypto-mine: Cryptanalysis via mutual information neural estimation. InICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4820–4824. IEEE, 2024
work page 2024
-
[24]
Dp-auditorium: A large-scale library for auditing differential privacy
William Kong, Andrés Muñoz Medina, Mónica Ribero, and Umar Syed. Dp-auditorium: A large-scale library for auditing differential privacy. In2024 IEEE Symposium on Security and Privacy (SP), pages 110–126, 2024
work page 2024
-
[25]
Learning multiple layers of features from tiny images
Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009
work page 2009
-
[26]
The mnist database of handwritten digits.http://yann
Yann LeCun. The mnist database of handwritten digits.http://yann. lecun. com/exdb/mnist/, 1998
work page 1998
-
[27]
F.J. MacWilliams and N.J.A. Sloane.The Theory of Error-Correcting Codes. North-holland Publishing Company, 2nd edition, 1978
work page 1978
-
[28]
Auditingf-differential privacy in one run,
Saeed Mahloujifar, Luca Melis, and Kamalika Chaudhuri. Auditing f-differential privacy in one run.arXiv preprint arXiv:2410.22235, 2024
-
[29]
Extension of range of functions.Bulletin of the American Mathemati- cal Society, 40:837–842, 1934
Edward James McShane. Extension of range of functions.Bulletin of the American Mathemati- cal Society, 40:837–842, 1934
work page 1934
-
[30]
Ilya Mironov. Rényi differential privacy. In2017 IEEE 30th Computer Security Foundations Symposium (CSF). IEEE, 2017
work page 2017
-
[31]
Meenatchi Sundaram Muthu Selva Annamalai and Emiliano De Cristofaro. Nearly tight black-box auditing of differentially private machine learning.Advances in Neural Information Processing Systems, 37:131482–131502, 2024
work page 2024
-
[32]
Tight auditing of differentially private machine learning
Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, and Andreas Terzis. Tight auditing of differentially private machine learning. In32nd USENIX Security Symposium (USENIX Security 23), pages 1631–1648, 2023
work page 2023
-
[33]
Membership inference attacks against machine learning models
Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pages 3–18. IEEE Computer Society, 2017
work page 2017
-
[34]
Privacy auditing with one (1) training run
Thomas Steinke, Milad Nasr, and Matthew Jagielski. Privacy auditing with one (1) training run. InAdvances in Neural Information Processing Systems (NeurIPS), volume 36, 2023. 12
work page 2023
-
[35]
Hassler Whitney. Analytic extensions of differentiable functions defined in closed sets.Trans- actions of the American Mathematical Society, 36(1):63–89, 1934
work page 1934
-
[36]
Enhanced membership inference attacks against machine learning models
Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Vincent Bindschaedler, and Reza Shokri. Enhanced membership inference attacks against machine learning models. In Heng Yin, Angelos Stavrou, Cas Cremers, and Elaine Shi, editors,Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, November 7-1...
work page 2022
-
[37]
How bounded privacy loss enables two-sided confidence intervals.By imposing a bounded privacy-loss assumption (exact for pure DP and a reasonable high-probability surrogate for many RDP mechanisms) we derive finite-sample upper and lower confidence bounds using classical concentration inequalities. These results justify when empirical audits can certify b...
-
[38]
Why the DV-based estimator is statistically sound for auditing.We provide a full non- asymptotic analysis of the DV variational estimator over restricted critic classes, showing uniform convergence, explicit rates, and consistency. This formally validates the use of neural divergence estimators as statistically principled auditing tools, rather than heuri...
-
[39]
(Uniform boundedness)sup θ,z |Tθ(z)| ≤M
-
[40]
(Lipschitz parameterization) For allz∈Ω, |Tθ(z)−T θ′(z)| ≤L∥θ−θ ′∥. Define the population DV Rényi functional V(θ) := 1 α−1 logE Q h e(α−1)Tθ(X) i − 1 α logE P h eαTθ(Y) i , and its empirical estimator bVn(θ) := 1 α−1 log 1 n nX i=1 e(α−1)Tθ(Xi) ! − 1 α log 1 n nX i=1 eαTθ(Yi) ! . Define RΘ α (Q∥P) := sup θ∈Θ V(θ), bRΘ α,n(Q∥P) := sup θ∈Θ bVn(θ). Then for...
work page 1981
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.