arxiv: 2601.10237 · v2 · submitted 2026-01-15 · 💻 cs.LG · cs.CR

Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD

Pith reviewed 2026-05-16 13:33 UTC · model grok-4.3

classification 💻 cs.LG cs.CR
keywords differential privacy · DP-SGD · f-differential privacy · shuffled sampling · privacy utility tradeoff · gaussian noise · stochastic optimization

The pith

Shuffled DP-SGD cannot achieve strong privacy and high utility at once under standard worst-case analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that, in the f-differential privacy framework, DP-SGD using shuffled sampling over one epoch with M updates faces a hard limit on its privacy-utility tradeoff. Specifically, to keep the privacy separation κ small, the Gaussian noise multiplier σ must be at least 1/√(2 ln M). This threshold decays so slowly in M that, for practical values of M, the required noise degrades model performance substantially. The result highlights why achieving favorable guarantees is difficult without relaxing the adversarial model or changing the sampling approach.

Core claim

We analyze DP-SGD in the f-DP framework for shuffled sampling with M gradient updates and derive an explicit suboptimal upper bound on the achievable trade-off curve. This induces a geometric lower bound on the separation κ between the mechanism's curve and the random-guessing line. Consequently, either the noise multiplier σ satisfies σ ≥ 1/√(2 ln M) or κ ≥ (1/√8)(1 - 1/√(4π ln M)), showing that strong privacy and high utility cannot be achieved simultaneously.
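To make the dichotomy concrete, the following is a minimal sketch that evaluates the two quantities in the stated bound for a given M. The function names `sigma_threshold` and `kappa_floor` are hypothetical labels chosen for this illustration; only the two formulas themselves come from the text above.

```python
import math


def sigma_threshold(M: int) -> float:
    """Noise multiplier threshold 1/sqrt(2 ln M) from the stated dichotomy."""
    if M < 2:
        raise ValueError("M must be at least 2 so that ln M > 0")
    return 1.0 / math.sqrt(2.0 * math.log(M))


def kappa_floor(M: int) -> float:
    """Separation floor (1/sqrt(8)) * (1 - 1/sqrt(4 pi ln M)).

    Per the stated claim, this floor applies whenever sigma is below
    sigma_threshold(M).
    """
    if M < 2:
        raise ValueError("M must be at least 2 so that ln M > 0")
    return (1.0 / math.sqrt(8.0)) * (
        1.0 - 1.0 / math.sqrt(4.0 * math.pi * math.log(M))
    )


if __name__ == "__main__":
    # Illustrative values of M only; they are not taken from the paper.
    for M in (100, 10_000, 1_000_000):
        print(f"M={M:>9,d}  sigma >= {sigma_threshold(M):.4f}"
              f"  or  kappa >= {kappa_floor(M):.4f}")
```

Reading the output row by row shows the claim's shape: the σ threshold shrinks only logarithmically in M, while the κ floor stays below, and slowly approaches, 1/√8.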

What carries the argument

The separation κ measures the maximum vertical distance from the f-DP trade-off curve to the diagonal random-guessing line and serves as a proxy for adversarial advantage.
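As an illustration of what a separation of this kind looks like, here is a small sketch that measures the largest vertical gap between the random-guessing line 1 − α and a Gaussian trade-off curve G_µ(α) = Φ(Φ⁻¹(1 − α) − µ). This is an assumption-laden stand-in: the paper's curve for shuffled DP-SGD is not G_µ, and the paper's exact definition of κ may use perpendicular rather than vertical distance (for any curve, the perpendicular distance to the line α + β = 1 equals the vertical gap divided by √2). The function names are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for Phi and its inverse


def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Type II error of the Gaussian trade-off curve G_mu at type I error alpha."""
    return _STD.cdf(_STD.inv_cdf(1.0 - alpha) - mu)


def vertical_separation(mu: float, grid: int = 10_000) -> float:
    """Largest vertical gap between the line 1 - alpha and G_mu on an interior grid."""
    best = 0.0
    for i in range(1, grid):  # skip alpha = 0 and alpha = 1, where inv_cdf is undefined
        alpha = i / grid
        gap = (1.0 - alpha) - gaussian_tradeoff(alpha, mu)
        best = max(best, gap)
    return best


if __name__ == "__main__":
    for mu in (0.25, 0.5, 1.0, 2.0):
        closed_form = 2.0 * _STD.cdf(mu / 2.0) - 1.0  # gap at the symmetric point
        print(f"mu={mu:<4}  grid={vertical_separation(mu):.5f}  closed form={closed_form:.5f}")
```

For the Gaussian curve the grid search agrees with the closed form 2Φ(µ/2) − 1, which gives a quick way to sanity-check any numerical separation routine before applying it to a less tractable curve.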

If this is right

  • Shuffled DP-SGD requires σ at least 1/√(2 ln M) to achieve small κ.
  • The same limitation applies to Poisson subsampling up to constant factors.
  • For typical training with moderate M, the implied noise causes notable accuracy loss.
  • As M increases the bound decreases but does so too slowly for practical relief.
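The slow decay in the last point can be seen by inverting the threshold. The worked equation below is a sketch: σ₀ denotes a hypothetical target noise multiplier introduced here for illustration, and the numerical values are plain arithmetic on the stated bound rather than figures from the paper.

```latex
\sigma_0 \ge \frac{1}{\sqrt{2\ln M}}
\iff \ln M \ge \frac{1}{2\sigma_0^{2}}
\iff M \ge \exp\!\left(\frac{1}{2\sigma_0^{2}}\right),
\qquad
\sigma_0 = 0.2 \;\Rightarrow\; M \ge e^{12.5} \approx 2.7\times 10^{5},
\quad
\sigma_0 = 0.1 \;\Rightarrow\; M \ge e^{50} \approx 5.2\times 10^{21}.
```

In words: halving the target noise from 0.2 to 0.1 raises the number of updates needed to stay at or above the threshold from a few hundred thousand to an astronomically large value. For any smaller M, running at σ₀ falls below the threshold and the κ floor applies.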

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners may need to adopt relaxed privacy notions or non-shuffled methods to bypass this bound.
  • The slow asymptotic improvement suggests rethinking single-epoch assumptions in private training.
  • This bound could inform minimum noise levels in DP-SGD to ensure theoretical privacy guarantees.

Load-bearing premise

The standard worst-case adversarial model in the f-DP framework applies directly without modification to the single-epoch shuffled sampling process.

What would settle it

Running shuffled DP-SGD with σ below the bound and observing a trade-off curve separation κ that stays below the predicted lower bound would falsify the claim.

Figures

Figures reproduced from arXiv: 2601.10237 by Marten van Dijk, Murat Bilgehan Ertan.

Figure 1: Trade-off view of privacy in the f-DP framework [21]. The black line shows the ideal random-guessing trade-off between type I and type II errors. The vertical red segment κ denotes the maximum distance between the achievable f-DP trade-off and the ideal limit.
… primarily a modeling convenience: in practice, modern deep learning systems do not sample examples independently but instead shuffle the entire …
Figure 2: Illustrative geometry of the suboptimal and true trade-off functions in our impossibility …
Figure 3: Explicit lower bound on the separation $\mathrm{sep}\big(G_{\mu(M,1,\sigma^{-1})}\big)$ as a function of the number of rounds per epoch $M$ under noise schedules of the form $\sigma = s/\sqrt{\ln M}$, with $E = 1$.
… where the last equality holds whenever $M^{1/s^2} \ge 4$. Finally, we simplify the expression further by observing that for sufficiently large $M$, $M^{1/s^2} - 4 \ge \tfrac{1}{2} M^{1/s^2}$. Applying this bound to (51) yields $\mu(M, 1, \sigma^{-1}) \ge \frac{1}{\sqrt{M}} \sqrt{\tfrac{1}{2} M^{1/s^2}} = \frac{1}{\sqrt{2}} M^{1\ldots}$ …
Figure 4: GDP-predicted separation $\kappa_{\mu\text{-GDP}}(M, E, \sigma^{-1})$ as a function of the number of rounds per epoch $M$ under the noise schedule $\sigma = 1/\sqrt{2\ln M}$, for several epoch counts $E$.
We emphasize that, unlike the asymptotic µ-GDP approximation discussed above, our main separation bound is non-asymptotic. In particular, it holds for every finite M in the single-epoch setting, without requiring any limiting regime or asymptoti…
Original abstract

Differentially Private Stochastic Gradient Descent (DP-SGD) is the dominant paradigm for private training, but its fundamental limitations under worst-case adversarial privacy definitions remain poorly understood. We analyze DP-SGD in the $f$-differential privacy framework, which characterizes privacy via hypothesis-testing trade-off curves, and study shuffled sampling over a single epoch with $M$ gradient updates. We derive an explicit suboptimal upper bound on the achievable trade-off curve. This result induces a geometric lower bound on the separation $\kappa$, which is the maximum distance between the mechanism's trade-off curve and the ideal random-guessing line. Because a large separation implies significant adversarial advantage, meaningful privacy requires small $\kappa$. However, we prove that enforcing a small separation imposes a strict lower bound on the Gaussian noise multiplier $\sigma$, which directly limits the achievable utility. In particular, under the standard worst-case adversarial model, shuffled DP-SGD must satisfy $\sigma \ge \frac{1}{\sqrt{2\ln M}} \quad\text{or}\quad \kappa \ge \frac{1}{\sqrt{8}}\left(1-\frac{1}{\sqrt{4\pi\ln M}}\right)$, and thus cannot simultaneously achieve strong privacy and high utility. Although this bound vanishes asymptotically as $M \to \infty$, the convergence is extremely slow: even for practically relevant numbers of updates the required noise magnitude remains substantial. We further show that the same limitation extends to Poisson subsampling up to constant factors. Our experiments confirm that the noise levels implied by this bound lead to significant accuracy degradation in realistic training settings, thus showing a critical bottleneck in DP-SGD under standard worst-case adversarial assumptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that under the f-DP framework, single-epoch shuffled DP-SGD with M updates cannot simultaneously achieve strong privacy and high utility. It derives an explicit suboptimal upper bound on the achievable trade-off curve that induces a geometric lower bound on the separation κ, yielding the requirement that σ ≥ 1/√(2 ln M) or κ ≥ (1/√8)(1 - 1/√(4π ln M)). The same limitation extends to Poisson subsampling up to constant factors, and experiments show significant accuracy degradation at the implied noise levels.

Significance. If the derived bound is tight, the work identifies a concrete bottleneck in DP-SGD under standard worst-case f-DP assumptions, with the slow asymptotic vanishing of the bound as M → ∞ having direct practical implications. The geometric interpretation via κ and the experimental validation of utility loss are positive contributions, but the explicit suboptimality of the upper bound limits the strength of the impossibility claim.

major comments (2)
  1. [Abstract and §3] Abstract and main theorem: the lower bound on κ is obtained by converting an explicitly suboptimal upper bound on the trade-off curve into a geometric separation. Because the paper states the upper bound is suboptimal without a matching lower bound or tightness verification for the shuffled mechanism, the induced κ threshold may be loose; the true curve could permit smaller κ (hence stronger privacy) at the same σ, so the claimed impossibility does not necessarily follow.
  2. [§3] §3 (derivation): the analysis assumes the standard worst-case adversarial model in f-DP applies directly to single-epoch shuffled sampling. The paper should verify whether the dependencies introduced by shuffling weaken this worst-case bound, as this assumption is load-bearing for converting the trade-off upper bound into the stated σ-or-κ requirement.
minor comments (1)
  1. [Abstract] The nested square-root expression for the κ bound in the abstract is difficult to parse at a glance; a parenthesized or multi-line rendering would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our paper. We address the major concerns below and have made revisions to clarify the scope of our results.

Point-by-point responses
  1. Referee: [Abstract and §3] Abstract and main theorem: the lower bound on κ is obtained by converting an explicitly suboptimal upper bound on the trade-off curve into a geometric separation. Because the paper states the upper bound is suboptimal without a matching lower bound or tightness verification for the shuffled mechanism, the induced κ threshold may be loose; the true curve could permit smaller κ (hence stronger privacy) at the same σ, so the claimed impossibility does not necessarily follow.

    Authors: We acknowledge that the upper bound derived in the paper is explicitly suboptimal, and thus the resulting lower bound on κ is not necessarily tight. This provides a sufficient condition for the privacy-utility limitation but may overestimate the required separation. We have revised the abstract and Section 3 to more clearly state that the bound is conservative and that tighter analyses could potentially allow better trade-offs. Nevertheless, even this bound demonstrates substantial practical implications, as validated by the experiments showing accuracy degradation. revision: partial

  2. Referee: [§3] §3 (derivation): the analysis assumes the standard worst-case adversarial model in f-DP applies directly to single-epoch shuffled sampling. The paper should verify whether the dependencies introduced by shuffling weaken this worst-case bound, as this assumption is load-bearing for converting the trade-off upper bound into the stated σ-or-κ requirement.

    Authors: The f-DP trade-off curve is inherently defined for the worst-case neighboring datasets, and our upper bound applies to the shuffled DP-SGD mechanism as a whole. Shuffling introduces dependencies, but these do not invalidate the worst-case analysis; the bound holds regardless of the sampling order because it is based on the overall distribution of the mechanism output. We have added a paragraph in §3 to elaborate on why the standard model applies and that dependencies from shuffling are accounted for in the mechanism definition. A complete characterization of the exact trade-off curve for shuffling is beyond the scope of this work but would be a valuable direction for future research. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained from f-DP definitions

full rationale

The paper derives an explicit suboptimal upper bound on the achievable f-DP trade-off curve for single-epoch shuffled sampling directly from the hypothesis-testing characterization and worst-case adversarial model. This bound is then converted geometrically into a lower bound on separation κ, yielding the stated σ or κ threshold. No step reduces by construction to a fitted parameter renamed as prediction, a self-definition, or a load-bearing self-citation chain; the suboptimality is stated explicitly and the assumptions (standard f-DP model) are external. The result is therefore independent of its own outputs and receives score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper's claims rest on standard assumptions in the f-differential privacy literature and the definition of shuffled DP-SGD sampling; no new parameters are fitted or entities invented.

axioms (2)
  • domain assumption: The f-differential privacy framework accurately captures privacy via hypothesis-testing trade-offs.
    Central to deriving the trade-off curve bound.
  • domain assumption: Shuffled sampling over a single epoch with M updates follows the standard DP-SGD procedure.
    Used for the analysis of the mechanism.

pith-pipeline@v0.9.0 · 5615 in / 1384 out tokens · 23231 ms · 2026-05-16T13:33:21.893173+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds

    cs.LG 2026-05 conditional novelty 7.0

    Tight closed-form bounds via Berry-Esseen show DP-SGD with random shuffling achieves near-ideal privacy (trade-off close to 1-a) for σ ≥ √(3/ln M) and large M, with δ linear in epochs restricting E to O(√M) and an asy...
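As a rough arithmetic comparison between the noise threshold reviewed on this page and the one quoted in the citing paper's summary, the worked equation below takes the ratio of the two expressions. It is a sketch only: the citing summary is truncated, so the exact conditions attached to its threshold (for example, how large M must be) are not reproduced, and the comparison should not be read as a claim made by either paper.

```latex
\frac{\sqrt{3/\ln M}}{1/\sqrt{2\ln M}}
= \sqrt{\frac{3}{\ln M}}\cdot\sqrt{2\ln M}
= \sqrt{6} \approx 2.45 .
```

The ln M factors cancel, so under this reading the two thresholds differ by a constant factor of about 2.45 for every M.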

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages · cited by 1 Pith paper · 3 internal anchors

  1. [1]

    Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Mu...

  2. [2]

    Martín Abadi, Andy Chu, Ian J. Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Edgar R. Weippl, Stefan Katzenbeisser, Christopher Kruegel, Andrew C. Myers, and Shai Halevi, editors, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, O...

  3. [3]

    John M. Abowd. The U.S. Census Bureau adopts differential privacy. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18, page 2867, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450355520. doi: 10.1145/3219819.3226070. URL https://doi.org/10.1145/3219819.3226070

  4. [4]

    Large-scale differentially private bert, 2021

    Rohan Anil, Badih Ghazi, Vineet Gupta, Ravi Kumar, and Pasin Manurangsi. Large-scale differentially private BERT, 2021. URL https://arxiv.org/abs/2108.01624

  5. [5]

    Differential privacy has disparate impact on model accuracy

    Eugene Bagdasaryan, Omid Poursaeed, and Vitaly Shmatikov. Differential privacy has disparate impact on model accuracy. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems...

  6. [6]

    Henri E. Bal, Dick H. J. Epema, Cees de Laat, Rob van Nieuwpoort, John W. Romein, Frank J. Seinstra, Cees Snoek, and Harry A. G. Wijshoff. A medium-scale distributed system for computer science research: Infrastructure for the long term. Computer, 49(5):54–63, 2016. doi: 10.1109/MC.2016.127. URL https://doi.org/10.1109/MC.2016.127

  7. [7]

    Privacy amplification by subsampling: Tight analyses via couplings and divergences

    Borja Balle, Gilles Barthe, and Marco Gaboardi. Privacy amplification by subsampling: Tight analyses via couplings and divergences. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing S...

  8. [8]

    JAX-Privacy: Algorithms for privacy-preserving machine learning in JAX, 2025

    Borja Balle, Leonard Berrada, Zachary Charles, Christopher A Choquette-Choo, Soham De, Vadym Doroshenko, Dj Dvijotham, Andrew Galen, Arun Ganesh, Sahra Ghalebikesabi, Jamie Hayes, Peter Kairouz, Ryan McKenna, Brendan McMahan, Aneesh Pappu, Natalia Ponomareva, Mikhail Pravilov, Keith Rush, Samuel L Smith, and Robert Stanforth. JAX-Privacy: Algorithms for ...

  9. [9]

    Differentially private stochastic gradient descent with fixed-size minibatches: Tighter RDP guarantees with or without replacement

    Jeremiah Birrell, Reza Ebrahimi, Rouzbeh Behnia, and Jason Pacheco. Differentially private stochastic gradient descent with fixed-size minibatches: Tighter RDP guarantees with or without replacement. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors, Advances in Neural Information ...

  10. [11]

    URL https://arxiv.org/abs/2105.07985

  11. [12]

    JAX: composable transformations of Python+NumPy programs, 2018

    James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/jax-ml/jax

  12. [13]

    Zhiqi Bu, Jinshuo Dong, Qi Long, and Weijie J. Su. Deep learning with Gaussian differential privacy. CoRR, abs/1911.11607, 2019. URL http://arxiv.org/abs/1911.11607

  13. [14]

    Gs-wgan: A gradient-sanitized approach for learning differentially private generators, 2021

    Dingfan Chen, Tribhuvanesh Orekondy, and Mario Fritz. GS-WGAN: A gradient-sanitized approach for learning differentially private generators, 2021. URL https://arxiv.org/abs/2006.08265

  14. [15]

    Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, and Chiyuan Zhang. How private are DP-SGD implementations? In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Proceedings of the 41st International Conference on Machine Learning, volume 235 o...

  15. [16]

    PMLR. URL https://proceedings.mlr.press/v235/chua24a.html

  16. [17]

    Scalable DP-SGD: shuffling vs. Poisson subsampling

    Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, and Chiyuan Zhang. Scalable DP-SGD: shuffling vs. Poisson subsampling. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors, Advances in Neural Information Processing Systems 38: Annual Conference on Neu...

  17. [18]

    Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, and Borja Balle. Unlocking high-accuracy differentially private image classification through scale, 2022. URL https://arxiv.org/abs/2204.13650

  18. [19]

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, pages 248–255, Miami, Florida, USA, 2009. IEEE Computer Society. doi: 10.1109/CVPR.2009.5206848. UR...

  19. [20]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAAC...

  20. [21]

    Collecting telemetry data privately,

    Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately,

  21. [22]

    URL https://arxiv.org/abs/1712.01524

  22. [23]

    Differentially private diffusion models, 2023

    Tim Dockhorn, Tianshi Cao, Arash Vahdat, and Karsten Kreis. Differentially private diffusion models, 2023. URL https://arxiv.org/abs/2210.09929

  23. [24]

    Jinshuo Dong, Aaron Roth, and Weijie J. Su. Gaussian differential privacy. CoRR, abs/1905.02383, 2019. URL http://arxiv.org/abs/1905.02383

  24. [25]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In 9th International Conference on Learning Representations, ICLR 2021, V...

  25. [26]

    A firm foundation for private data analysis

    Cynthia Dwork. A firm foundation for private data analysis. Commun. ACM, 54(1):86–95, 2011. doi: 10.1145/1866739.1866758. URL https://doi.org/10.1145/1866739.1866758

  26. [27]

    The Algorithmic Foundations of Differential Privacy

    Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211–407, 2014. doi: 10.1561/0400000042. URL https://doi.org/10.1561/0400000042

  27. [28]

    Our data, ourselves: Privacy via distributed noise generation,

    Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Serge Vaudenay, editor, Advances in Cryptology - EUROCRYPT 2006, 25th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28 - June 1, 2006,...

  28. [29]

    Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, volume 3876 of Lecture Notes in Computer Science, pages 265–284, New York...

  29. [30]

    Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS ’14, page 1054–1067, New York, NY, USA, 2014. Association for Computing Machinery. ISBN 9781450329576. doi: 10.1145/2660267.2660348. U...

  30. [31]

    Amplification by shuffling: From local to central differential privacy via anonymity

    Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, ...

  31. [32]

    URL https://doi.org/10.1137/1.9781611975482.151

  32. [33]

    Vitaly Feldman, Audra McMillan, and Kunal Talwar. Hiding among the clones: A simple and nearly optimal analysis of privacy amplification by shuffling. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021, Denver, CO, USA, February 7-10, 2022, pages 954–964, Denver, CO, USA, 2021. IEEE. doi: 10.1109/FOCS52979.2021.00096. URL https://d...

  33. [34]

    Stronger privacy amplification by shuffling for renyi and approximate differential privacy

    Vitaly Feldman, Audra McMillan, and Kunal Talwar. Stronger privacy amplification by shuffling for renyi and approximate differential privacy. In Nikhil Bansal and Viswanath Nagarajan, editors,Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 4966–4981, Florence, Italy, 2023. SIAM. doi...

  34. [35]

    Mert Gürbüzbalaban, Asuman E. Ozdaglar, and Pablo A. Parrilo. Why random reshuffling beats stochastic gradient descent. Math. Program., 186(1):49–84, 2021. doi: 10.1007/S10107-019-01440-W. URL https://doi.org/10.1007/s10107-019-01440-w

  35. [36]

    Jeff Z. HaoChen and Suvrit Sra. Random shuffling beats SGD after finite epochs. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 2624–2633, Long Beach, California, USA,

  36. [37]

    PMLR. URL http://proceedings.mlr.press/v97/haochen19a.html

  37. [38]

    Exploring the limits of differentially private deep learning with group-wise clipping, 2022

    Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, and Jiang Bian. Exploring the limits of differentially private deep learning with group-wise clipping, 2022. URL https://arxiv.org/abs/2212.01539

  38. [39]

    Deep residual learning for image recognition,

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 770–778, Las Vegas, NV, USA, 2016. IEEE Computer Society. doi: 10.1109/CVPR.2016.90. URL https://doi.org/10.1109/CVPR.2016.90

  39. [40]

    Dp-nmt: Scalable differentially-private machine translation, 2024

    Timour Igamberdiev, Doan Nam Long Vu, Felix Künnecke, Zhuo Yu, Jannik Holmer, and Ivan Habernal. Dp-nmt: Scalable differentially-private machine translation, 2024. URL https://arxiv.org/abs/2311.14465

  40. [41]

    Practical and private (deep) learning without sampling or shuffling

    Peter Kairouz, Brendan Mcmahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, and Zheng Xu. Practical and private (deep) learning without sampling or shuffling. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 5213–5225, Virtual, 18–24 ...

  41. [42]

    Computing tight differential privacy guarantees using FFT

    Antti Koskela, Joonas Jälkö, and Antti Honkela. Computing tight differential privacy guarantees using FFT. In Silvia Chiappa and Roberto Calandra, editors, The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], volume 108 of Proceedings of Machine Learning Research,...

  42. [43]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009. URL https://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf

  43. [44]

    Toward training at imagenet scale with differential privacy

    Alexey Kurakin, Steve Chien, Shuang Song, Roxana Geambasu, Andreas Terzis, and Abhradeep Thakurta. Toward training at imagenet scale with differential privacy. CoRR, abs/2201.12328:1–25, 2022. URL https://arxiv.org/abs/2201.12328

  44. [45]

    Datasets: A community library for natural language processing

    Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, Joe Davison, Mario Šaško, Gunjan Chhablani, Bhavitvya Malik, Simon Brandeis, Teven Le Scao, Victor Sanh, Canwen Xu, Nicolas Patry, Angelina McMillan-Major, Philipp Schmid, Sylvain Gugge...

  45. [46]

    Hao Liang, Wanrong Zhang, Xinlei He, Kaishun Wu, and Hong Xing. An improved privacy and utility analysis of differentially private SGD with bounded domain and smooth losses. CoRR, abs/2502.17772:1–19, 2025. doi: 10.48550/ARXIV.2502.17772. URL https://doi.org/10.48550/arXiv.2502.17772

  46. [47]

    Ryan McKenna, Yangsibo Huang, Amer Sinha, Borja Balle, Zachary Charles, Christopher A. Choquette-Choo, Badih Ghazi, George Kaissis, Ravi Kumar, Ruibo Liu, Da Yu, and Chiyuan Zhang. Scaling laws for differentially private language models, 2025. URL https://arxiv.org/abs/2501.18914

  47. [48]

    Sebastian Meiser and Esfandiar Mohammadi. Tight on budget?: Tight bounds for r-fold approximate differential privacy. In David Lie, Mohammad Mannan, Michael Backes, and XiaoFeng Wang, editors, Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October 15-19, 2018, pages 247–264, Toronto, ON...

  48. [49]

    Convergence analysis of distributed stochastic gradient descent with shuffling

    Qi Meng, Wei Chen, Yue Wang, Zhi-Ming Ma, and Tie-Yan Liu. Convergence analysis of distributed stochastic gradient descent with shuffling. Neurocomputing, 337:46–57, 2019. doi: 10.1016/J.NEUCOM.2019.01.037. URL https://doi.org/10.1016/j.neucom.2019.01.037

  49. [50]

    Ilya Mironov. Rényi differential privacy. In 30th IEEE Computer Security Foundations Symposium, CSF 2017, Santa Barbara, CA, USA, August 21-25, 2017, pages 263–275, Santa Barbara, CA, USA, 2017. IEEE Computer Society. doi: 10.1109/CSF.2017.11. URL https://doi.org/10.1109/CSF.2017.11

  50. [51]

    arXiv preprint arXiv:1908.10530 (2019)

    Ilya Mironov, Kunal Talwar, and Li Zhang. Rényi differential privacy of the sampled Gaussian mechanism. CoRR, abs/1908.10530, 2019. URL http://arxiv.org/abs/1908.10530

  51. [52]

    Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011, 2011. URL http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf

  52. [53]

    Lam M. Nguyen, Quoc Tran-Dinh, Dzung T. Phan, Phuong Ha Nguyen, and Marten van Dijk. A unified convergence analysis for shuffling-type gradient methods. J. Mach. Learn. Res., 22:207:1–207:44, 2021. URL https://jmlr.org/papers/v22/20-1238.html

  53. [54]

    Natalia Ponomareva, Hussein Hazimeh, Alex Kurakin, Zheng Xu, Carson Denison, H. Brendan McMahan, Sergei Vassilvitskii, Steve Chien, and Abhradeep Guha Thakurta. How to dp-fy ML: A practical guide to machine learning with differential privacy. J. Artif. Intell. Res., 77:1113–1201, 2023. doi: 10.1613/JAIR.1.14649. URL https://doi.org/10.1613/jair.1.14649

  54. [55]

    LAION-5B: an open large-scale dataset for training next generation image-text models

    Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. LAION-5B: an open large-scale dataset for training next generation image-text mo...

  55. [56]

    Towards understanding the impact of model size on differential private classification.CoRR, abs/2111.13895:1–14, 2021

    Yinchen Shen, Zhiguo Wang, Ruoyu Sun, and Xiaojing Shen. Towards understanding the impact of model size on differential private classification. CoRR, abs/2111.13895:1–14, 2021. URL https://arxiv.org/abs/2111.13895

  56. [57]

    Amer Sinha, Thomas Mesnard, Ryan McKenna, Daogao Liu, Christopher A. Choquette-Choo, Yangsibo Huang, Da Yu, George Kaissis, Zachary Charles, Ruibo Liu, Lynn Chua, Pritish Kamath, Pasin Manurangsi, Steve He, Chiyuan Zhang, Badih Ghazi, Borja De Balle Pigem, Prem Eruvbetine, Tris Warkentin, Armand Joulin, and Ravi Kumar. Vaultgemma: A differentially privat...

  57. [58]

    David M. Sommer, Sebastian Meiser, and Esfandiar Mohammadi. Privacy loss classes: The central limit theorem in differential privacy. Proc. Priv. Enhancing Technol., 2019(2):245–269, 2019. doi: 10.2478/POPETS-2019-0029. URL https://doi.org/10.2478/popets-2019-0029

  58. [59]

    arXiv preprint arXiv:2401.04343 (2024)

    Xinyu Tang, Ashwinee Panda, Milad Nasr, Saeed Mahloujifar, and Prateek Mittal. Private fine-tuning of large language models with zeroth-order optimization, 2025. URL https://arxiv.org/abs/2401.04343

  59. [60]

    TensorFlow Datasets, a collection of ready-to-use datasets

    TensorFlow. TensorFlow Datasets, a collection of ready-to-use datasets. https://www.tensorflow.org/datasets

  60. [61]

    Nurislam Tursynbek, Aleksandr Petiushko, and Ivan V. Oseledets. Robustness threats of differential privacy. CoRR, abs/2012.07828:1–16, 2020. URL https://arxiv.org/abs/2012.07828

  61. [62]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference...

  62. [63]

    Chendi Wang, Buxin Su, Jiayuan Ye, Reza Shokri, and Weijie J. Su. Unified enhancement of privacy bounds for mixture mechanisms via f-differential privacy. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Pr...

  63. [64]

    PAC privacy: Automatic privacy measurement and control of data processing

    Hanshen Xiao and Srinivas Devadas. PAC privacy: Automatic privacy measurement and control of data processing. In Helena Handschuh and Anna Lysyanskaya, editors, Advances in Cryptology - CRYPTO 2023 - 43rd Annual International Cryptology Conference, CRYPTO 2023, Santa Barbara, CA, USA, August 20-24, 2023, Proceedings, Part II, volume 14082 of Lecture Notes i...

  64. [65]

    Opacus: User-friendly differential privacy library in pytorch

    Ashkan Yousefpour, Igor Shilov, Alexandre Sablayrolles, Davide Testuggine, Karthik Prasad, Mani Malek, John Nguyen, Sayan Ghosh, Akash Bharadwaj, Jessica Zhao, Graham Cormode, and Ilya Mironov. Opacus: User-friendly differential privacy library in pytorch. CoRR, abs/2109.12298, 2021. URL https://arxiv.org/abs/2109.12298

  65. [66]

    Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, and Robert Sim. Synthetic text generation with differential privacy: A simple and practical recipe, 2023. URL https://arxiv.org/abs/2210.14348

  66. [67]

    Root mean square layer normalization

    Biao Zhang and Rico Sennrich. Root mean square layer normalization. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC...

  67. [68]

    Character-level convolutional networks for text classification

    Xiang Zhang, Junbo Jake Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montre...

  68. [69]

    Chaoyi Zhu, Jiayi Tang, Juan F. Pérez, Marten van Dijk, and Lydia Y. Chen. DP-TLDM: differentially private tabular latent diffusion model. In Mila Dalla Preda, Sebastian Schrittwieser, Vincent Naessens, and Bjorn De Sutter, editors, Availability, Reliability and Security - 20th International Conference, ARES 2025, Ghent, Belgium, August 11-14, 2025, Pro...

  69. [70]

    Springer. doi: 10.1007/978-3-032-00624-0_17. URL https://doi.org/10.1007/978-3-032-00624-0_17

  70. [71]

    Poission subsampled Rényi differential privacy

    Yuqing Zhu and Yu-Xiang Wang. Poission subsampled Rényi differential privacy. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 7634–7642, 09–15 Jun 2019. PMLR. URL https://proceedings.mlr.press/v97/zhu19c.html

  71. [72]

    Optimal accounting of differential privacy via characteristic function

    Yuqing Zhu, Jinshuo Dong, and Yu-Xiang Wang. Optimal accounting of differential privacy via characteristic function. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, editors, International Conference on Artificial Intelligence and Statistics, AISTATS 2022, 28-30 March 2022, Virtual Event, volume 151 of Proceedings of Machine Learning Research...

  72. [73]

    Figures 3 and 4 visualize the implications of Lemma F.1 and its asymptotic instantiation for the separation metric as the number of rounds per epoch M increases. Figure 3 plots the fully explicit lower bound obtained by combining the Gaussian tail bound in Eq. (48) with the explicit lower bound on the µ-GDP parameter derived in Eqs. (51)–(52) under the no...