Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD
Pith reviewed 2026-05-16 13:33 UTC · model grok-4.3
The pith
Shuffled DP-SGD cannot achieve strong privacy and high utility at once under standard worst-case analysis.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We analyze DP-SGD in the f-DP framework for shuffled sampling with M gradient updates and derive an explicit suboptimal upper bound on the achievable trade-off curve. This induces a geometric lower bound on the separation κ between the mechanism's curve and the random-guessing line. Consequently, either the noise multiplier σ satisfies σ ≥ 1/√(2 ln M) or κ ≥ (1/√8)(1 - 1/√(4π ln M)), showing that strong privacy and high utility cannot be achieved simultaneously.
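To make the two thresholds concrete, the following minimal sketch evaluates them for a few values of M. The function names `sigma_threshold` and `kappa_threshold` are illustrative and do not come from the paper; the formulas are copied from the claim above, and natural logarithms are assumed.

```python
import math


def sigma_threshold(M: int) -> float:
    """Noise-multiplier threshold 1 / sqrt(2 ln M) from the stated bound."""
    if M < 2:
        raise ValueError("M must be at least 2 so that ln M > 0")
    return 1.0 / math.sqrt(2.0 * math.log(M))


def kappa_threshold(M: int) -> float:
    """Separation threshold (1 / sqrt(8)) * (1 - 1 / sqrt(4 pi ln M))."""
    if M < 2:
        raise ValueError("M must be at least 2 so that ln M > 0")
    return (1.0 - 1.0 / math.sqrt(4.0 * math.pi * math.log(M))) / math.sqrt(8.0)


if __name__ == "__main__":
    # The claim is a disjunction: for each M, at least one inequality must hold.
    for M in (10**2, 10**3, 10**4, 10**6):
        print(
            f"M={M:>8d}  sigma >= {sigma_threshold(M):.3f}"
            f"  or  kappa >= {kappa_threshold(M):.3f}"
        )
```

For M = 1000 this evaluates to σ ≥ 0.269 or κ ≥ 0.316 (rounded to three decimals).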
What carries the argument
The separation κ measures the maximum vertical distance from the f-DP trade-off curve to the diagonal random-guessing line and serves as a proxy for adversarial advantage.
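As an illustration of what κ measures, the sketch below computes the largest vertical gap between the random-guessing line 1 − α and a Gaussian trade-off curve G_μ(α) = Φ(Φ⁻¹(1 − α) − μ) from Gaussian differential privacy. This is an assumption made for illustration: G_μ is a stand-in, not the paper's shuffled-mechanism curve, and the paper may normalize κ differently (for instance as a Euclidean rather than vertical distance). The helper names are hypothetical. For G_μ the gap has the closed form 2Φ(μ/2) − 1, which the grid search should reproduce.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and variance 1


def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Type II error of the optimal test at type I error alpha under mu-GDP."""
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


def vertical_separation(mu: float, grid_size: int = 10_000) -> float:
    """Largest gap between the line 1 - alpha and the curve, by grid search."""
    best = 0.0
    for i in range(1, grid_size):  # skip the endpoints, where inv_cdf is undefined
        alpha = i / grid_size
        best = max(best, (1.0 - alpha) - gaussian_tradeoff(alpha, mu))
    return best


def vertical_separation_closed_form(mu: float) -> float:
    """Closed form for the Gaussian curve: the gap peaks where inv_cdf(1 - alpha) = mu / 2."""
    return 2.0 * _STD_NORMAL.cdf(mu / 2.0) - 1.0


if __name__ == "__main__":
    for mu in (0.1, 0.5, 1.0, 2.0):
        print(mu, vertical_separation(mu), vertical_separation_closed_form(mu))
```

With μ = 1 both routes give a gap of about 0.383; as μ shrinks toward 0 the curve approaches the diagonal and the gap vanishes.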
If this is right
- Shuffled DP-SGD needs σ ≥ 1/√(2 ln M) to push κ below (1/√8)(1 - 1/√(4π ln M)).
- The same limitation applies to Poisson subsampling up to constant factors.
- For typical training with moderate M, the implied noise causes notable accuracy loss.
- As M increases the bound decreases but does so too slowly for practical relief.
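The slow decay noted in the last bullet can be quantified by inverting σ = 1/√(2 ln M), which gives M = exp(1/(2σ²)). The sketch below is a hedged illustration with a hypothetical helper name, `updates_needed`; it reports how many updates are required before the threshold on σ falls to a chosen level.

```python
import math


def updates_needed(sigma_target: float) -> float:
    """Real-valued M at which 1 / sqrt(2 ln M) equals sigma_target."""
    if sigma_target <= 0.0:
        raise ValueError("sigma_target must be positive")
    # Note: math.exp overflows for very small targets (exponent above ~709).
    return math.exp(1.0 / (2.0 * sigma_target**2))


if __name__ == "__main__":
    for sigma in (0.5, 0.25, 0.2, 0.1):
        print(f"threshold {sigma:.2f} is reached at M ~ {updates_needed(sigma):.3e}")
```

Halving the threshold from 0.2 to 0.1 raises the required M from roughly 2.7 × 10^5 to roughly 5.2 × 10^21, which is why larger M offers little practical relief.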
Where Pith is reading between the lines
- Practitioners may need to adopt relaxed privacy notions or non-shuffled methods to bypass this bound.
- The slow asymptotic improvement suggests rethinking single-epoch assumptions in private training.
- This bound could inform minimum noise levels in DP-SGD to ensure theoretical privacy guarantees.
Load-bearing premise
The standard worst-case adversarial model in the f-DP framework applies directly without modification to the single-epoch shuffled sampling process.
What would settle it
Run shuffled DP-SGD with σ below 1/√(2 ln M): if the separation κ of the resulting trade-off curve stays below the predicted lower bound, the claim is falsified.
Original abstract
Differentially Private Stochastic Gradient Descent (DP-SGD) is the dominant paradigm for private training, but its fundamental limitations under worst-case adversarial privacy definitions remain poorly understood. We analyze DP-SGD in the $f$-differential privacy framework, which characterizes privacy via hypothesis-testing trade-off curves, and study shuffled sampling over a single epoch with $M$ gradient updates. We derive an explicit suboptimal upper bound on the achievable trade-off curve. This result induces a geometric lower bound on the separation $\kappa$, which is the maximum distance between the mechanism's trade-off curve and the ideal random-guessing line. Because a large separation implies significant adversarial advantage, meaningful privacy requires small $\kappa$. However, we prove that enforcing a small separation imposes a strict lower bound on the Gaussian noise multiplier $\sigma$, which directly limits the achievable utility. In particular, under the standard worst-case adversarial model, shuffled DP-SGD must satisfy $\sigma \ge \frac{1}{\sqrt{2\ln M}} \quad\text{or}\quad \kappa \ge \frac{1}{\sqrt{8}}\left(1-\frac{1}{\sqrt{4\pi\ln M}}\right)$, and thus cannot simultaneously achieve strong privacy and high utility. Although this bound vanishes asymptotically as $M \to \infty$, the convergence is extremely slow: even for practically relevant numbers of updates the required noise magnitude remains substantial. We further show that the same limitation extends to Poisson subsampling up to constant factors. Our experiments confirm that the noise levels implied by this bound lead to significant accuracy degradation in realistic training settings, thus showing a critical bottleneck in DP-SGD under standard worst-case adversarial assumptions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that under the f-DP framework, single-epoch shuffled DP-SGD with M updates cannot simultaneously achieve strong privacy and high utility. It derives an explicit suboptimal upper bound on the achievable trade-off curve that induces a geometric lower bound on the separation κ, yielding the requirement that σ ≥ 1/√(2 ln M) or κ ≥ (1/√8)(1 - 1/√(4π ln M)). The same limitation extends to Poisson subsampling up to constant factors, and experiments show significant accuracy degradation at the implied noise levels.
Significance. If the derived bound is tight, the work identifies a concrete bottleneck in DP-SGD under standard worst-case f-DP assumptions, with the slow asymptotic vanishing of the bound as M → ∞ having direct practical implications. The geometric interpretation via κ and the experimental validation of utility loss are positive contributions, but the explicit suboptimality of the upper bound limits the strength of the impossibility claim.
major comments (2)
- [Abstract and §3] Abstract and main theorem: the lower bound on κ is obtained by converting an explicitly suboptimal upper bound on the trade-off curve into a geometric separation. Because the paper states the upper bound is suboptimal without a matching lower bound or tightness verification for the shuffled mechanism, the induced κ threshold may be loose; the true curve could permit smaller κ (hence stronger privacy) at the same σ, so the claimed impossibility does not necessarily follow.
- [§3] §3 (derivation): the analysis assumes the standard worst-case adversarial model in f-DP applies directly to single-epoch shuffled sampling. The paper should verify whether the dependencies introduced by shuffling weaken this worst-case bound, as this assumption is load-bearing for converting the trade-off upper bound into the stated σ-or-κ requirement.
minor comments (1)
- [Abstract] The nested square-root expression for the κ bound in the abstract is difficult to parse at a glance; a parenthesized or multi-line rendering would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our paper. We address the major concerns below and have made revisions to clarify the scope of our results.
Point-by-point responses
Referee: [Abstract and §3] Abstract and main theorem: the lower bound on κ is obtained by converting an explicitly suboptimal upper bound on the trade-off curve into a geometric separation. Because the paper states the upper bound is suboptimal without a matching lower bound or tightness verification for the shuffled mechanism, the induced κ threshold may be loose; the true curve could permit smaller κ (hence stronger privacy) at the same σ, so the claimed impossibility does not necessarily follow.
Authors: We acknowledge that the upper bound derived in the paper is explicitly suboptimal, and thus the resulting lower bound on κ is not necessarily tight. This provides a sufficient condition for the privacy-utility limitation but may overestimate the required separation. We have revised the abstract and Section 3 to more clearly state that the bound is conservative and that tighter analyses could potentially allow better trade-offs. Nevertheless, even this bound demonstrates substantial practical implications, as validated by the experiments showing accuracy degradation.
Revision: partial
Referee: [§3] §3 (derivation): the analysis assumes the standard worst-case adversarial model in f-DP applies directly to single-epoch shuffled sampling. The paper should verify whether the dependencies introduced by shuffling weaken this worst-case bound, as this assumption is load-bearing for converting the trade-off upper bound into the stated σ-or-κ requirement.
Authors: The f-DP trade-off curve is inherently defined for the worst-case neighboring datasets, and our upper bound applies to the shuffled DP-SGD mechanism as a whole. Shuffling introduces dependencies, but these do not invalidate the worst-case analysis; the bound holds regardless of the sampling order because it is based on the overall distribution of the mechanism output. We have added a paragraph in §3 to elaborate on why the standard model applies and that dependencies from shuffling are accounted for in the mechanism definition. A complete characterization of the exact trade-off curve for shuffling is beyond the scope of this work but would be a valuable direction for future research.
Revision: yes
Circularity Check
No significant circularity; derivation self-contained from f-DP definitions
The paper derives an explicit suboptimal upper bound on the achievable f-DP trade-off curve for single-epoch shuffled sampling directly from the hypothesis-testing characterization and worst-case adversarial model. This bound is then converted geometrically into a lower bound on separation κ, yielding the stated σ or κ threshold. No step reduces by construction to a fitted parameter renamed as prediction, a self-definition, or a load-bearing self-citation chain; the suboptimality is stated explicitly and the assumptions (standard f-DP model) are external. The result is therefore independent of its own outputs and receives score 0.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The f-differential privacy framework accurately captures privacy via hypothesis-testing trade-offs.
- domain assumption: Shuffled sampling over a single epoch with M updates follows the standard DP-SGD procedure.
Forward citations
Cited by 1 Pith paper
- Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds
Tight closed-form bounds via Berry-Esseen show DP-SGD with random shuffling achieves near-ideal privacy (trade-off close to 1-a) for σ ≥ √(3/ln M) and large M, with δ linear in epochs restricting E to O(√M) and an asy...
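The two noise levels quoted on this page can be compared directly. The sketch below is only an arithmetic illustration, with hypothetical function names: `necessary_sigma` is the 1/√(2 ln M) threshold from the reviewed paper, and `sufficient_sigma` is the √(3/ln M) level quoted in the citing paper's summary, which also requires large M. Their ratio is √6 ≈ 2.45 for every M, so under this reading the two results differ by a constant factor in σ.

```python
import math


def necessary_sigma(M: int) -> float:
    """Threshold 1 / sqrt(2 ln M): below it, kappa is forced above its bound."""
    return 1.0 / math.sqrt(2.0 * math.log(M))


def sufficient_sigma(M: int) -> float:
    """Level sqrt(3 / ln M) quoted in the citing paper's summary (large M)."""
    return math.sqrt(3.0 / math.log(M))


if __name__ == "__main__":
    for M in (10**2, 10**4, 10**6):
        ratio = sufficient_sigma(M) / necessary_sigma(M)
        print(
            f"M={M:>8d}  necessary={necessary_sigma(M):.3f}"
            f"  sufficient={sufficient_sigma(M):.3f}  ratio={ratio:.3f}"
        )
```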
Figures 3 and 4 visualize the implications of Lemma F.1 and its asymptotic instantiation for the separation metric as the number of rounds per epoch M increases. Figure 3 plots the fully explicit lower bound obtained by combining the Gaussian tail bound in Eq. (48) with the explicit lower bound on the µ-GDP parameter derived in Eqs. (51)–(52) under the no...