arxiv: 2606.04384 · v1 · pith:OMOUB3EFnew · submitted 2026-06-03 · 💻 cs.LG · cs.CR · stat.ML

Revisiting Privacy Amplification by Subsampling in Selective Release DPSGD

Pith reviewed 2026-06-28 06:54 UTC · model grok-4.3

classification: 💻 cs.LG · cs.CR · stat.ML
keywords: differential privacy, DPSGD, selective release, privacy amplification, subsampling, gradient clipping, machine learning

The pith

Correcting the privacy analysis for selective release in DPSGD produces rigorous guarantees and high model utility.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that prior work on differentially private selective update and release overlooked how the selective mechanism changes the effective sampling probability, weakening its privacy claims. It re-derives the privacy analysis to account for that variation and introduces DPSR-CG, which performs selective release on the basis of clipped gradients. On the strength of that analysis and of experiments across MNIST, CIFAR-10, IMDB and FMNIST, the paper reports that the new method meets its stated privacy bounds while delivering better accuracy and faster convergence than standard DPSGD. A sympathetic reader would care because the result suggests one can avoid the usual heavy utility penalty of noise injection without giving up formal privacy.

Core claim

The selective release mechanism alters the per-step sampling probability in a way that prior accounting ignored; by grounding selective release in clipped gradients and applying a fresh privacy analysis that incorporates this variation, DPSR-CG satisfies strict differential privacy while preserving substantially more model utility than either vanilla DPSGD or the earlier DPSUR algorithm.

What carries the argument

DPSR-CG mechanism, which triggers release only on sufficiently large clipped gradients and uses a privacy bound that explicitly tracks the resulting change in sampling probability.

If this is right

  • DPSR-CG satisfies strict differential privacy where DPSUR's accounting was incomplete.
  • Model accuracy and convergence speed improve on image and text classification tasks without relaxing the privacy target.
  • Gradient clipping becomes the explicit trigger for both noise addition and selective release.
  • The corrected analysis restores the validity of privacy amplification by subsampling in this adaptive setting.
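For orientation, the baseline these points are measured against is the standard DPSGD update: clip each per-example gradient, sum, add Gaussian noise scaled to the clipping bound, then average. The sketch below shows only that baseline in plain Python. It is not DPSR-CG, whose release rule is not given in the text available here, and the function names, toy gradients, and hyperparameter values are all illustrative assumptions.

```python
import math
import random


def clip(grad, clip_norm):
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One baseline DPSGD update: clip, sum, add Gaussian noise, average."""
    dim = len(params)
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise standard deviation is tied to the clipping bound (the sensitivity).
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch = len(per_example_grads)
    return [p - lr * n / batch for p, n in zip(params, noisy)]


rng = random.Random(0)
# Toy per-example gradients (assumed values) with L2 norms 5.0 and 0.5.
grads = [[3.0, 4.0], [0.3, 0.4]]
new_params = dpsgd_step([0.0, 0.0], grads, lr=0.1, clip_norm=1.0,
                        noise_multiplier=1.1, rng=rng)
```

A selective-release scheme adds an accept/reject decision on top of a step like this one, which is where the accounting question raised above comes in.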

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar sampling-probability shifts may appear in other adaptive or selective private-training schemes and would require parallel re-analysis.
  • If the new bound is tight, it could guide the design of release thresholds that trade privacy loss against utility more precisely.
  • The approach suggests that utility gains in private SGD often come from reducing the number of noisy steps rather than from reducing noise magnitude alone.

Load-bearing premise

The new privacy analysis correctly captures every effect the selective release step has on the overall sampling distribution.
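To see why the sampling distribution carries so much weight, it may help to look at the textbook amplification bound for a fixed sampling rate: an ε-DP step applied to a Poisson sample of rate q is log(1 + q(e^ε − 1))-DP with respect to the full dataset. The sketch below evaluates that bound at two rates. The values of `nominal_q` and `effective_q` are hypothetical placeholders; the size and direction of any shift caused by selective release is what the paper's analysis addresses, and it is not reproduced here.

```python
import math


def amplified_epsilon(eps, q):
    """Standard amplification-by-subsampling bound for a fixed Poisson rate q.

    An eps-DP step on the sample is log(1 + q * (exp(eps) - 1))-DP overall.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    return math.log1p(q * math.expm1(eps))


nominal_q = 0.01    # hypothetical rate assumed by the accountant
effective_q = 0.02  # hypothetical rate if selection shifts the true probability
eps_step = 1.0      # hypothetical per-step epsilon before amplification

claimed = amplified_epsilon(eps_step, nominal_q)
actual = amplified_epsilon(eps_step, effective_q)
print(f"per-step eps at the nominal rate:   {claimed:.4f}")
print(f"per-step eps at the effective rate: {actual:.4f}")
```

Because the bound grows with q, an accountant that plugs in a rate lower than the one actually in force would understate the per-step privacy cost.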

What would settle it

An empirical privacy audit or membership-inference attack on models trained with DPSR-CG that either violates the claimed epsilon bound or shows no utility gain over DPSUR under identical privacy parameters.
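One common way to turn such an attack into a check on ε uses the fact that any (ε, δ)-DP mechanism constrains an attacker's rates: TPR ≤ e^ε · FPR + δ, and symmetrically 1 − FPR ≤ e^ε · (1 − TPR) + δ. The sketch below inverts those two inequalities into a point estimate of the smallest ε consistent with observed rates. The function name and the example rates are assumptions for illustration, and a real audit would replace the point estimates with confidence bounds over many trials.

```python
import math


def empirical_epsilon_lower_bound(tpr, fpr, delta):
    """Smallest eps consistent with attack rates under (eps, delta)-DP.

    Uses TPR <= e^eps * FPR + delta and (1 - FPR) <= e^eps * (1 - TPR) + delta.
    """
    candidates = [0.0]
    if tpr - delta > 0:
        candidates.append(
            math.inf if fpr == 0 else math.log((tpr - delta) / fpr))
    if 1 - fpr - delta > 0:
        candidates.append(
            math.inf if tpr == 1 else math.log((1 - fpr - delta) / (1 - tpr)))
    return max(candidates)


claimed_eps = 1.0  # hypothetical privacy target
# Hypothetical attack rates, not measurements from the paper.
observed = empirical_epsilon_lower_bound(tpr=0.30, fpr=0.05, delta=1e-5)
violates_claim = observed > claimed_eps
```

If `violates_claim` held up under proper confidence intervals, the claimed bound would be refuted; a lower bound below the target is compatible with the claim but does not prove it.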

Figures

Figures reproduced from arXiv: 2606.04384 by Fang Xie, Xiaobo Huang.

Figure 1: Workflow of DPSUR.
3.2 Review of Privacy Accounting in DPSUR: Before analyzing the theoretical limitations of existing methods, it is essential to review the baseline privacy accounting mechanism employed by the selective release paradigm, specifically DPSUR. DPSUR optimizes the overall privacy budget by leveraging the Gaussian Mechanism with Selective Release. The core premise of their accounting framework…
Figure 2: Workflow of the DPSR-CG. The Validator Module
Figure 3: Proportion of Good/Bad Accept and Rejection in the Accept and Rejection sets of the IMDB dataset.
Figure 4: Proportion of Good/Bad Accept and Rejection in the Accept and Rejection sets of the FMNIST dataset.
Original abstract

Machine learning's reliance on sensitive data necessitates privacy-preserving techniques like Differentially Private Stochastic Gradient Descent (DPSGD). However, DPSGD suffers from substantial utility degradation and slow convergence due to gradient clipping and noise injection. Prior works have attempted to improve DPSGD from various perspectives; notably, the Differentially Private Selective Update and Release (DPSUR) algorithm has achieved remarkable model utility. However, the privacy accounting in DPSUR overlooks the variation in sampling probability introduced by the selective release mechanism, which compromises the rigor of its privacy guarantees. To address these limitations, we re-evaluate the privacy analysis of the selective release mechanism and propose a novel algorithm: Differentially Private Selective Release based on Clipped Gradients (DPSR-CG). Through a rigorous, newly derived privacy analysis and extensive experiments on multiple datasets (MNIST, CIFAR-10, IMDB, and FMNIST), we demonstrate that our DPSR-CG mechanism maintains strict privacy guarantees while achieving exceptional model performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript identifies an oversight in the privacy accounting of DPSUR, which fails to account for data-dependent variation in sampling probability caused by the selective release mechanism after gradient clipping. It proposes the DPSR-CG algorithm with a newly derived privacy analysis intended to restore strict (ε,δ)-DP guarantees, and reports strong empirical results across MNIST, CIFAR-10, IMDB, and FMNIST.

Significance. If the new privacy analysis is correct and handles the dependence between clipped gradient norms and release decisions, the work would strengthen the theoretical foundation for selective-release variants of DPSGD and could enable higher-utility private training. The multi-dataset experimental evaluation is a positive feature when accompanied by proper baselines.

major comments (2)
  1. [Abstract and privacy analysis, likely §3–4] The central claim rests on a 'rigorous, newly derived privacy analysis' that corrects DPSUR by addressing variation in sampling probability. However, no derivation steps, key equations, theorem statements, or proof sketches are visible, preventing verification that the bound properly extends subsampling amplification to the data-dependent case via coupling or privacy-loss random variables.
  2. [§4, or equivalent privacy theorem] Standard privacy amplification by subsampling assumes fixed, data-independent probabilities. The selective release decision depends on the clipped gradient norm, inducing dependence; without an explicit argument showing how the new bound accounts for this (or why it does not require an extra term), the claimed strict (ε,δ)-DP guarantee remains unverified and load-bearing for the paper's contribution.
minor comments (1)
  1. [Abstract] The abstract asserts 'exceptional model performance' and 'extensive experiments' but supplies no quantitative results, tables, baselines (e.g., vs. DPSUR or standard DPSGD), privacy parameters (ε,δ), or error bars, making it impossible to assess the empirical claims.
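
For readers unfamiliar with the term used in major comment 1, the privacy-loss random variable is usually defined as below. This is the standard definition and its link to (ε,δ)-DP, not the paper's derivation; the symbols M (mechanism), D and D' (neighboring datasets), o (output), and L are notation introduced here. For continuous outputs, read the probabilities as densities.

```latex
\[
L_{D,D'}(o) = \ln \frac{\Pr[M(D) = o]}{\Pr[M(D') = o]}, \qquad o \sim M(D),
\]
\[
M \text{ is } (\varepsilon,\delta)\text{-DP}
\iff
\mathbb{E}_{o \sim M(D)}\!\left[\max\!\left(0,\; 1 - e^{\varepsilon - L_{D,D'}(o)}\right)\right] \le \delta
\quad \text{for all neighboring } D, D'.
\]
```

The referee's concern is that, with selective release, the distribution of o itself depends on a data-dependent accept decision, so the expectation has to be taken over that joint process rather than over a fixed-rate subsample.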

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful review and for identifying the need for greater clarity around our privacy analysis. We address the two major comments point by point below. We will revise the manuscript to make the derivation fully explicit.

Point-by-point responses
  1. Referee: [Abstract and privacy analysis, likely §3–4] The central claim rests on a 'rigorous, newly derived privacy analysis' that corrects DPSUR by addressing variation in sampling probability. However, no derivation steps, key equations, theorem statements, or proof sketches are visible, preventing verification that the bound properly extends subsampling amplification to the data-dependent case via coupling or privacy-loss random variables.

    Authors: We agree that the current presentation does not make the derivation steps sufficiently visible. The analysis appears in §4, but we will add an explicit theorem statement, the key equations for the privacy-loss random variables, and a proof sketch that shows the coupling argument used to extend subsampling amplification to the data-dependent sampling probabilities induced by selective release after clipping.
    Revision: yes

  2. Referee: [§4, or equivalent privacy theorem] Standard privacy amplification by subsampling assumes fixed, data-independent probabilities. The selective release decision depends on the clipped gradient norm, inducing dependence; without an explicit argument showing how the new bound accounts for this (or why it does not require an extra term), the claimed strict (ε,δ)-DP guarantee remains unverified and load-bearing for the paper's contribution.

    Authors: The referee correctly notes the challenge posed by data-dependent probabilities. Our §4 analysis derives an adjusted bound that incorporates the dependence between clipped gradient norms and release decisions. In the revision we will expand this section with the explicit argument (via privacy-loss random variables) demonstrating that the bound accounts for the dependence without requiring an additional term, thereby verifying the claimed (ε,δ)-DP guarantee.
    Revision: yes

Circularity Check

0 steps flagged

No circularity: new privacy analysis presented as independent derivation without reduction to fitted inputs or self-citations

full rationale

The paper's central contribution is a newly derived privacy analysis for the DPSR-CG mechanism that explicitly accounts for data-dependent sampling probability variation induced by selective release after clipping. The abstract and visible text contain no equations, parameter fits, or self-citations that reduce this analysis to its own inputs by construction. No self-definitional loops, fitted-input predictions, or load-bearing self-citations are exhibited. The derivation is treated as self-contained against the prior DPSUR flaw, yielding a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no information on free parameters, axioms, or invented entities can be extracted from the provided text.

pith-pipeline@v0.9.1-grok · 5700 in / 1250 out tokens · 36193 ms · 2026-06-28T06:54:39.913854+00:00 · methodology

