Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds
Pith reviewed 2026-05-08 13:18 UTC · model grok-4.3
The pith
DP-SGD with random shuffling subsampling yields tight closed-form trade-off function bounds that converge to the ideal 1-a diagonal under suitable epoch scaling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the regime sigma greater than or equal to the square root of three over ln M, the trade-off function for shuffling-based DP-SGD satisfies tight closed-form bounds up to constant factors derived from the Berry-Esseen theorem applied to the sum of bounded random variables from the shuffling process; moreover, when E equals c_M squared times M with c_M approaching zero, the E-fold composition of the trade-off function converges uniformly to 1 minus a in a, with delta having only an O of square root of E dependency.
What carries the argument
The trade-off function in the f-DP framework for shuffling-based subsampling, with its distribution characterized via the Berry-Esseen theorem on the privacy-loss random variable.
If this is right
- For one epoch with sigma equals 1 and delta equals 1/100, roughly 1.14 million rounds and 11.4 million samples suffice to reach a trade-off function at least 1 minus a minus delta.
- Composition over E epochs produces a delta that grows linearly with E, which restricts usable E to order square root of M to keep the bound meaningful.
- Under the scaling E equals c_M squared times M with c_M to zero, the delta dependency improves to O of square root of E while still converging to the ideal diagonal.
- The explicit bounds stand in contrast to negative results known for the smaller-sigma regime sigma less than or equal to one over square root of two ln M.
Where Pith is reading between the lines
- Shuffling may be preferable to Poisson subsampling in practice when interpretable closed-form privacy accounting is needed for large training runs.
- The open question of explicit convergence rates left in the paper points to a natural next step of replacing Berry-Esseen with sharper concentration tools.
- The uniform convergence result suggests that privacy loss can be made arbitrarily close to random guessing by modest growth in rounds relative to epochs, which could guide hyper-parameter choices in private deep learning.
- The same shuffling analysis may extend to other iterative private optimization methods where the per-step randomness is permutation-based rather than independent.
Load-bearing premise
The noise multiplier must be at least the square root of three over the log of the number of rounds, and the Berry-Esseen theorem must give a sufficiently accurate approximation to the distribution of the summed privacy losses from shuffling.
What would settle it
Numerical computation of the exact privacy-loss distribution for shuffling DP-SGD at sigma equals 1 and M around one million showing the realized trade-off function deviates by more than a small constant from the claimed closed-form bounds or fails to approach 1 minus a uniformly under the stated epoch scaling.
Figures
read the original abstract
We derive a tight analysis of the trade-off function for Differentially Private Stochastic Gradient Descent (DP-SGD) with subsampling based on random shuffling within the $f$-DP framework. Our analysis covers the regime $\sigma \geq \sqrt{3/\ln M}$, where $\sigma$ is the noise multiplier and $M$ is the number of rounds within a single epoch. Unlike $f$-DP analyses for Poisson subsampling, which yield non-closed implicit formulas that can be machine computed but are non-transparent, random shuffling admits a tight analysis yielding transparent and interpretable closed-form bounds. Our concrete bounds, derived via the Berry-Esseen theorem, are tight up to constant factors within the proof framework. We demonstrate worked parameter settings for a single epoch ($E=1$) with a corresponding trade-off function $\geq 1-a-\delta$, that is, only $\delta$ below the ideal random guessing diagonal $1-a$: For $\delta = 1/100$ and $\sigma = 1$, roughly $M \approx 1.14\times 10^6$ rounds and $N \approx 1.14\times 10^7$ training samples suffice to achieve meaningful differential privacy. This is in contrast to recent negative results for the regime $\sigma \leq 1/\sqrt{2 \ln M}$. Our concrete bounds can be composed over multiple epochs leading to $\delta$ having a linear in $E$ dependency, which restricts $E=O(\sqrt{M})$. To go beyond Berry--Esseen, we introduce a new proof technique based on a generalization of the law of large numbers that yields an asymptotic random guessing diagonal-limit result: if $E=c_M^2M$ with $c_M\to 0$, then the $E$-fold composed trade-off function satisfies $f^{\otimes E}(a)\to 1-a$ uniformly in $a\in[0,1]$ with $\delta$ having only an $O(\sqrt{E})$ dependency. We compare this asymptotic regime with the corresponding Poisson subsampling asymptotic, and highlight the characterization of explicit convergence rates as an open question.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives tight upper and lower bounds on the trade-off functions for differentially private stochastic gradient descent (DP-SGD) employing subsampling via random shuffling, framed within the f-DP paradigm. The analysis is restricted to the noise regime σ ≥ √(3/ln M) and leverages the Berry-Esseen theorem to obtain closed-form expressions that are claimed to be tight up to constant factors. Concrete numerical instantiations are provided for a single epoch, and an asymptotic result is established for the composition over multiple epochs using a novel generalization of the law of large numbers, showing convergence to the ideal random-guessing trade-off under suitable scaling of the number of epochs E with M.
Significance. Should the central claims hold, the work would be significant for providing interpretable, closed-form privacy analyses for random shuffling, a common practical mechanism in DP-SGD implementations, in contrast to the typically implicit and numerically computed bounds for Poisson subsampling. The explicit parameter settings (e.g., achieving δ = 0.01 with σ = 1 and large but feasible M and N) and the asymptotic uniform convergence result offer practical guidance and theoretical insight into when shuffling-based DP-SGD can approach ideal privacy guarantees. The introduction of a new concentration technique beyond Berry-Esseen is a methodological contribution.
major comments (2)
- [Main bound derivation / Berry-Esseen application] The proof invoking the Berry-Esseen theorem (in the section deriving the closed-form bounds): The cumulative privacy loss is modeled as a sum of bounded but dependent random variables due to the random shuffling (without replacement) process. Standard Berry-Esseen theorems apply to independent summands; the manuscript should explicitly state the version of Berry-Esseen employed (or any extension for exchangeable/dependent variables) and confirm that the approximation error bound C β3 / (σ³ √M) remains valid and does not exceed the claimed tightness in the regime σ ≥ √(3/ln M). This underpins the concrete bounds and the numerical example with M ≈ 1.14×10^6.
- [Asymptotic analysis] Asymptotic result (in the section on the new proof technique): The claim that if E = c_M² M with c_M → 0 then f⊗E(a) → 1-a uniformly in a ∈ [0,1] with only O(√E) dependency on δ relies on the new law-of-large-numbers generalization. Please provide more details on the convergence rate and how the error terms compose over E epochs to ensure the uniform convergence holds as stated.
minor comments (3)
- [Preliminaries / f-DP framework] Clarify the exact definition of the trade-off function f and how the lower and upper bounds are established symmetrically.
- [Discussion / Comparison section] In the comparison with Poisson subsampling, include a brief discussion of computational aspects or when one might prefer shuffling despite the regime restriction.
- [Numerical examples] The manuscript mentions 'worked parameter settings'; consider adding a table summarizing the parameters for different δ values.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the work's significance and for the constructive feedback on the technical details of the proofs. We address each major comment below and will revise the manuscript to incorporate the requested clarifications.
read point-by-point responses
-
Referee: The proof invoking the Berry-Esseen theorem (in the section deriving the closed-form bounds): The cumulative privacy loss is modeled as a sum of bounded but dependent random variables due to the random shuffling (without replacement) process. Standard Berry-Esseen theorems apply to independent summands; the manuscript should explicitly state the version of Berry-Esseen employed (or any extension for exchangeable/dependent variables) and confirm that the approximation error bound C β3 / (σ³ √M) remains valid and does not exceed the claimed tightness in the regime σ ≥ √(3/ln M). This underpins the concrete bounds and the numerical example with M ≈ 1.14×10^6.
Authors: We appreciate the referee highlighting this point for improved rigor. The privacy loss variables under random shuffling form an exchangeable sequence, and our derivation applies a Berry-Esseen-type bound valid for exchangeable (dependent) random variables. We will revise the manuscript to explicitly cite the specific version of the theorem employed for exchangeable sums and add a short verification that the error term C β3 / (σ³ √M) remains o(1) and does not compromise the claimed tightness (up to constants) in the regime σ ≥ √(3/ln M). This addresses the concern directly while preserving all stated results. revision: yes
-
Referee: The claim that if E = c_M² M with c_M → 0 then f⊗E(a) → 1-a uniformly in a ∈ [0,1] with only O(√E) dependency on δ relies on the new law-of-large-numbers generalization. Please provide more details on the convergence rate and how the error terms compose over E epochs to ensure the uniform convergence holds as stated.
Authors: We thank the referee for requesting elaboration on the novel technique. The generalized law of large numbers bounds the per-epoch deviation of the average privacy loss by O(1/√M) uniformly. Since epochs are independent, the errors compose additively in the exponentiated sense, yielding a total deviation of order √E / √M for the E-fold composition. Substituting E = c_M² M with c_M → 0 makes this o(1) uniformly over a ∈ [0,1], giving the stated convergence with O(√E) δ-dependence. We will expand the relevant section and add an appendix with the full proof sketch of the generalized LLN, explicit rates, and error composition to make this fully transparent. revision: yes
Circularity Check
No circularity: bounds derived from external theorems
full rationale
The derivation applies the Berry-Esseen theorem to obtain finite-M closed-form bounds on the trade-off function and invokes a generalization of the law of large numbers for the asymptotic uniform convergence result. Both are independent external results whose statements do not reference the target f-DP trade-off function or the paper's own parameters. No self-definitional equations, no fitted inputs relabeled as predictions, and no load-bearing self-citations appear. The central claims therefore remain non-tautological with respect to the paper's inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Berry-Esseen theorem applies to the sum of bounded random variables arising from the shuffling process
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.