Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds

Marten van Dijk; Murat Bilgehan Ertan

arxiv: 2605.06259 · v2 · pith:HFWLFTPAnew · submitted 2026-05-07 · 💻 cs.LG · cs.CR

Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds

Marten van Dijk , Murat Bilgehan Ertan This is my paper

Pith reviewed 2026-05-08 13:18 UTC · model grok-4.3

classification 💻 cs.LG cs.CR

keywords differential privacyDP-SGDrandom shufflingf-DPtrade-off functionsubsamplingBerry-Esseen theoremprivacy composition

0 comments

The pith

DP-SGD with random shuffling subsampling yields tight closed-form trade-off function bounds that converge to the ideal 1-a diagonal under suitable epoch scaling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives explicit upper and lower bounds on the f-DP trade-off function for DP-SGD when subsampling uses random shuffling rather than Poisson sampling. The analysis applies in the regime where the noise multiplier sigma is at least the square root of three divided by the natural log of the number of rounds M per epoch. It produces transparent closed-form expressions via the Berry-Esseen theorem on the privacy loss from the shuffle process, in contrast to the non-closed implicit formulas typical for Poisson subsampling. For multiple epochs the work further shows that if the number of epochs E scales as c_M squared times M with c_M going to zero, the composed trade-off function converges uniformly to the random-guessing diagonal 1 minus a, with delta depending only on the square root of E.

Core claim

In the regime sigma greater than or equal to the square root of three over ln M, the trade-off function for shuffling-based DP-SGD satisfies tight closed-form bounds up to constant factors derived from the Berry-Esseen theorem applied to the sum of bounded random variables from the shuffling process; moreover, when E equals c_M squared times M with c_M approaching zero, the E-fold composition of the trade-off function converges uniformly to 1 minus a in a, with delta having only an O of square root of E dependency.

What carries the argument

The trade-off function in the f-DP framework for shuffling-based subsampling, with its distribution characterized via the Berry-Esseen theorem on the privacy-loss random variable.

If this is right

For one epoch with sigma equals 1 and delta equals 1/100, roughly 1.14 million rounds and 11.4 million samples suffice to reach a trade-off function at least 1 minus a minus delta.
Composition over E epochs produces a delta that grows linearly with E, which restricts usable E to order square root of M to keep the bound meaningful.
Under the scaling E equals c_M squared times M with c_M to zero, the delta dependency improves to O of square root of E while still converging to the ideal diagonal.
The explicit bounds stand in contrast to negative results known for the smaller-sigma regime sigma less than or equal to one over square root of two ln M.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Shuffling may be preferable to Poisson subsampling in practice when interpretable closed-form privacy accounting is needed for large training runs.
The open question of explicit convergence rates left in the paper points to a natural next step of replacing Berry-Esseen with sharper concentration tools.
The uniform convergence result suggests that privacy loss can be made arbitrarily close to random guessing by modest growth in rounds relative to epochs, which could guide hyper-parameter choices in private deep learning.
The same shuffling analysis may extend to other iterative private optimization methods where the per-step randomness is permutation-based rather than independent.

Load-bearing premise

The noise multiplier must be at least the square root of three over the log of the number of rounds, and the Berry-Esseen theorem must give a sufficiently accurate approximation to the distribution of the summed privacy losses from shuffling.

What would settle it

Numerical computation of the exact privacy-loss distribution for shuffling DP-SGD at sigma equals 1 and M around one million showing the realized trade-off function deviates by more than a small constant from the claimed closed-form bounds or fails to approach 1 minus a uniformly under the stated epoch scaling.

Figures

Figures reproduced from arXiv: 2605.06259 by Marten van Dijk, Murat Bilgehan Ertan.

**Figure 1.** Figure 1: Privacy parameter δ vs. number of rounds M for σ ∈ {0.5, 0.75, 1, 1.5, 2}, computed from the full lower bound of Theorem 4.1 (E = 1); dashed lines mark δ = 0.01, 10−4 . The U-shape in σ (with σ≈1 near-optimal) reflects the exponential growth of the Berry-Esseen coefficient for small σ and its polynomial growth for large σ. Each curve begins at the smallest M satisfying σ ≥ p 3/ ln M and the validity condit… view at source ↗

**Figure 2.** Figure 2: Asymptotic GDP-µ coefficient (per unit c = limM→∞ cM, E = c 2 MM) for random shuffling (Theorem 5.2) vs. subsampled Gaussian (Cor. 5.4 of [8]). Left: both coefficients diverge as σ → 0 + and decay like 1/σ as σ → ∞ (log scale). Right: the ratio µsub/µshuffle decreases monotonically from √ 2 (as σ→0 +) to 1 (as σ→∞); marker: midpoint σ ≈1.668 where the ratio equals (1 + √ 2)/2≈1.207. Finally, let c ≥ 0 and … view at source ↗

**Figure 3.** Figure 3: Required rounds M vs. noise multiplier σ from the closed-form two-term approximation (within ∼0.1% of the exact Theorem 4.1 solve), for δ ∈ {10−2 , 10−3 , 10−4} at E = 1. Each curve is U-shaped: for small σ, M grows exponentially via the e 3/(2σ 2 ) factor in the Berry-Esseen bound on δ; for large σ, M grows polynomially via the (1 − e −1/σ2 ) −3 factor. The minimum sits near σ≈1 for all δ. This requiremen… view at source ↗

read the original abstract

We derive a tight analysis of the trade-off function for Differentially Private Stochastic Gradient Descent (DP-SGD) with subsampling based on random shuffling within the $f$-DP framework. Our analysis covers the regime $\sigma \geq \sqrt{3/\ln M}$, where $\sigma$ is the noise multiplier and $M$ is the number of rounds within a single epoch. Unlike $f$-DP analyses for Poisson subsampling, which yield non-closed implicit formulas that can be machine computed but are non-transparent, random shuffling admits a tight analysis yielding transparent and interpretable closed-form bounds. Our concrete bounds, derived via the Berry-Esseen theorem, are tight up to constant factors within the proof framework. We demonstrate worked parameter settings for a single epoch ($E=1$) with a corresponding trade-off function $\geq 1-a-\delta$, that is, only $\delta$ below the ideal random guessing diagonal $1-a$: For $\delta = 1/100$ and $\sigma = 1$, roughly $M \approx 1.14\times 10^6$ rounds and $N \approx 1.14\times 10^7$ training samples suffice to achieve meaningful differential privacy. This is in contrast to recent negative results for the regime $\sigma \leq 1/\sqrt{2 \ln M}$. Our concrete bounds can be composed over multiple epochs leading to $\delta$ having a linear in $E$ dependency, which restricts $E=O(\sqrt{M})$. To go beyond Berry--Esseen, we introduce a new proof technique based on a generalization of the law of large numbers that yields an asymptotic random guessing diagonal-limit result: if $E=c_M^2M$ with $c_M\to 0$, then the $E$-fold composed trade-off function satisfies $f^{\otimes E}(a)\to 1-a$ uniformly in $a\in[0,1]$ with $\delta$ having only an $O(\sqrt{E})$ dependency. We compare this asymptotic regime with the corresponding Poisson subsampling asymptotic, and highlight the characterization of explicit convergence rates as an open question.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives the first closed-form f-DP bounds for random-shuffle subsampling in DP-SGD plus a clean multi-epoch asymptotic, but the Berry-Esseen step on dependent sums is the part that needs checking.

read the letter

The main new piece is the explicit closed-form trade-off bounds for DP-SGD under random shuffling, which the authors derive in the regime σ ≥ √(3/ln M). This is distinct from the implicit, machine-computable formulas that exist for Poisson subsampling, and the paper supplies concrete numbers that practitioners can plug in directly. For one epoch with σ=1 and δ=1/100, they show M around 1.14 million rounds and N ten times larger gets the trade-off function within 0.01 of the ideal diagonal 1-a. They also add a generalized law-of-large-numbers argument that handles many epochs: when E scales as c_M²M with c_M going to zero, the composed function converges uniformly to 1-a with δ only O(√E). That asymptotic comparison to Poisson subsampling is useful and cleanly stated.

Referee Report

2 major / 3 minor

Summary. The manuscript derives tight upper and lower bounds on the trade-off functions for differentially private stochastic gradient descent (DP-SGD) employing subsampling via random shuffling, framed within the f-DP paradigm. The analysis is restricted to the noise regime σ ≥ √(3/ln M) and leverages the Berry-Esseen theorem to obtain closed-form expressions that are claimed to be tight up to constant factors. Concrete numerical instantiations are provided for a single epoch, and an asymptotic result is established for the composition over multiple epochs using a novel generalization of the law of large numbers, showing convergence to the ideal random-guessing trade-off under suitable scaling of the number of epochs E with M.

Significance. Should the central claims hold, the work would be significant for providing interpretable, closed-form privacy analyses for random shuffling, a common practical mechanism in DP-SGD implementations, in contrast to the typically implicit and numerically computed bounds for Poisson subsampling. The explicit parameter settings (e.g., achieving δ = 0.01 with σ = 1 and large but feasible M and N) and the asymptotic uniform convergence result offer practical guidance and theoretical insight into when shuffling-based DP-SGD can approach ideal privacy guarantees. The introduction of a new concentration technique beyond Berry-Esseen is a methodological contribution.

major comments (2)

[Main bound derivation / Berry-Esseen application] The proof invoking the Berry-Esseen theorem (in the section deriving the closed-form bounds): The cumulative privacy loss is modeled as a sum of bounded but dependent random variables due to the random shuffling (without replacement) process. Standard Berry-Esseen theorems apply to independent summands; the manuscript should explicitly state the version of Berry-Esseen employed (or any extension for exchangeable/dependent variables) and confirm that the approximation error bound C β3 / (σ³ √M) remains valid and does not exceed the claimed tightness in the regime σ ≥ √(3/ln M). This underpins the concrete bounds and the numerical example with M ≈ 1.14×10^6.
[Asymptotic analysis] Asymptotic result (in the section on the new proof technique): The claim that if E = c_M² M with c_M → 0 then f⊗E(a) → 1-a uniformly in a ∈ [0,1] with only O(√E) dependency on δ relies on the new law-of-large-numbers generalization. Please provide more details on the convergence rate and how the error terms compose over E epochs to ensure the uniform convergence holds as stated.

minor comments (3)

[Preliminaries / f-DP framework] Clarify the exact definition of the trade-off function f and how the lower and upper bounds are established symmetrically.
[Discussion / Comparison section] In the comparison with Poisson subsampling, include a brief discussion of computational aspects or when one might prefer shuffling despite the regime restriction.
[Numerical examples] The manuscript mentions 'worked parameter settings'; consider adding a table summarizing the parameters for different δ values.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive assessment of the work's significance and for the constructive feedback on the technical details of the proofs. We address each major comment below and will revise the manuscript to incorporate the requested clarifications.

read point-by-point responses

Referee: The proof invoking the Berry-Esseen theorem (in the section deriving the closed-form bounds): The cumulative privacy loss is modeled as a sum of bounded but dependent random variables due to the random shuffling (without replacement) process. Standard Berry-Esseen theorems apply to independent summands; the manuscript should explicitly state the version of Berry-Esseen employed (or any extension for exchangeable/dependent variables) and confirm that the approximation error bound C β3 / (σ³ √M) remains valid and does not exceed the claimed tightness in the regime σ ≥ √(3/ln M). This underpins the concrete bounds and the numerical example with M ≈ 1.14×10^6.

Authors: We appreciate the referee highlighting this point for improved rigor. The privacy loss variables under random shuffling form an exchangeable sequence, and our derivation applies a Berry-Esseen-type bound valid for exchangeable (dependent) random variables. We will revise the manuscript to explicitly cite the specific version of the theorem employed for exchangeable sums and add a short verification that the error term C β3 / (σ³ √M) remains o(1) and does not compromise the claimed tightness (up to constants) in the regime σ ≥ √(3/ln M). This addresses the concern directly while preserving all stated results. revision: yes
Referee: The claim that if E = c_M² M with c_M → 0 then f⊗E(a) → 1-a uniformly in a ∈ [0,1] with only O(√E) dependency on δ relies on the new law-of-large-numbers generalization. Please provide more details on the convergence rate and how the error terms compose over E epochs to ensure the uniform convergence holds as stated.

Authors: We thank the referee for requesting elaboration on the novel technique. The generalized law of large numbers bounds the per-epoch deviation of the average privacy loss by O(1/√M) uniformly. Since epochs are independent, the errors compose additively in the exponentiated sense, yielding a total deviation of order √E / √M for the E-fold composition. Substituting E = c_M² M with c_M → 0 makes this o(1) uniformly over a ∈ [0,1], giving the stated convergence with O(√E) δ-dependence. We will expand the relevant section and add an appendix with the full proof sketch of the generalized LLN, explicit rates, and error composition to make this fully transparent. revision: yes

Circularity Check

0 steps flagged

No circularity: bounds derived from external theorems

full rationale

The derivation applies the Berry-Esseen theorem to obtain finite-M closed-form bounds on the trade-off function and invokes a generalization of the law of large numbers for the asymptotic uniform convergence result. Both are independent external results whose statements do not reference the target f-DP trade-off function or the paper's own parameters. No self-definitional equations, no fitted inputs relabeled as predictions, and no load-bearing self-citations appear. The central claims therefore remain non-tautological with respect to the paper's inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The analysis rests on standard probability theorems rather than new postulates or fitted constants.

axioms (1)

standard math Berry-Esseen theorem applies to the sum of bounded random variables arising from the shuffling process
Invoked to obtain the explicit approximation error in the regime σ ≥ √(3/ln M)

pith-pipeline@v0.9.0 · 5707 in / 1508 out tokens · 131372 ms · 2026-05-08T13:18:54.489496+00:00 · methodology

Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)