Recognition: 2 theorem links
Optimal Rates for Pure ε-Differentially Private Stochastic Convex Optimization with Heavy Tails
Pith reviewed 2026-05-10 18:48 UTC · model grok-4.3
The pith
Pure ε-differential privacy achieves the minimax optimal excess risk rate for stochastic convex optimization with heavy-tailed gradients.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We characterize the minimax optimal excess-risk rate for pure ε-DP heavy-tailed SCO up to logarithmic factors. Our algorithm achieves this rate in polynomial time with high probability. Moreover, it runs in deterministic polynomial time when the worst-case Lipschitz parameter is polynomially bounded. For important structured problem classes including hinge/ReLU-type and absolute-value losses on Euclidean balls, ellipsoids, and polytopes, we achieve deterministic polynomial time even when the worst-case Lipschitz parameter is infinite. We complement the upper bound with a nearly matching high-probability lower bound.
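Pure ε-DP guarantees of this kind are typically built on the classical recipe of calibrating noise to sensitivity. The following is a minimal sketch of that standard primitive (an L2-clipped average gradient released with Laplace noise), shown only to make the privacy mechanics concrete; it is not the paper's algorithm, and the clipping-based setup is an illustrative assumption.

```python
import numpy as np

def private_avg_gradient(grads, clip, eps, rng):
    """Pure eps-DP release of an average gradient via the Laplace mechanism.

    Clipping each per-example gradient to L2 norm <= clip bounds how much
    one example can move the average; Laplace noise calibrated to that
    sensitivity yields pure eps-DP ("calibrating noise to sensitivity").
    """
    n, d = grads.shape
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    clipped = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    avg = clipped.mean(axis=0)
    # Replacing one example moves the average by L2 norm <= 2*clip/n,
    # hence by L1 norm <= 2*clip*sqrt(d)/n: the L1 sensitivity.
    sensitivity = 2.0 * clip * np.sqrt(d) / n
    return avg + rng.laplace(scale=sensitivity / eps, size=d)
```

The heavy-tailed setting is precisely where a fixed clipping threshold is lossy, which is what motivates the paper's Lipschitz-extension route instead.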
What carries the argument
A novel framework for privately optimizing Lipschitz extensions of the empirical loss, which converts the heavy-tailed problem into one with controlled sensitivity while preserving convexity.
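One classical way to build a Lipschitz extension from sample values is McShane's formula. The sketch below is a hedged illustration of that generic construction, not the paper's framework: McShane's extension caps the Lipschitz constant (and hence the sensitivity seen by a private mechanism) but does not by itself preserve convexity, whereas the paper's extension does.

```python
import numpy as np

def mcshane_extension(xs, fs, L):
    """McShane's L-Lipschitz extension of values fs given at points xs.

    F(x) = max_i ( fs[i] - L * ||x - xs[i]|| ) is L-Lipschitz on all of
    R^d and agrees with fs on xs whenever fs itself is L-Lipschitz
    restricted to xs.
    """
    xs = np.asarray(xs, dtype=float)
    fs = np.asarray(fs, dtype=float)

    def F(x):
        dists = np.linalg.norm(xs - np.asarray(x, dtype=float), axis=1)
        return float(np.max(fs - L * dists))

    return F
```

For example, extending the identity from the points {0, 1} with L = 1 reproduces it on [0, 1] and flattens it outside, never exceeding slope 1.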
If this is right
- The optimal rate holds for any convex loss whose stochastic gradients have bounded k-th moments.
- Structured losses such as hinge and absolute-value functions on balls, ellipsoids, and polytopes admit deterministic polynomial-time algorithms even when gradients are unbounded.
- When a polynomial bound on the Lipschitz constant happens to exist, the algorithm becomes deterministic polynomial time.
- The same rate is tight up to logs via a matching high-probability lower bound.
Where Pith is reading between the lines
- The Lipschitz-extension technique may transfer to other private optimization settings where only moment bounds are realistic.
- Efficient approximation schemes for these extensions could enable practical deployment on new loss families beyond the structured cases studied.
- Similar moment-based analyses might improve non-private heavy-tailed SCO rates when uniform Lipschitz assumptions are removed.
Load-bearing premise
The stochastic gradients satisfy a bounded k-th moment rather than a uniform Lipschitz bound, and that Lipschitz extensions of the empirical loss can be computed or approximated efficiently for the target problem classes.
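The gap between the two assumptions is easy to exhibit numerically: a Pareto-distributed gradient norm has a finite k-th moment whenever k is below the shape parameter, yet its support is unbounded, so no uniform Lipschitz bound exists. The shape 5 and k = 2 below are arbitrary illustrative choices, not parameters from the paper.

```python
import numpy as np

def moment_vs_worst_case(alpha=5.0, k=2, n=200_000, seed=0):
    """Contrast a bounded k-th moment with an unbounded worst case.

    Pareto(alpha) shifted to support [1, inf) has E[X^k] = alpha / (alpha - k)
    for k < alpha, yet arbitrarily large draws occur, so the moment
    assumption holds while no worst-case Lipschitz bound does.
    """
    rng = np.random.default_rng(seed)
    g = rng.pareto(alpha, size=n) + 1.0  # shift to support [1, inf)
    return float(np.mean(g ** k)), float(g.max())
```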
What would settle it
A concrete SCO instance with bounded k-th moments in which every pure ε-DP algorithm incurs excess risk worse than the claimed rate by more than logarithmic factors, or in which the algorithm exceeds polynomial runtime with non-negligible probability.
Original abstract
We study stochastic convex optimization (SCO) with heavy-tailed gradients under pure $\varepsilon$-differential privacy (DP). Instead of assuming a bound on the worst-case Lipschitz parameter of the loss, we assume only a bounded $k$-th moment. This assumption allows for unbounded, heavy-tailed stochastic gradient distributions, and can yield sharper excess risk bounds. Prior work characterized the minimax optimal rate for $\rho$-zero-concentrated DP SCO up to logarithmic factors in this setting, but the pure $\varepsilon$-DP case has remained open. We characterize the minimax optimal excess-risk rate for pure $\varepsilon$-DP heavy-tailed SCO up to logarithmic factors. Our algorithm achieves this rate in polynomial time with high probability. Moreover, it runs in deterministic polynomial time when the worst-case Lipschitz parameter is polynomially bounded. For important structured problem classes -- including hinge/ReLU-type and absolute-value losses on Euclidean balls, ellipsoids, and polytopes -- we achieve deterministic polynomial time even when the worst-case Lipschitz parameter is infinite. Our approach is based on a novel framework for privately optimizing Lipschitz extensions of the empirical loss. We complement our upper bound with a nearly matching high-probability lower bound.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies stochastic convex optimization under pure ε-differential privacy with heavy-tailed stochastic gradients, replacing the usual bounded-Lipschitz assumption with a bounded k-th moment condition. It claims to characterize the minimax optimal excess-risk rate up to logarithmic factors, gives an algorithm achieving the rate in polynomial time with high probability (deterministically for structured classes such as hinge/ReLU losses on balls/ellipsoids/polytopes), and supplies a nearly matching high-probability lower bound. The method rests on a novel framework that reduces private optimization to Lipschitz extensions of the empirical loss.
Significance. If the claimed rates and constructions hold, the work is significant: it resolves the open pure-ε-DP case for heavy-tailed SCO (previously settled only for zCDP), supplies explicit deterministic polynomial-time oracles for natural structured problems, and gives high-probability matching bounds. These elements strengthen the contribution beyond typical rate characterizations.
minor comments (2)
- [Abstract] The precise excess-risk rate, including its dependence on k, ε, n, and d, is not written explicitly; adding the expression would make the main result immediately verifiable from the abstract.
- [Introduction] The manuscript would benefit from a short table or paragraph in the introduction that contrasts the new pure-ε-DP rate with the prior zCDP rate under the same moment assumption.
Simulated Author's Rebuttal
We thank the referee for their positive summary of our work, recognition of its significance in resolving the pure ε-DP case for heavy-tailed SCO, and recommendation of minor revision. The referee's description of our contributions (minimax rates up to logs, polynomial-time algorithms via Lipschitz extensions, deterministic oracles for structured classes, and high-probability lower bounds) is accurate.
Circularity Check
No significant circularity; upper and lower bounds are independently constructed
full rationale
The paper derives its upper bound via an explicit novel framework that reduces private SCO to Lipschitz extensions of the empirical loss, with deterministic polynomial-time oracles given for structured classes (hinge/ReLU on balls/ellipsoids/polytopes) and high-probability sampling approximations for the general case. The nearly matching high-probability lower bound is a separate construction under the same bounded k-th moment assumption and does not reduce to any fitted parameter, self-definition, or prior self-citation chain. The cited prior work addresses only the distinct zCDP setting and is not load-bearing for the pure ε-DP characterization here.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: stochastic gradients have a bounded k-th moment.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "We characterize the minimax optimal excess-risk rate for pure ε-DP heavy-tailed SCO up to logarithmic factors... based on a novel framework for privately optimizing Lipschitz extensions of the empirical loss."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "Our approach is based on a novel framework for privately optimizing Lipschitz extensions of the empirical loss."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.