pith. machine review for the scientific record.

arxiv: 2604.06492 · v2 · submitted 2026-04-07 · 💻 cs.LG · cs.CR · stat.ML

Recognition: 2 theorem links

Optimal Rates for Pure ε-Differentially Private Stochastic Convex Optimization with Heavy Tails

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:48 UTC · model grok-4.3

classification 💻 cs.LG · cs.CR · stat.ML
keywords differential privacy · stochastic convex optimization · heavy-tailed gradients · minimax rates · Lipschitz extensions · excess risk

The pith

Pure ε-differential privacy achieves the minimax optimal excess risk rate for stochastic convex optimization with heavy-tailed gradients.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes the best possible excess risk achievable when optimizing convex functions from stochastic gradients whose distributions have heavy tails, while satisfying pure ε-differential privacy. It closes the gap left by earlier work on zero-concentrated DP by giving both a matching upper bound, achieved by a polynomial-time algorithm, and a nearly tight lower bound. The result matters because many real datasets produce unbounded gradients, and pure DP supplies a strong, easy-to-interpret privacy guarantee whose optimal rate had remained open in this setting. The method works by privately optimizing Lipschitz extensions of the empirical loss, which handles the absence of a uniform Lipschitz bound.
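
One standard construction that fits this description is the Pasch-Hausdorff (infimal-convolution) Lipschitz envelope; the paper's actual extension and notation may differ, and the symbols $f_L$, $L$, and $\mathcal{K}$ below are illustrative choices, not the authors':

$$ f_L(\theta; z) \;=\; \inf_{u \in \mathcal{K}} \big\{ f(u; z) + L\,\lVert \theta - u \rVert \big\}. $$

When finite, $f_L(\cdot\,; z)$ is the largest $L$-Lipschitz minorant of the per-sample loss $f(\cdot\,; z)$, it remains convex whenever $f(\cdot\,; z)$ is convex, and it agrees with $f(\cdot\,; z)$ at points where $f(\cdot\,; z)$ has a subgradient of norm at most $L$. Optimizing such an extension privately lets the mechanism calibrate noise to $L$ instead of to an unbounded worst-case Lipschitz parameter.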

Core claim

We characterize the minimax optimal excess-risk rate for pure ε-DP heavy-tailed SCO up to logarithmic factors. Our algorithm achieves this rate in polynomial time with high probability. Moreover, it runs in deterministic polynomial time when the worst-case Lipschitz parameter is polynomially bounded. For important structured problem classes including hinge/ReLU-type and absolute-value losses on Euclidean balls, ellipsoids, and polytopes, we achieve deterministic polynomial time even when the worst-case Lipschitz parameter is infinite. We complement the upper bound with a nearly matching high-probability lower bound.

What carries the argument

A novel framework for privately optimizing Lipschitz extensions of the empirical loss, which converts the heavy-tailed problem into one with controlled sensitivity while preserving convexity.
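
Below is a minimal numerical sketch of that conversion, using the illustrative envelope above rather than the paper's actual framework; it assumes only NumPy, and the quadratic loss, grid, and constant L are invented for the demo. It checks that the envelope of a non-Lipschitz convex loss stays below the loss, coincides with it near the minimizer, and has slope capped by L, which is the controlled quantity a private optimizer can add noise against.

    # Illustrative sketch only, NOT the paper's algorithm. For a 1-D sample loss
    # f(theta) = (theta - y)^2, which has no uniform Lipschitz bound, the
    # Pasch-Hausdorff envelope  f_L(theta) = inf_u { f(u) + L*|theta - u| }
    # is convex, L-Lipschitz, and matches f near its minimizer. The loss, grid,
    # and L below are assumptions made purely for this demo.
    import numpy as np

    def lipschitz_envelope(f, grid, L):
        """Envelope on a grid: for each x, min over u of f(u) + L*|x - u|."""
        fu = f(grid)                                        # f at candidate points u
        cost = fu[None, :] + L * np.abs(grid[:, None] - grid[None, :])
        return cost.min(axis=1)                             # rows index x, columns index u

    y = 0.0
    f = lambda t: (t - y) ** 2      # convex, but its gradient 2*(t - y) is unbounded
    grid = np.linspace(-10.0, 10.0, 2001)
    L = 3.0

    fL = lipschitz_envelope(f, grid, L)

    # The envelope never exceeds the original loss ...
    assert np.all(fL <= f(grid) + 1e-9)

    # ... it coincides with the loss near the minimizer ...
    mid = len(grid) // 2
    print("f(0) ~", f(grid[mid]), "  envelope(0) ~", fL[mid])

    # ... and its slope is capped by L (up to grid and floating-point error),
    # so noise for a private optimizer can be calibrated to L.
    slopes = np.abs(np.diff(fL)) / np.diff(grid)
    print("max |slope| of envelope:", slopes.max(), " vs L =", L)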

If this is right

  • The optimal rate holds for any convex loss whose stochastic gradients have bounded k-th moments.
  • Structured losses such as hinge and absolute-value functions on balls, ellipsoids, and polytopes admit deterministic polynomial-time algorithms even when gradients are unbounded.
  • When the worst-case Lipschitz parameter is polynomially bounded, the algorithm runs in deterministic polynomial time.
  • The same rate is tight up to logs via a matching high-probability lower bound.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The Lipschitz-extension technique may transfer to other private optimization settings where only moment bounds are realistic.
  • Efficient approximation schemes for these extensions could enable practical deployment on new loss families beyond the structured cases studied.
  • Similar moment-based analyses might improve non-private heavy-tailed SCO rates when uniform Lipschitz assumptions are removed.

Load-bearing premise

That the stochastic gradients satisfy a bounded k-th moment condition rather than a uniform Lipschitz bound, and that Lipschitz extensions of the empirical loss can be computed or approximated efficiently for the target problem classes.

What would settle it

A concrete SCO instance with bounded k-th moments in which every pure ε-DP algorithm incurs excess risk worse than the claimed rate by more than logarithmic factors, or in which the algorithm exceeds polynomial runtime with non-negligible probability.

read the original abstract

We study stochastic convex optimization (SCO) with heavy-tailed gradients under pure $\varepsilon$-differential privacy (DP). Instead of assuming a bound on the worst-case Lipschitz parameter of the loss, we assume only a bounded $k$-th moment. This assumption allows for unbounded, heavy-tailed stochastic gradient distributions, and can yield sharper excess risk bounds. Prior work characterized the minimax optimal rate for $\rho$-zero-concentrated DP SCO up to logarithmic factors in this setting, but the pure $\varepsilon$-DP case has remained open. We characterize the minimax optimal excess-risk rate for pure $\varepsilon$-DP heavy-tailed SCO up to logarithmic factors. Our algorithm achieves this rate in polynomial time with high probability. Moreover, it runs in deterministic polynomial time when the worst-case Lipschitz parameter is polynomially bounded. For important structured problem classes -- including hinge/ReLU-type and absolute-value losses on Euclidean balls, ellipsoids, and polytopes -- we achieve deterministic polynomial time even when the worst-case Lipschitz parameter is infinite. Our approach is based on a novel framework for privately optimizing Lipschitz extensions of the empirical loss. We complement our upper bound with a nearly matching high-probability lower bound.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper studies stochastic convex optimization under pure ε-differential privacy with heavy-tailed stochastic gradients, replacing the usual bounded-Lipschitz assumption with a bounded k-th moment condition. It claims to characterize the minimax optimal excess-risk rate up to logarithmic factors, gives an algorithm achieving the rate in polynomial time with high probability (deterministically for structured classes such as hinge/ReLU losses on balls/ellipsoids/polytopes), and supplies a nearly matching high-probability lower bound. The method rests on a novel framework that reduces private optimization to Lipschitz extensions of the empirical loss.

Significance. If the claimed rates and constructions hold, the work is significant: it resolves the open pure-ε-DP case for heavy-tailed SCO (previously settled only for zCDP), supplies explicit deterministic polynomial-time oracles for natural structured problems, and gives high-probability matching bounds. These elements strengthen the contribution beyond typical rate characterizations.

minor comments (2)
  1. [Abstract] The precise excess-risk rate (including its dependence on k, ε, n, and d) is not stated explicitly; adding the expression would make the main result immediately verifiable from the abstract.
  2. [Introduction] The manuscript would benefit from a short table or paragraph in the introduction that contrasts the new pure-ε-DP rate with the prior zCDP rate under the same moment assumption.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work, recognition of its significance in resolving the pure ε-DP case for heavy-tailed SCO, and recommendation of minor revision. The referee's description of our contributions (minimax rates up to logs, polynomial-time algorithms via Lipschitz extensions, deterministic oracles for structured classes, and high-probability lower bounds) is accurate.

Circularity Check

0 steps flagged

No significant circularity; upper and lower bounds are independently constructed

full rationale

The paper derives its upper bound via an explicit novel framework that reduces private SCO to Lipschitz extensions of the empirical loss, with deterministic polynomial-time oracles given for structured classes (hinge/ReLU on balls/ellipsoids/polytopes) and high-probability sampling approximations for the general case. The nearly matching high-probability lower bound is a separate construction under the same bounded k-th moment assumption and does not reduce to any fitted parameter, self-definition, or prior self-citation chain. The cited prior work addresses only the distinct zCDP setting and is not load-bearing for the pure ε-DP characterization here.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Central claim rests on the bounded k-th moment domain assumption and the existence of tractable Lipschitz extensions for the stated loss classes; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Stochastic gradients have a bounded k-th moment
    Replaces the stronger bounded-Lipschitz assumption to permit heavy tails while still controlling excess risk.
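
For concreteness, one common way to formalize the bounded k-th moment axiom above (the paper's exact norm, normalization, and symbol for the bound, written $r$ here, may differ):

$$ \sup_{\theta \in \mathcal{K}} \; \mathbb{E}_{z \sim \mathcal{D}} \big[ \lVert \nabla f(\theta; z) \rVert^{k} \big] \;\le\; r^{k}, $$

in place of the uniform requirement $\lVert \nabla f(\theta; z) \rVert \le L$ for every $z$, which heavy-tailed gradient distributions violate.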

pith-pipeline@v0.9.0 · 5510 in / 1207 out tokens · 46119 ms · 2026-05-10T18:48:06.613932+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 2 canonical work pages

  1. [1]

    Private adaptive gradient methods for convex optimization

    Hilal Asi, John Duchi, Alireza Fallah, Omid Javidbakht, and Kunal Talwar. Private adaptive gradient methods for convex optimization. In International Conference on Machine Learning , pages 383--392. PMLR, 2021

  2. [2]

    Private stochastic convex optimization: Optimal rates in ℓ1 geometry

    Hilal Asi, Vitaly Feldman, Tomer Koren, and Kunal Talwar. Private stochastic convex optimization: Optimal rates in ℓ1 geometry. In ICML, 2021

  3. [3]

    Private stochastic convex optimization with heavy tails: Near-optimality from simple reductions

    Hilal Asi, Daogao Liu, and Kevin Tian. Private stochastic convex optimization with heavy tails: Near-optimality from simple reductions. Advances in Neural Information Processing Systems , 37:59174--59215, 2024

  4. [4]

    Privacy and statistical risk: Formalisms and minimax bounds

    Rina Foygel Barber and John C Duchi. Privacy and statistical risk: Formalisms and minimax bounds. arXiv preprint arXiv:1412.4451 , 2014

  5. [5]

    Private stochastic convex optimization with optimal rates

    Raef Bassily, Vitaly Feldman, Kunal Talwar, and Abhradeep Thakurta. Private stochastic convex optimization with optimal rates. In Advances in Neural Information Processing Systems , volume 32, 2019

  6. [6]

    Differentially private algorithms for the stochastic saddle point problem with optimal rates for the strong gap

    Raef Bassily, Cristóbal Guzmán, and Michael Menart. Differentially private algorithms for the stochastic saddle point problem with optimal rates for the strong gap. In The Thirty Sixth Annual Conference on Learning Theory, pages 2482--2508. PMLR, 2023

  7. [7]

    Private empirical risk minimization: Efficient algorithms and tight error bounds

    Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science , pages 464--473. IEEE, 2014

  8. [8]

    Lectures on modern convex optimization: analysis, algorithms, and engineering applications

    Aharon Ben-Tal and Arkadi Nemirovski. Lectures on modern convex optimization: analysis, algorithms, and engineering applications . SIAM, 2001

  9. [9]

    Differentially private empirical risk minimization

    Kamalika Chaudhuri, Claire Monteleoni, and Anand D Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research , 12(3), 2011

  10. [10]

    Adaptive subgradient methods for online learning and stochastic optimization

    John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research , 12(7), 2011

  11. [11]

    Comparison of Lasserre's measure-based bounds for polynomial optimization to bounds obtained by simulated annealing

    Etienne De Klerk and Monique Laurent. Comparison of Lasserre's measure-based bounds for polynomial optimization to bounds obtained by simulated annealing. Mathematics of Operations Research, 43(4):1317--1325, 2018

  12. [12]

    Calibrating noise to sensitivity in private data analysis

    Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference , pages 265--284. Springer, 2006

  13. [13]

    Private stochastic convex optimization: optimal rates in linear time

    Vitaly Feldman, Tomer Koren, and Kunal Talwar. Private stochastic convex optimization: optimal rates in linear time. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing , pages 439--449, 2020

  14. [14]

    High dimensional differentially private stochastic optimization with heavy-tailed data

    Lijie Hu, Shuo Ni, Hanshen Xiao, and Di Wang. High dimensional differentially private stochastic optimization with heavy-tailed data. In Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems , pages 227--236, 2022

  15. [15]

    Train faster, generalize better: Stability of stochastic gradient descent

    Moritz Hardt, Ben Recht, and Yoram Singer. Train faster, generalize better: Stability of stochastic gradient descent. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1225--1234. PMLR, 2016

  16. [16]

    Convex analysis and minimization algorithms I & II, volume 305

    Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Convex analysis and minimization algorithms I & II, volume 305. Springer Science & Business Media, 2013

  17. [17]

    Improved rates for differentially private stochastic convex optimization with heavy-tailed data

    Gautam Kamath, Xingtu Liu, and Huanyu Zhang. Improved rates for differentially private stochastic convex optimization with heavy-tailed data. In International Conference on Machine Learning , pages 10633--10660. PMLR, 2022

  18. [18]

    Differentially private bilevel optimization: Efficient algorithms with near-optimal rates

    Andrew Lowy and Daogao Liu. Differentially private bilevel optimization: Efficient algorithms with near-optimal rates. In The Thirty-ninth Annual Conference on Neural Information Processing Systems , 2025

  19. [19]

    Faster algorithms for user-level private stochastic convex optimization

    Andrew Lowy, Daogao Liu, and Hilal Asi. Faster algorithms for user-level private stochastic convex optimization. Advances in Neural Information Processing Systems , 37:96071--96100, 2024

  20. [20]

    Private stochastic optimization with large worst-case Lipschitz parameter

    Andrew Lowy and Meisam Razaviyayn. Private stochastic optimization with large worst-case Lipschitz parameter. Journal of Privacy and Confidentiality, 15:1, 2025

  21. [21]

    Rényi differential privacy

    Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263--275. IEEE, 2017

  22. [22]

    High-probability minimax lower bounds

    Tianyi Ma, Kabir A Verchand, and Richard J Samworth. High-probability minimax lower bounds. arXiv preprint arXiv:2406.13447 , 2024

  23. [23]

    Problem complexity and method efficiency in optimization

    A. Nemirovski and D. Yudin. Problem complexity and method efficiency in optimization. Wiley, Chichester, 1983

  24. [24]

    Online learning and online convex optimization

    Shai Shalev-Shwartz et al. Online learning and online convex optimization. Foundations and Trends in Machine Learning , 4(2):107--194, 2012

  25. [25]

    Differentially private learning with small public data

    Jun Wang and Zhi-Hua Zhou. Differentially private learning with small public data. In Proceedings of the AAAI Conference on Artificial Intelligence , volume 34, pages 6219--6226, 2020