Near-Optimal Pure Machine Unlearning for Smooth Strongly Convex Losses
Pith reviewed 2026-06-28 15:49 UTC · model grok-4.3
The pith
Approximate ε-unlearning achieves excess risk equal to statistical error plus a penalty that interpolates between retraining and an exponentially smaller term as ε/d grows.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove upper and lower bounds on the excess population risk of approximate ε-unlearning that are tight up to a condition-number factor. For mean estimation over the unit ball the bounds match. The optimal rate is the usual statistical error plus an unlearning penalty that interpolates between the retraining-from-scratch rate and an exponentially smaller term as ε/d grows, where d is the dimension of the model. In particular, when ε ≫ d, our ε-unlearning algorithm offers an exponential accuracy improvement over retraining the model from scratch and differentially private baselines. On the other hand, when ε ≤ d, retraining from scratch is optimal.
What carries the argument
Upper and lower bounds on excess population risk for approximate ε-unlearning that quantify the additional unlearning penalty and its dependence on the ratio ε/d.
If this is right
- When ε ≫ d the ε-unlearning algorithm yields exponential accuracy gains over retraining from scratch.
- When ε ≤ d retraining from scratch is optimal.
- The derived rates are tight up to a condition-number factor for general smooth strongly convex losses.
- For mean estimation over the unit ball the upper and lower bounds coincide exactly.
Where Pith is reading between the lines
- In high-dimensional settings unlearning may offer little statistical benefit unless ε scales with d.
- The same interpolation structure could be tested for other notions of unlearning or for non-convex losses.
- Algorithm designers could use the ε/d threshold to decide whether to unlearn or retrain.
- Closing the remaining condition-number gap would require a refined analysis technique.
Load-bearing premise
The loss functions must be smooth and strongly convex.
What would settle it
An experiment on mean estimation over the unit ball in which the excess risk of any ε-unlearning procedure exceeds the predicted lower bound by more than a constant factor would falsify the matching-bounds claim.
Figures
read the original abstract
Machine unlearning is motivated by legal and user-facing requirements to remove the influence of individuals' data from trained models, such as the right to be forgotten. Prior work has developed algorithms and error bounds for unlearning in smooth strongly convex stochastic optimization, but the fundamental statistical cost of unlearning has remained unclear. We nearly resolve this problem by proving upper and lower bounds on the excess population risk of approximate $\varepsilon$-unlearning; our bounds are tight up to a condition-number factor. For mean estimation over the unit ball, our upper and lower bounds match. The optimal rate is the usual statistical error plus an unlearning penalty that interpolates between the retraining-from-scratch rate and an exponentially smaller term as $\varepsilon/d$ grows, where $d$ is the dimension of the model. In particular, when $\varepsilon \gg d$, our $\varepsilon$-unlearning algorithm offers an exponential accuracy improvement over retraining the model from scratch and differentially private baselines. On the other hand, when $\varepsilon \le d$, retraining from scratch is optimal.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proves upper and lower bounds on the excess population risk of approximate ε-unlearning for smooth strongly convex losses in stochastic optimization. The bounds are tight up to a condition-number factor, match exactly for mean estimation over the unit ball, and yield an optimal rate consisting of the usual statistical error plus an unlearning penalty that interpolates between the retraining-from-scratch rate and an exponentially smaller term as ε/d grows (with ε ≫ d yielding exponential improvement over retraining and DP baselines, while ε ≤ d makes retraining optimal).
Significance. If the derivations hold, the work nearly resolves the fundamental statistical cost of unlearning in the stated setting by supplying the first near-matching upper/lower bounds, with an explicit interpolation that cleanly separates regimes. The exact matching for mean estimation and the parameter-free derivation from the problem setting are notable strengths.
minor comments (3)
- [§1] §1, paragraph 3: the phrase 'nearly resolve this problem' is slightly vague; explicitly state which aspect (e.g., the condition-number gap) remains open.
- The dependence of the unlearning penalty on ε/d is described qualitatively in the abstract and introduction; a single displayed equation summarizing the three regimes (ε ≫ d, ε ≈ d, ε ≤ d) would improve clarity.
- [§3] Notation for the excess risk decomposition (statistical term + unlearning penalty) is introduced piecemeal; a consolidated definition early in §3 would help.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work and the recommendation for minor revision. The provided summary accurately reflects the manuscript's contributions on near-matching upper and lower bounds for the excess risk of ε-unlearning, the interpolation between regimes, and the exact match for mean estimation.
Circularity Check
No significant circularity detected
full rationale
The paper's central contribution consists of upper and lower bounds on excess population risk for approximate ε-unlearning under smooth strongly convex losses, with exact matching shown for mean estimation over the unit ball. These bounds are derived directly from the optimization setting and statistical assumptions stated in the abstract, without any reduction of a 'prediction' to a fitted parameter, self-definitional quantities, or load-bearing self-citations. The interpolation of the unlearning penalty with ε/d follows from the regime distinctions in the analysis and does not rely on renaming known results or smuggling ansatzes. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Loss functions are smooth and strongly convex
Reference graph
Works this paper leans on
-
[1]
2015 IEEE 56th Annual Symposium on Foundations of Computer Science , pages=
A faster cutting plane method and its implications for combinatorial and convex optimization , author=. 2015 IEEE 56th Annual Symposium on Foundations of Computer Science , pages=. 2015 , organization=
2015
-
[2]
Probability Theory and Related Fields , volume=
Evolving sets, mixing and heat kernel bounds , author=. Probability Theory and Related Fields , volume=. 2005 , publisher=
2005
-
[3]
Information and Computation , volume=
Approximate counting, uniform generation and rapidly mixing Markov chains , author=. Information and Computation , volume=. 1989 , publisher=
1989
-
[4]
Proceedings of the twenty-third annual ACM symposium on Theory of computing , pages=
Sampling and integration of near log-concave functions , author=. Proceedings of the twenty-third annual ACM symposium on Theory of computing , pages=
-
[5]
Conference on Learning Theory , pages=
Structured logconcave sampling with a restricted gaussian oracle , author=. Conference on Learning Theory , pages=. 2021 , organization=
2021
-
[6]
Mathematics of Operations Research , volume=
Comparison of Lasserre’s measure-based bounds for polynomial optimization to bounds obtained by simulated annealing , author=. Mathematics of Operations Research , volume=. 2018 , publisher=
2018
-
[7]
arXiv preprint arXiv:1902.03736 , year=
A short note on concentration inequalities for random vectors with subgaussian norm , author=. arXiv preprint arXiv:1902.03736 , year=
Pith/arXiv arXiv 1902
-
[8]
, author=
Stochastic Convex Optimization. , author=. COLT , volume=
-
[9]
Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages=
Private convex optimization in general norms , author=. Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages=. 2023 , organization=
2023
-
[10]
Theory of Cryptography Conference , pages=
Concentrated differential privacy: Simplifications, extensions, and lower bounds , author=. Theory of Cryptography Conference , pages=. 2016 , organization=
2016
-
[11]
arXiv preprint arXiv:2301.00457 , year=
Resqueing parallel and private stochastic convex optimization , author=. arXiv preprint arXiv:2301.00457 , year=
-
[12]
Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing , pages=
Private stochastic convex optimization: optimal rates in linear time , author=. Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing , pages=
-
[13]
Advances in Neural Information Processing Systems , volume=
Learning with user-level privacy , author=. Advances in Neural Information Processing Systems , volume=
-
[14]
Automatica , volume=
On stochastic gradient and subgradient methods with adaptive steplength sequences , author=. Automatica , volume=. 2012 , publisher=
2012
-
[15]
SIAM Journal on Optimization , volume=
Randomized smoothing for stochastic optimization , author=. SIAM Journal on Optimization , volume=. 2012 , publisher=
2012
-
[17]
2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS) , pages=
Hiding among the clones: A simple and nearly optimal analysis of privacy amplification by shuffling , author=. 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS) , pages=. 2022 , organization=
2021
-
[18]
Advances in Neural Information Processing Systems , volume=
Stability of stochastic gradient descent on nonsmooth convex losses , author=. Advances in Neural Information Processing Systems , volume=
-
[19]
The Journal of Machine Learning Research , volume=
Stability and generalization , author=. The Journal of Machine Learning Research , volume=. 2002 , publisher=
2002
-
[20]
Advances in Neural Information Processing Systems , volume=
Private non-smooth erm and sco in subquadratic steps , author=. Advances in Neural Information Processing Systems , volume=
-
[21]
Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing , pages=
Composable and versatile privacy via truncated cdp , author=. Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing , pages=
-
[22]
Proceedings of the 2016 ACM SIGSAC conference on computer and communications security , pages=
Deep learning with differential privacy , author=. Proceedings of the 2016 ACM SIGSAC conference on computer and communications security , pages=
2016
-
[23]
International Conference on Machine Learning , pages=
Friendlycore: Practical differentially private aggregation , author=. International Conference on Machine Learning , pages=. 2022 , organization=
2022
-
[24]
User-level Private Stochastic Convex Optimization with Optimal Rates , author=
-
[25]
arXiv preprint arXiv:2305.04912 , year=
On User-Level Private Convex Optimization , author=. arXiv preprint arXiv:2305.04912 , year=
-
[26]
Hilal Asi and Vitaly Feldman and Tomer Koren and Kunal Talwar , title =
-
[27]
Advances in Neural Information Processing Systems , volume = 32, pages=
Private stochastic convex optimization with optimal rates , author=. Advances in Neural Information Processing Systems , volume = 32, pages=
-
[28]
Proceedings of the 52nd Annual ACM on the Theory of Computing , pages=
Private stochastic convex optimization: optimal rates in linear time , author=. Proceedings of the 52nd Annual ACM on the Theory of Computing , pages=
-
[29]
ICML , pages=
Private Adaptive Gradient Methods for Convex Optimization , author=. ICML , pages=
-
[30]
Advances in Neural Information Processing Systems , year=
User-level private learning via correlated sampling , author=. Advances in Neural Information Processing Systems , year=
-
[31]
Advances in Neural Information Processing Systems , volume=
Learning discrete distributions: user vs item-level privacy , author=. Advances in Neural Information Processing Systems , volume=
-
[32]
International Conference on Artificial Intelligence and Statistics , pages=
Discrete distribution estimation under user-level local differential privacy , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2023 , organization=
2023
-
[33]
Adapting to function difficulty and growth conditions in private optimization , volume =
Hilal Asi and Daniel Levy and John Duchi , booktitle = nips2021, pages =. Adapting to function difficulty and growth conditions in private optimization , volume =
-
[34]
arXiv:2302.09699 [cs.LG] , year=
Private (Stochastic) Non-Convex Optimization Revisited: Second-Order Stationary Points and Excess Risks , author=. arXiv:2302.09699 [cs.LG] , year=
-
[35]
Raef Bassily and Cristobal Guzman and Anupama Nandi , title =
-
[36]
International Conference on Algorithmic Learning Theory , pages=
Private Stochastic Optimization with Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses , author=. International Conference on Algorithmic Learning Theory , pages=. 2023 , organization=
2023
-
[37]
International Conference on Machine Learning , pages=
Faster rates of convergence to stationary points in differentially private optimization , author=. International Conference on Machine Learning , pages=. 2023 , organization=
2023
-
[38]
Proceedings of the forty-sixth annual ACM symposium on Theory of computing , pages=
Fingerprinting codes and the price of approximate differential privacy , author=. Proceedings of the forty-sixth annual ACM symposium on Theory of computing , pages=
-
[39]
Journal of Privacy and Confidentiality , volume=
Between Pure and Approximate Differential Privacy , author=. Journal of Privacy and Confidentiality , volume=
-
[40]
End-to-End Arguments in System Design , author =. ACM Trans. Comput. Syst. , publisher =. doi:10.1145/357401.357402 , issn =
-
[41]
In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp
Deep Learning with Differential Privacy , author =. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security , publisher =. doi:10.1145/2976749.2978318 , isbn = 9781450341394, url =
-
[42]
Algorithmic Learning Theory , pages =
Differentially private assouad, fano, and le cam , author =. Algorithmic Learning Theory , pages =
-
[43]
IEEE Transactions on Information Theory , volume = 58, number = 5, pages =
Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization , author =. IEEE Transactions on Information Theory , volume = 58, number = 5, pages =
-
[44]
FeO2: Federated Learning with Opt-Out Differential Privacy , author =
-
[45]
Proceedings of the 23rd ACM Conference on Economics and Computation , location =
Optimal and Differentially Private Data Acquisition: Central and Local Mechanisms , author =. Proceedings of the 23rd ACM Conference on Economics and Computation , location =. doi:10.1145/3490486.3538329 , isbn = 9781450391504, url =
-
[46]
Limits of private learning with access to public data , author =
-
[47]
International Conference on Machine Learning , pages =
Public data-assisted mirror descent for private model training , author =. International Conference on Machine Learning , pages =
-
[48]
Differentially private learning with adaptive clipping , author =
-
[49]
Learning with privacy at scale , author =
-
[50]
Differential Privacy Overview , author =
-
[51]
NeurIPS 2019 Expo Talk Abstract , url =
Private Federated Learning , author =. NeurIPS 2019 Expo Talk Abstract , url =
2019
-
[52]
IEEE Internet of Things Journal , publisher =
Local differential privacy for deep learning , author =. IEEE Internet of Things Journal , publisher =
-
[53]
On the convergence of SGD with biased gradients , author =
-
[54]
Near instance-optimality in differential privacy , author =
-
[55]
International Conference on Machine Learning , pages =
Private adaptive gradient methods for convex optimization , author =. International Conference on Machine Learning , pages =
-
[56]
Advances in Neural Information Processing Systems , publisher =
Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms , author =. Advances in Neural Information Processing Systems , publisher =
-
[57]
International Conference on Machine Learning , publisher =
Private stochastic convex optimization: Optimal rates in _1 geometry , author =. International Conference on Machine Learning , publisher =
-
[58]
International Conference on Machine Learning , pages =
Optimal algorithms for mean estimation under local differential privacy , author =. International Conference on Machine Learning , pages =
-
[59]
, author =
BLENDER: Enabling Local Search with a Hybrid Differential Privacy Model. , author =. USENIX Security Symposium , pages =
-
[60]
Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation
Discussion of “Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation” , author =. Electronic Journal of Statistics , publisher =
-
[61]
Advances in Neural Information Processing Systems , publisher =
Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences , author =. Advances in Neural Information Processing Systems , publisher =
-
[62]
Annual International Cryptology Conference , pages =
The privacy blanket of the shuffle model , author =. Annual International Cryptology Conference , pages =
-
[63]
Privacy Amplification via Random Check-Ins , author =
-
[64]
Model-agnostic private learning , author =
-
[65]
Advances in Neural Information Processing Systems , volume = 33, pages =
Stability of stochastic gradient descent on nonsmooth convex losses , author =. Advances in Neural Information Processing Systems , volume = 33, pages =
-
[66]
International Conference on Machine Learning , pages =
Private query release assisted by public data , author =. International Conference on Machine Learning , pages =
-
[67]
Conference on Learning Theory , pages =
Non-euclidean differentially private stochastic convex optimization , author =. Conference on Learning Theory , pages =
-
[68]
Privacy and statistical risk: Formalisms and minimax bounds , author =
-
[69]
approximate differential privacy , author =
Private learning and sanitization: Pure vs. approximate differential privacy , author =. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques: 16th International Workshop, APPROX 2013, and 17th International Workshop, RANDOM 2013, Berkeley, CA, USA, August 21-23, 2013. Proceedings , pages =
2013
-
[70]
Overfitting or perfect fitting? risk bounds for classification and regression rules that interpolate , author =
-
[71]
Private Stochastic Convex Optimization with Optimal Rates , author =
-
[72]
Differentially Private Stochastic Optimization: New Results in Convex and Non-Convex Settings , author =
-
[73]
Protection against reconstruction and its applications in private federated learning , author =
-
[74]
arXiv preprint arXiv:2208.07984 , booktitle =
Private Estimation with Public Data , author =. arXiv preprint arXiv:2208.07984 , booktitle =
-
[75]
Foundations of Computational Mathematics , publisher =
Covariance’s loss is privacy’s gain: Computationally efficient, private and accurate synthetic data , author =. Foundations of Computational Mathematics , publisher =
-
[76]
The Journal of Machine Learning Research , publisher =
Stability and generalization , author =. The Journal of Machine Learning Research , publisher =
-
[77]
2014 IEEE 55th Annual Symposium on Foundations of Computer Science , pages =
Private empirical risk minimization: Efficient algorithms and tight error bounds , author =. 2014 IEEE 55th Annual Symposium on Foundations of Computer Science , pages =
2014
-
[78]
Foundations and Trends
Convex optimization: Algorithms and complexity , author =. Foundations and Trends
-
[79]
Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds , author =. Proceedings, Part I, of the 14th International Conference on Theory of Cryptography - Volume 9985 , publisher =. doi:10.1007/978-3-662-53641-4_24 , isbn = 9783662536407, url =
-
[80]
USENIX Security Symposium , volume = 6, pages =
Extracting Training Data from Large Language Models , author =. USENIX Security Symposium , volume = 6, pages =
-
[81]
Lower bounds for non-convex stochastic optimization , author =
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.