pith. sign in

arxiv: 2605.15388 · v1 · pith:3JOTCEYRnew · submitted 2026-05-14 · 💻 cs.LG

Unified High-Probability Analysis of Stochastic Variance-Reduced Estimation

Pith reviewed 2026-05-19 16:34 UTC · model grok-4.3

classification 💻 cs.LG
keywords stochastic optimizationvariance reductionhigh-probability boundsFreedman inequalitymirror descentexpectation constraintsoracle complexitymartingale concentration
0
0 comments X

The pith

A unified recursion with memory, reset, and correction terms yields high-probability bounds for variance-reduced stochastic estimators in normed spaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a recursive framework for stochastic variance-reduced estimation built from memory retention, a reset probability, and a correction term that accounts for changes in the iterate. This single structure recovers classical methods such as SPIDER, STORM, and PAGE, supports a bias-variance decomposition of estimation error, and motivates new second-order variants. The central result is a unified high-probability bound on the estimation error, obtained via a new dimension-free vector-valued Freedman inequality that applies to random sums of vector martingales in smooth normed spaces. The bound holds in both Euclidean and non-Euclidean geometries, including Banach spaces used by mirror descent. As a direct consequence the work establishes the first Õ(ε^{-3}) oracle-complexity guarantee for stochastic optimization problems that include expectation constraints, improving on the previous Õ(ε^{-4}) rate.

Core claim

The central claim is that the three-component recursive estimator satisfies a high-probability bound on its estimation error, proved using a newly introduced dimension-free vector-valued Freedman inequality for random sums of vector martingales in smooth normed spaces; this bound applies equally in Euclidean and Banach settings and directly produces improved high-probability oracle complexities for both unconstrained mirror-descent optimization and stochastic optimization with expectation constraints.

What carries the argument

The recursion defined by memory retention, reset probability, and iterate-movement correction, together with the dimension-free vector-valued Freedman inequality for vector martingale sums.

If this is right

  • The framework recovers classical estimators including momentum, SPIDER, STORM, and PAGE.
  • It motivates new second-order variants of variance-reduced estimators.
  • High-probability oracle complexities are obtained for unconstrained mirror-descent optimization with only logarithmic dependence on the confidence level.
  • The first Õ(ε^{-3}) oracle-complexity bounds are derived for stochastic optimization with expectation constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dimension-free character of the inequality suggests the bounds remain useful even when dimension grows without introducing extra logarithmic factors.
  • The same recursive structure and inequality may be applied to derive high-probability guarantees for other families of stochastic algorithms beyond variance reduction.
  • The extension to Banach spaces indicates possible use in optimization problems whose geometry is not captured by Euclidean or Hilbert-space norms.

Load-bearing premise

The newly introduced dimension-free vector-valued Freedman inequality holds for random sums of vector martingales in smooth normed spaces, including Banach spaces.

What would settle it

A concrete counterexample consisting of a vector martingale sequence in a smooth normed space for which the claimed high-probability bound on the sum is violated would disprove the central result.

read the original abstract

Stochastic estimators are fundamental to large-scale optimization, where population quantities must be inferred from noisy oracle observations. Although influential methods such as momentum, SPIDER, STORM, and PAGE have been highly successful, their analyses are largely estimator-specific and expectation-based, obscuring the structural tradeoffs that determine reliability. In this paper, we develop a unified framework for stochastic variance-reduced estimation based on a recursion with three components: memory retention, reset probability, and a correction term for iterate movement. This framework recovers several classical estimators, motivates new second-order variants, and yields a bias-variance decomposition of estimation error. Our main result is a unified high-probability bound proved using a new dimension-free vector-valued Freedman inequality, valid for smooth normed spaces involving random sums of vector martingales. The result applies in both Euclidean and non-Euclidean settings, including the analysis of mirror-descent-based methods in Banach spaces. As applications, we obtain high-probability oracle complexities for unconstrained optimization with mirror descent, establishing the logarithmic dependence on the confidence level. We also derive the first $\tilde{\mathcal{O}}(\varepsilon^{-3})$ oracle-complexity bounds for stochastic optimization with expectation constraints, improving upon the existing $\tilde{\mathcal{O}}(\varepsilon^{-4})$ complexity by leveraging variance-reduced estimation for the first time in this setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a unified recursion framework for stochastic variance-reduced estimation consisting of memory retention, reset probability, and a correction term for iterate movement. This recovers classical estimators (momentum, SPIDER, STORM, PAGE), motivates new second-order variants, and yields a bias-variance decomposition. The central result is a unified high-probability bound obtained via a new dimension-free vector-valued Freedman inequality for random sums of vector martingales in smooth normed spaces (including Banach spaces). Applications give high-probability oracle complexities for mirror-descent unconstrained optimization (with logarithmic dependence on confidence level) and the first Õ(ε^{-3}) oracle complexity for stochastic optimization with expectation constraints, improving on prior Õ(ε^{-4}) bounds.

Significance. If the new vector-valued Freedman inequality holds with the claimed dimension-free property in smooth normed spaces, the work provides a valuable structural unification of high-probability analyses for variance-reduced methods, extending beyond Euclidean settings to mirror descent in Banach spaces. The bias-variance decomposition and recovery of multiple estimators clarify trade-offs, while the improved complexity for expectation-constrained problems is a concrete advance. The framework's generality could influence subsequent analyses in stochastic optimization.

major comments (2)
  1. [§4] §4 (Theorem 4.1, the dimension-free vector-valued Freedman inequality): the proof must explicitly confirm that the tail bound remains free of hidden dependence on the modulus of smoothness or separability assumptions when applied to the non-Euclidean norms arising in mirror descent; otherwise the downstream Õ(ε^{-3}) claim for expectation-constrained optimization (Theorem 5.4) cannot be guaranteed without additional regularity not controlled in the final expression.
  2. [§5.3] §5.3 (oracle complexity for expectation-constrained problems): the reduction from the unified high-probability bound to the Õ(ε^{-3}) rate assumes the martingale increments satisfy the exact conditions of the new inequality without extra logarithmic factors from the constraint functions; this step should be expanded with an explicit error propagation argument to substantiate the improvement over prior Õ(ε^{-4}) results.
minor comments (2)
  1. [§3] Notation for the reset probability and correction term in the unified recursion (Definition 3.1) should be aligned with the bias-variance decomposition in Lemma 3.2 to avoid ambiguity when recovering SPIDER and STORM.
  2. [§6] Figure 1 (comparison of estimators) lacks error bars or explicit post-hoc selection details; adding these would strengthen the empirical support for the high-probability claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed review and valuable feedback on our manuscript. We have carefully considered the major comments and provide point-by-point responses below. We will make the suggested clarifications in the revised version.

read point-by-point responses
  1. Referee: [§4] §4 (Theorem 4.1, the dimension-free vector-valued Freedman inequality): the proof must explicitly confirm that the tail bound remains free of hidden dependence on the modulus of smoothness or separability assumptions when applied to the non-Euclidean norms arising in mirror descent; otherwise the downstream Õ(ε^{-3}) claim for expectation-constrained optimization (Theorem 5.4) cannot be guaranteed without additional regularity not controlled in the final expression.

    Authors: We thank the referee for highlighting this important aspect of the proof. The vector-valued Freedman inequality in Theorem 4.1 is derived in a manner that ensures the tail bound is dimension-free and does not depend on the modulus of smoothness or separability assumptions beyond the standard smoothness of the norm, which is already part of the setup for mirror descent in Banach spaces. The constants in the bound are uniform and independent of these parameters in the final expression. To address the concern explicitly, we will revise the proof in the appendix to include a dedicated paragraph confirming this independence and its implications for the mirror descent application. This will also strengthen the guarantee for the complexity in Theorem 5.4. revision: yes

  2. Referee: [§5.3] §5.3 (oracle complexity for expectation-constrained problems): the reduction from the unified high-probability bound to the Õ(ε^{-3}) rate assumes the martingale increments satisfy the exact conditions of the new inequality without extra logarithmic factors from the constraint functions; this step should be expanded with an explicit error propagation argument to substantiate the improvement over prior Õ(ε^{-4}) results.

    Authors: We agree that an explicit error propagation argument would improve the clarity of the reduction in Section 5.3. In the revised manuscript, we will add a detailed subsection or paragraph that propagates the errors from the constraint functions through the high-probability bound, demonstrating that no additional logarithmic factors are introduced beyond those already present in the unified bound. This will rigorously justify the Õ(ε^{-3}) oracle complexity and highlight the improvement over previous Õ(ε^{-4}) results. revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained; new inequality introduced independently

full rationale

The paper introduces a recursion for variance-reduced estimators and derives a unified high-probability bound by applying a newly stated dimension-free vector-valued Freedman inequality for martingales in smooth normed spaces. This inequality is positioned as an independent technical contribution that enables the bound and downstream oracle complexities (including the Õ(ε^{-3}) result). No quoted equations or steps reduce the main result to a fitted parameter, a self-citation chain, or a definitional renaming; the framework recovers known estimators via the recursion but does not force the high-probability claim by construction. The derivation is therefore self-contained against external benchmarks once the new inequality is accepted.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only view limits visibility; the framework relies on standard smoothness and normed-space assumptions plus the new inequality as a mathematical tool.

axioms (1)
  • domain assumption Smoothness assumptions in normed spaces for the vector martingale analysis
    Invoked to ensure the new Freedman inequality applies to both Euclidean and Banach-space mirror-descent settings.

pith-pipeline@v0.9.0 · 5784 in / 1227 out tokens · 44254 ms · 2026-05-19T16:34:21.800788+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

155 extracted references · 155 canonical work pages · 6 internal anchors

  1. [1]

    Global update tracking: A decentralized learning algorithm for heterogeneous data.Advances in neural information processing systems, 36:48939–48961, 2023

    Sai Aparna Aketi, Abolfazl Hashemi, and Kaushik Roy. Global update tracking: A decentralized learning algorithm for heterogeneous data.Advances in neural information processing systems, 36:48939–48961, 2023

  2. [2]

    Towards weaker variance assumptions for stochastic optimization.arXiv preprint arXiv:2504.09951, 2025

    Ahmet Alacaoglu, Yura Malitsky, and Stephen J Wright. Towards weaker variance assumptions for stochastic optimization.arXiv preprint arXiv:2504.09951, 2025

  3. [3]

    On modification of an adaptive stochastic mirror descent algorithm for convex opti- mization problems with functional constraints

    Mohammad S Alkousa. On modification of an adaptive stochastic mirror descent algorithm for convex opti- mization problems with functional constraints. InComputational Mathematics and Applications, pages 47–63. Springer, 2020

  4. [4]

    Neon2: Finding local minima via first-order oracles

    Zeyuan Allen-Zhu and Yuanzhi Li. Neon2: Finding local minima via first-order oracles. InAdvances in Neural Information Processing Systems, pages 3716–3726, 2018

  5. [5]

    Springer, Cham, Switzerland, second edition, 2021

    Luigi Ambrosio, Elia Brué, and Daniele Semola.Lectures on optimal transport. Springer, Cham, Switzerland, second edition, 2021

  6. [6]

    Second-order information in non-convex stochastic optimization: Power and limitations

    Yossi Arjevani, Yair Carmon, John C Duchi, Dylan J Foster, Ayush Sekhari, and Karthik Sridharan. Second-order information in non-convex stochastic optimization: Power and limitations. InConference on Learning Theory, pages 242–299. PMLR, 2020

  7. [7]

    Lower bounds for non-convex stochastic optimization.Mathematical Programming, pages 1–50, 2022

    Yossi Arjevani, Yair Carmon, John C Duchi, Dylan J Foster, Nathan Srebro, and Blake Woodworth. Lower bounds for non-convex stochastic optimization.Mathematical Programming, pages 1–50, 2022

  8. [8]

    Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity.SIAM Journal on Optimization, 29(3):2257–2290, 2019

    Hilal Asi and John C Duchi. Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity.SIAM Journal on Optimization, 29(3):2257–2290, 2019

  9. [9]

    Non-asymptotic analysis of stochastic approximation algorithms for machine learning

    Francis Bach and Eric Moulines. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. InAdvances in Neural Information Processing Systems, volume 24, 2011

  10. [10]

    Federated learning review: Fundamentals, enabling technologies, and future applications.Information processing & management, 59(6):103061, 2022

    Syreen Banabilah, Moayad Aloqaily, Eitaa Alsayed, Nida Malik, and Yaser Jararweh. Federated learning review: Fundamentals, enabling technologies, and future applications.Information processing & management, 59(6):103061, 2022

  11. [11]

    Springer Science & Business Media, 2012

    Viorel Barbu and Teodor Precupanu.Convexity and optimization in Banach spaces. Springer Science & Business Media, 2012

  12. [12]

    Bregman monotone optimization algorithms

    Heinz H Bauschke, Jonathan M Borwein, and Patrick L Combettes. Bregman monotone optimization algorithms. SIAM Journal on control and optimization, 42(2):596–636, 2003

  13. [13]

    Mirror descent and convex optimization problems with non-smooth inequality constraints

    Anastasia Bayandina, Pavel Dvurechensky, Alexander Gasnikov, Fedor Stonyakin, and Alexander Titov. Mirror descent and convex optimization problems with non-smooth inequality constraints. InLarge-scale and distributed optimization, pages 181–213. Springer, 2018

  14. [14]

    Mirror descent and nonlinear projected subgradient methods for convex optimization.Operations Research Letters, 31(3):167–175, 2003

    Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization.Operations Research Letters, 31(3):167–175, 2003

  15. [15]

    Academic press, 2014

    Dimitri P Bertsekas.Constrained optimization and Lagrange multiplier methods. Academic press, 2014

  16. [16]

    Approximation methods which converge with probability one.The Annals of Mathematical Statistics, pages 382–386, 1954

    Julius R Blum. Approximation methods which converge with probability one.The Annals of Mathematical Statistics, pages 382–386, 1954

  17. [17]

    Optimal primal-dual algorithm with last iterate convergence guarantees for stochastic convex optimization problems.arXiv preprint arXiv:2410.18513, 2024

    Digvijay Boob and Mohammad Khalafi. Optimal primal-dual algorithm with last iterate convergence guarantees for stochastic convex optimization problems.arXiv preprint arXiv:2410.18513, 2024

  18. [18]

    Variance reduction.Wiley statsRef: Statistics reference online, pages 1–6, 2017

    Zdravko Botev and Ad Ridder. Variance reduction.Wiley statsRef: Statistics reference online, pages 1–6, 2017

  19. [19]

    Optimization methods for large-scale learning.SIAM Review, 60(2):223–311, 2018

    Léon Bottou, Frank E Curtis, and Jorge Nocedal. Optimization methods for large-scale learning.SIAM Review, 60(2):223–311, 2018. 10

  20. [20]

    Oxford University Press, Oxford, UK, 2013

    Stéphane Boucheron, Gábor Lugosi, and Pascal Massart.Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford, UK, 2013

  21. [21]

    Springer, 2011

    Haim Brezis and Haim Brézis.Functional analysis, Sobolev spaces and partial differential equations, volume 2. Springer, 2011

  22. [22]

    A proximal point method for the variational inequality problem in banach spaces.SIAM Journal on Control and Optimization, 39(5):1633–1649, 2000

    Regina Sandra Burachik and Susana Scheimberg. A proximal point method for the variational inequality problem in banach spaces.SIAM Journal on Control and Optimization, 39(5):1633–1649, 2000

  23. [23]

    Making sgd parameter-free

    Yair Carmon and Oliver Hinder. Making sgd parameter-free. InConference on Learning Theory, pages 2360–2389. PMLR, 2022

  24. [24]

    A first-order primal-dual algorithm for convex problems with applications to imaging.Journal of mathematical imaging and vision, 40:120–145, 2011

    Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging.Journal of mathematical imaging and vision, 40:120–145, 2011

  25. [25]

    Federated learning empowered recommendation model for financial consumer services.IEEE Transactions on Consumer Electronics, 70(1):2508–2516, 2023

    Pushpita Chatterjee, Debashis Das, and Danda B Rawat. Federated learning empowered recommendation model for financial consumer services.IEEE Transactions on Consumer Electronics, 70(1):2508–2516, 2023

  26. [26]

    Fednmut–federated noisy model update tracking convergence analysis.arXiv preprint arXiv:2403.13247, 2024

    Vishnu Pandi Chellapandi, Antesh Upadhyay, Abolfazl Hashemi, and Stanislaw H ˙Zak. Fednmut–federated noisy model update tracking convergence analysis.arXiv preprint arXiv:2403.13247, 2024

  27. [27]

    Communication-efficient variance-reduced decentralized stochastic optimization over time-varying directed graphs.IEEE Transactions on Automatic Control, 2021

    Yiyue Chen, Abolfazl Hashemi, and Haris Vikalo. Communication-efficient variance-reduced decentralized stochastic optimization over time-varying directed graphs.IEEE Transactions on Automatic Control, 2021

  28. [28]

    Cosson, A

    R. Cosson, A. Jadbabaie, A. Makur, A. Reisizadeh, and D. Shah. Gradient descent with low-rank objective functions. InProceedings of the 62nd IEEE Conference on Decision and Control (CDC), pages 3309–3314, Singapore, December 13–15 2023

  29. [29]

    Cosson, A

    R. Cosson, A. Jadbabaie, A. Makur, A. Reisizadeh, and D. Shah. Low-rank gradient descent.IEEE Open Journal of Control Systems, 2:380–395, October 2023

  30. [30]

    Shisheng Cui and Uday V Shanbhag. On the analysis of variance-reduced and randomized projection variants of single projection schemes for monotone stochastic variational inequality problems.Set-Valued and Variational Analysis, 29:453–499, 2021

  31. [31]

    Momentum improves normalized sgd

    Ashok Cutkosky and Harsh Mehta. Momentum improves normalized sgd. InInternational conference on machine learning, pages 2260–2268. PMLR, 2020

  32. [32]

    High-probability bounds for non-convex stochastic optimization with heavy tails.Advances in Neural Information Processing Systems, 34:4883–4895, 2021

    Ashok Cutkosky and Harsh Mehta. High-probability bounds for non-convex stochastic optimization with heavy tails.Advances in Neural Information Processing Systems, 34:4883–4895, 2021

  33. [33]

    Momentum-based variance reduction in non-convex sgd.Advances in neural information processing systems, 32, 2019

    Ashok Cutkosky and Francesco Orabona. Momentum-based variance reduction in non-convex sgd.Advances in neural information processing systems, 32, 2019

  34. [34]

    Faster non-convex federated learning via global and local momentum

    Rudrajit Das, Anish Acharya, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S Dhillon, and Ufuk Topcu. Faster non-convex federated learning via global and local momentum. InUncertainty in Artificial Intelligence, pages 496–506. PMLR, 2022

  35. [35]

    Stochastic model-based minimization of weakly convex functions

    Damek Davis and Dmitriy Drusvyatskiy. Stochastic model-based minimization of weakly convex functions. SIAM Journal on Optimization, 29(1):207–239, 2019

  36. [36]

    Saga: A fast incremental gradient method with support for non-strongly convex composite objectives.Advances in neural information processing systems, 27, 2014

    Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. Saga: A fast incremental gradient method with support for non-strongly convex composite objectives.Advances in neural information processing systems, 27, 2014

  37. [37]

    Learning-rate-free learning by d-adaptation

    Aaron Defazio and Konstantin Mishchenko. Learning-rate-free learning by d-adaptation. InInternational Conference on Machine Learning, pages 7449–7479. PMLR, 2023

  38. [38]

    Provable convergence guarantees for black-box variational inference

    Justin Domke, Robert Gower, and Guillaume Garrigos. Provable convergence guarantees for black-box variational inference. InAdvances in Neural Information Processing Systems, volume 36, pages 66289–66327, 2023. 11

  39. [39]

    The complexity of finding stationary points with stochastic gradient descent.arXiv preprint arXiv:1910.01845, 2019

    Yoel Drori and Ohad Shamir. The complexity of finding stationary points with stochastic gradient descent.arXiv preprint arXiv:1910.01845, 2019

  40. [40]

    Efficiency of minimizing compositions of convex functions and smooth maps.Mathematical Programming, 178(1):503–558, 2019

    Dmitriy Drusvyatskiy and Courtney Paquette. Efficiency of minimizing compositions of convex functions and smooth maps.Mathematical Programming, 178(1):503–558, 2019

  41. [41]

    Adaptive subgradient methods for online learning and stochastic optimization.Journal of machine learning research, 12(7), 2011

    John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization.Journal of machine learning research, 12(7), 2011

  42. [42]

    SIAM, 1999

    Ivar Ekeland and Roger Temam.Convex analysis and variational problems. SIAM, 1999

  43. [43]

    Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator.Advances in neural information processing systems, 31, 2018

    Cong Fang, Chris Junchi Li, Zhouchen Lin, and Tong Zhang. Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator.Advances in neural information processing systems, 31, 2018

  44. [44]

    Sharp analysis for nonconvex SGD escaping from saddle points

    Cong Fang, Zhouchen Lin, and Tong Zhang. Sharp analysis for nonconvex SGD escaping from saddle points. In Conference on Learning Theory, pages 1192–1234. PMLR, 2019

  45. [45]

    Lower Bounds and Proximally Anchored SGD for Non-Convex Minimization Under Unbounded Variance

    Arda Fazla, Ege C. Kaya, Antesh Upadhyay, and Abolfazl Hashemi. Lower bounds and proximally anchored SGD for non-convex minimization under unbounded variance.arXiv preprint arXiv:2604.16620, 2026

  46. [46]

    Stochastic first-and zeroth-order methods for nonconvex stochastic program- ming.SIAM journal on optimization, 23(4):2341–2368, 2013

    Saeed Ghadimi and Guanghui Lan. Stochastic first-and zeroth-order methods for nonconvex stochastic program- ming.SIAM journal on optimization, 23(4):2341–2368, 2013

  47. [47]

    Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization.Mathematical Programming, 155(1):267–305, 2016

    Saeed Ghadimi, Guanghui Lan, and Hongchao Zhang. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization.Mathematical Programming, 155(1):267–305, 2016

  48. [48]

    On stochastic approximation.Theory of Probability & Its Applications, 10(2):275–278, 1965

    EG Gladyshev. On stochastic approximation.Theory of Probability & Its Applications, 10(2):275–278, 1965

  49. [49]

    High- probability complexity bounds for non-smooth stochastic convex optimization with heavy-tailed noise.Journal of Optimization Theory and Applications, 203(3):2679–2738, 2024

    Eduard Gorbunov, Marina Danilova, Innokentiy Shibaev, Pavel Dvurechensky, and Alexander Gasnikov. High- probability complexity bounds for non-smooth stochastic convex optimization with heavy-tailed noise.Journal of Optimization Theory and Applications, 203(3):2679–2738, 2024

  50. [50]

    A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent

    Eduard Gorbunov, Filip Hanzely, and Peter Richtárik. A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent. InInternational Conference on Artificial Intelligence and Statistics, pages 680–690. PMLR, 2020

  51. [51]

    High-probability convergence for composite and distributed stochastic minimization and variational inequalities with heavy-tailed noise.arXiv preprint arXiv:2310.01860, 2023

    Eduard Gorbunov, Abdurakhmon Sadiev, Marina Danilova, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, and Peter Richtárik. High-probability convergence for composite and distributed stochastic minimization and variational inequalities with heavy-tailed noise.arXiv preprint arXiv:2310.01860, 2023

  52. [52]

    Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity.SIAM Journal on Optimization, 29(2):1350–1365, 2019

    Benjamin Grimmer. Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity.SIAM Journal on Optimization, 29(2):1350–1365, 2019

  53. [53]

    Disa: A dual inexact splitting algorithm for distributed convex composite optimization.IEEE Transactions on Automatic Control, 69(5):2995–3010, 2023

    Luyao Guo, Xinli Shi, Shaofu Yang, and Jinde Cao. Disa: A dual inexact splitting algorithm for distributed convex composite optimization.IEEE Transactions on Automatic Control, 69(5):2995–3010, 2023

  54. [54]

    The heavy-tail phenomenon in SGD

    Mert Gurbuzbalaban, Umut Simsekli, and Lingjiong Zhu. The heavy-tail phenomenon in SGD. InInternational Conference on Machine Learning, pages 3964–3975. PMLR, 2021

  55. [55]

    Comparison of optimization techniques based on gradient descent algorithm: A review.PalArch’s Journal of Archaeology of Egypt/Egyptology, 18(4):2715–2743, 2021

    Saad Hikmat Haji and Adnan Mohsin Abdulazeez. Comparison of optimization techniques based on gradient descent algorithm: A review.PalArch’s Journal of Archaeology of Egypt/Egyptology, 18(4):2715–2743, 2021

  56. [56]

    A primal-dual algorithm with line search for general convex-concave saddle point problems.SIAM Journal on Optimization, 31(2):1299–1329, 2021

    Erfan Yazdandoost Hamedani and Necdet Serhat Aybat. A primal-dual algorithm with line search for general convex-concave saddle point problems.SIAM Journal on Optimization, 31(2):1299–1329, 2021

  57. [57]

    J. M. Hammersley and J. G. Mauldon. General principles of antithetic variates.Mathematical Proceedings of the Cambridge Philosophical Society, 52(3):476–481, July 1956

  58. [58]

    J. M. Hammersley and K. W. Morton. A new Monte Carlo technique: Antithetic variates.Mathematical Proceedings of the Cambridge Philosophical Society, 52(3):449–475, July 1956. 12

  59. [59]

    Tight analyses for non-smooth stochastic gradient descent

    Nicholas JA Harvey, Christopher Liaw, Yaniv Plan, and Sikander Randhawa. Tight analyses for non-smooth stochastic gradient descent. InConference on Learning Theory, pages 1579–1613. PMLR, 2019

  60. [60]

    Hashemi, D

    A. Hashemi, D. Lee, and A. Makur. Strong antithetic variates: Theory and applications. Preprint, pp. 1–13, 2025

  61. [61]

    A unified model for large-scale inexact fixed-point iteration: A stochastic optimization perspective.IEEE Transactions on Automatic Control, 70(4):2435–2449, 2024

    Abolfazl Hashemi. A unified model for large-scale inexact fixed-point iteration: A stochastic optimization perspective.IEEE Transactions on Automatic Control, 70(4):2435–2449, 2024

  62. [62]

    RAMPAGE: RAndomized Mid-Point for debiAsed Gradient Extrapolation

    Abolfazl Hashemi. Rampage: Randomized mid-point for debiased gradient extrapolation.arXiv preprint arXiv:2603.22155, 2026

  63. [63]

    Strong antithetic variance reduction inequalities

    Abolfazl Hashemi, Dogmin Lee, and Anuran Makur. Strong antithetic variance reduction inequalities. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), pages 1–6, Ann Arbor, MI, USA, June 22–27 2025

  64. [64]

    Resilient constrained learning.Advances in Neural Information Processing Systems, 36:71767–71798, 2023

    Ignacio Hounie, Alejandro Ribeiro, and Luiz FO Chamon. Resilient constrained learning.Advances in Neural Information Processing Systems, 36:71767–71798, 2023

  65. [65]

    Yankun Huang and Qihang Lin. Oracle complexity of single-loop switching subgradient methods for non- smooth weakly convex functional constrained optimization.Advances in Neural Information Processing Systems, 36:61327–61340, 2023

  66. [66]

    Distributed proximal algorithms for nonsmooth optimiza- tion: Unified convergence analysis.IEEE Transactions on Automatic Control, 2025

    Yi Huang, Shisheng Cui, Jian Sun, and Ziyang Meng. Distributed proximal algorithms for nonsmooth optimiza- tion: Unified convergence analysis.IEEE Transactions on Automatic Control, 2025

  67. [67]

    Distributed multiproximal algorithm for nonsmooth convex optimization with coupled inequality constraints.IEEE Transactions on Automatic Control, 68(12):8126–8133, 2023

    Yi Huang, Ziyang Meng, Jian Sun, and Wei Ren. Distributed multiproximal algorithm for nonsmooth convex optimization with coupled inequality constraints.IEEE Transactions on Automatic Control, 68(12):8126–8133, 2023

  68. [68]

    Accelerated stochastic approximation with state-dependent noise.arXiv preprint arXiv:2307.01497, 2023

    Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, and Tianjiao Li. Accelerated stochastic approximation with state-dependent noise.arXiv preprint arXiv:2307.01497, 2023

  69. [69]

    Safe-ef: Error feedback for non-smooth constrained optimiza- tion

    Rustem Islamov, Yarden As, and Ilyas Fatkhullin. Safe-ef: Error feedback for non-smooth constrained optimiza- tion. InForty-second International Conference on Machine Learning, 2025

  70. [70]

    Safe-EF: Error feedback for non-smooth constrained optimiza- tion

    Rustem Islamov, Yarden As, and Ilyas Fatkhullin. Safe-EF: Error feedback for non-smooth constrained optimiza- tion. InForty-second International Conference on Machine Learning, 2025

  71. [71]

    Unconstrained online learning with unbounded losses

    Andrew Jacobsen and Ashok Cutkosky. Unconstrained online learning with unbounded losses. InInternational Conference on Machine Learning, pages 14590–14630. PMLR, 2023

  72. [72]

    Jadbabaie, A

    A. Jadbabaie, A. Makur, and A. Reisizadeh. Adaptive low-rank gradient descent. InProceedings of the 62nd IEEE Conference on Decision and Control (CDC), pages 3315–3320, Singapore, December 13–15 2023

  73. [73]

    Jadbabaie, A

    A. Jadbabaie, A. Makur, and D. Shah. Federated optimization of smooth loss functions.IEEE Transactions on Information Theory, 69(12):7836–7866, December 2023

  74. [74]

    Jadbabaie, A

    A. Jadbabaie, A. Makur, and D. Shah. Gradient-based empirical risk minimization using local polynomial regression.Stochastic Systems, 14(4):363–402, December 2024

  75. [75]

    Prateek Jain, Sham M Kakade, Rahul Kidambi, Praneeth Netrapalli, and Aaron Sidford. Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification.Journal of machine learning research, 18(223):1–42, 2018

  76. [76]

    First-order methods for nonsmooth nonconvex functional constrained optimization with or without slater points.SIAM Journal on Optimization, 35(2):1300–1329, 2025

    Zhichao Jia and Benjamin Grimmer. First-order methods for nonsmooth nonconvex functional constrained optimization with or without slater points.SIAM Journal on Optimization, 35(2):1300–1329, 2025

  77. [77]

    Accelerating stochastic gradient descent using predictive variance reduction

    Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems, 26, 2013. 13

  78. [78]

    Large deviations of vector-valued martingales in 2-smooth normed spaces.arXiv preprint arXiv:0809.0813, 2008

    Anatoli Juditsky and Arkadii S Nemirovski. Large deviations of vector-valued martingales in 2-smooth normed spaces.arXiv preprint arXiv:0809.0813, 2008

  79. [79]

    Kahn and A

    H. Kahn and A. W. Marshall. Methods of reducing sample size in Monte Carlo computations.Journal of the Operations Research Society of America, 1(5):263–278, November 1953

  80. [80]

    Better theory for SGD in the nonconvex world.Transactions on Machine Learning Research, 2023

    Ahmed Khaled and Peter Richtárik. Better theory for SGD in the nonconvex world.Transactions on Machine Learning Research, 2023

Showing first 80 references.