Unified High-Probability Analysis of Stochastic Variance-Reduced Estimation
Pith reviewed 2026-05-19 16:34 UTC · model grok-4.3
The pith
A unified recursion with memory, reset, and correction terms yields high-probability bounds for variance-reduced stochastic estimators in normed spaces.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the three-component recursive estimator satisfies a high-probability bound on its estimation error, proved using a newly introduced dimension-free vector-valued Freedman inequality for random sums of vector martingales in smooth normed spaces; this bound applies equally in Euclidean and Banach settings and directly produces improved high-probability oracle complexities for both unconstrained mirror-descent optimization and stochastic optimization with expectation constraints.
What carries the argument
The recursion defined by memory retention, reset probability, and iterate-movement correction, together with the dimension-free vector-valued Freedman inequality for vector martingale sums.
If this is right
- The framework recovers classical estimators including momentum, SPIDER, STORM, and PAGE.
- It motivates new second-order variants of variance-reduced estimators.
- High-probability oracle complexities are obtained for unconstrained mirror-descent optimization with only logarithmic dependence on the confidence level.
- The first Õ(ε^{-3}) oracle-complexity bounds are derived for stochastic optimization with expectation constraints.
Where Pith is reading between the lines
- The dimension-free character of the inequality suggests the bounds remain useful even when dimension grows without introducing extra logarithmic factors.
- The same recursive structure and inequality may be applied to derive high-probability guarantees for other families of stochastic algorithms beyond variance reduction.
- The extension to Banach spaces indicates possible use in optimization problems whose geometry is not captured by Euclidean or Hilbert-space norms.
Load-bearing premise
The newly introduced dimension-free vector-valued Freedman inequality holds for random sums of vector martingales in smooth normed spaces, including Banach spaces.
What would settle it
A concrete counterexample consisting of a vector martingale sequence in a smooth normed space for which the claimed high-probability bound on the sum is violated would disprove the central result.
read the original abstract
Stochastic estimators are fundamental to large-scale optimization, where population quantities must be inferred from noisy oracle observations. Although influential methods such as momentum, SPIDER, STORM, and PAGE have been highly successful, their analyses are largely estimator-specific and expectation-based, obscuring the structural tradeoffs that determine reliability. In this paper, we develop a unified framework for stochastic variance-reduced estimation based on a recursion with three components: memory retention, reset probability, and a correction term for iterate movement. This framework recovers several classical estimators, motivates new second-order variants, and yields a bias-variance decomposition of estimation error. Our main result is a unified high-probability bound proved using a new dimension-free vector-valued Freedman inequality, valid for smooth normed spaces involving random sums of vector martingales. The result applies in both Euclidean and non-Euclidean settings, including the analysis of mirror-descent-based methods in Banach spaces. As applications, we obtain high-probability oracle complexities for unconstrained optimization with mirror descent, establishing the logarithmic dependence on the confidence level. We also derive the first $\tilde{\mathcal{O}}(\varepsilon^{-3})$ oracle-complexity bounds for stochastic optimization with expectation constraints, improving upon the existing $\tilde{\mathcal{O}}(\varepsilon^{-4})$ complexity by leveraging variance-reduced estimation for the first time in this setting.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a unified recursion framework for stochastic variance-reduced estimation consisting of memory retention, reset probability, and a correction term for iterate movement. This recovers classical estimators (momentum, SPIDER, STORM, PAGE), motivates new second-order variants, and yields a bias-variance decomposition. The central result is a unified high-probability bound obtained via a new dimension-free vector-valued Freedman inequality for random sums of vector martingales in smooth normed spaces (including Banach spaces). Applications give high-probability oracle complexities for mirror-descent unconstrained optimization (with logarithmic dependence on confidence level) and the first Õ(ε^{-3}) oracle complexity for stochastic optimization with expectation constraints, improving on prior Õ(ε^{-4}) bounds.
Significance. If the new vector-valued Freedman inequality holds with the claimed dimension-free property in smooth normed spaces, the work provides a valuable structural unification of high-probability analyses for variance-reduced methods, extending beyond Euclidean settings to mirror descent in Banach spaces. The bias-variance decomposition and recovery of multiple estimators clarify trade-offs, while the improved complexity for expectation-constrained problems is a concrete advance. The framework's generality could influence subsequent analyses in stochastic optimization.
major comments (2)
- [§4] §4 (Theorem 4.1, the dimension-free vector-valued Freedman inequality): the proof must explicitly confirm that the tail bound remains free of hidden dependence on the modulus of smoothness or separability assumptions when applied to the non-Euclidean norms arising in mirror descent; otherwise the downstream Õ(ε^{-3}) claim for expectation-constrained optimization (Theorem 5.4) cannot be guaranteed without additional regularity not controlled in the final expression.
- [§5.3] §5.3 (oracle complexity for expectation-constrained problems): the reduction from the unified high-probability bound to the Õ(ε^{-3}) rate assumes the martingale increments satisfy the exact conditions of the new inequality without extra logarithmic factors from the constraint functions; this step should be expanded with an explicit error propagation argument to substantiate the improvement over prior Õ(ε^{-4}) results.
minor comments (2)
- [§3] Notation for the reset probability and correction term in the unified recursion (Definition 3.1) should be aligned with the bias-variance decomposition in Lemma 3.2 to avoid ambiguity when recovering SPIDER and STORM.
- [§6] Figure 1 (comparison of estimators) lacks error bars or explicit post-hoc selection details; adding these would strengthen the empirical support for the high-probability claims.
Simulated Author's Rebuttal
Thank you for the detailed review and valuable feedback on our manuscript. We have carefully considered the major comments and provide point-by-point responses below. We will make the suggested clarifications in the revised version.
read point-by-point responses
-
Referee: [§4] §4 (Theorem 4.1, the dimension-free vector-valued Freedman inequality): the proof must explicitly confirm that the tail bound remains free of hidden dependence on the modulus of smoothness or separability assumptions when applied to the non-Euclidean norms arising in mirror descent; otherwise the downstream Õ(ε^{-3}) claim for expectation-constrained optimization (Theorem 5.4) cannot be guaranteed without additional regularity not controlled in the final expression.
Authors: We thank the referee for highlighting this important aspect of the proof. The vector-valued Freedman inequality in Theorem 4.1 is derived in a manner that ensures the tail bound is dimension-free and does not depend on the modulus of smoothness or separability assumptions beyond the standard smoothness of the norm, which is already part of the setup for mirror descent in Banach spaces. The constants in the bound are uniform and independent of these parameters in the final expression. To address the concern explicitly, we will revise the proof in the appendix to include a dedicated paragraph confirming this independence and its implications for the mirror descent application. This will also strengthen the guarantee for the complexity in Theorem 5.4. revision: yes
-
Referee: [§5.3] §5.3 (oracle complexity for expectation-constrained problems): the reduction from the unified high-probability bound to the Õ(ε^{-3}) rate assumes the martingale increments satisfy the exact conditions of the new inequality without extra logarithmic factors from the constraint functions; this step should be expanded with an explicit error propagation argument to substantiate the improvement over prior Õ(ε^{-4}) results.
Authors: We agree that an explicit error propagation argument would improve the clarity of the reduction in Section 5.3. In the revised manuscript, we will add a detailed subsection or paragraph that propagates the errors from the constraint functions through the high-probability bound, demonstrating that no additional logarithmic factors are introduced beyond those already present in the unified bound. This will rigorously justify the Õ(ε^{-3}) oracle complexity and highlight the improvement over previous Õ(ε^{-4}) results. revision: yes
Circularity Check
Derivation chain is self-contained; new inequality introduced independently
full rationale
The paper introduces a recursion for variance-reduced estimators and derives a unified high-probability bound by applying a newly stated dimension-free vector-valued Freedman inequality for martingales in smooth normed spaces. This inequality is positioned as an independent technical contribution that enables the bound and downstream oracle complexities (including the Õ(ε^{-3}) result). No quoted equations or steps reduce the main result to a fitted parameter, a self-citation chain, or a definitional renaming; the framework recovers known estimators via the recursion but does not force the high-probability claim by construction. The derivation is therefore self-contained against external benchmarks once the new inequality is accepted.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Smoothness assumptions in normed spaces for the vector martingale analysis
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
unified high-probability bound proved using a new dimension-free vector-valued Freedman inequality, valid for smooth normed spaces involving random sums of vector martingales
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
three components: memory retention, reset probability, and a correction term
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Sai Aparna Aketi, Abolfazl Hashemi, and Kaushik Roy. Global update tracking: A decentralized learning algorithm for heterogeneous data.Advances in neural information processing systems, 36:48939–48961, 2023
work page 2023
-
[2]
Ahmet Alacaoglu, Yura Malitsky, and Stephen J Wright. Towards weaker variance assumptions for stochastic optimization.arXiv preprint arXiv:2504.09951, 2025
-
[3]
Mohammad S Alkousa. On modification of an adaptive stochastic mirror descent algorithm for convex opti- mization problems with functional constraints. InComputational Mathematics and Applications, pages 47–63. Springer, 2020
work page 2020
-
[4]
Neon2: Finding local minima via first-order oracles
Zeyuan Allen-Zhu and Yuanzhi Li. Neon2: Finding local minima via first-order oracles. InAdvances in Neural Information Processing Systems, pages 3716–3726, 2018
work page 2018
-
[5]
Springer, Cham, Switzerland, second edition, 2021
Luigi Ambrosio, Elia Brué, and Daniele Semola.Lectures on optimal transport. Springer, Cham, Switzerland, second edition, 2021
work page 2021
-
[6]
Second-order information in non-convex stochastic optimization: Power and limitations
Yossi Arjevani, Yair Carmon, John C Duchi, Dylan J Foster, Ayush Sekhari, and Karthik Sridharan. Second-order information in non-convex stochastic optimization: Power and limitations. InConference on Learning Theory, pages 242–299. PMLR, 2020
work page 2020
-
[7]
Lower bounds for non-convex stochastic optimization.Mathematical Programming, pages 1–50, 2022
Yossi Arjevani, Yair Carmon, John C Duchi, Dylan J Foster, Nathan Srebro, and Blake Woodworth. Lower bounds for non-convex stochastic optimization.Mathematical Programming, pages 1–50, 2022
work page 2022
-
[8]
Hilal Asi and John C Duchi. Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity.SIAM Journal on Optimization, 29(3):2257–2290, 2019
work page 2019
-
[9]
Non-asymptotic analysis of stochastic approximation algorithms for machine learning
Francis Bach and Eric Moulines. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. InAdvances in Neural Information Processing Systems, volume 24, 2011
work page 2011
-
[10]
Syreen Banabilah, Moayad Aloqaily, Eitaa Alsayed, Nida Malik, and Yaser Jararweh. Federated learning review: Fundamentals, enabling technologies, and future applications.Information processing & management, 59(6):103061, 2022
work page 2022
-
[11]
Springer Science & Business Media, 2012
Viorel Barbu and Teodor Precupanu.Convexity and optimization in Banach spaces. Springer Science & Business Media, 2012
work page 2012
-
[12]
Bregman monotone optimization algorithms
Heinz H Bauschke, Jonathan M Borwein, and Patrick L Combettes. Bregman monotone optimization algorithms. SIAM Journal on control and optimization, 42(2):596–636, 2003
work page 2003
-
[13]
Mirror descent and convex optimization problems with non-smooth inequality constraints
Anastasia Bayandina, Pavel Dvurechensky, Alexander Gasnikov, Fedor Stonyakin, and Alexander Titov. Mirror descent and convex optimization problems with non-smooth inequality constraints. InLarge-scale and distributed optimization, pages 181–213. Springer, 2018
work page 2018
-
[14]
Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization.Operations Research Letters, 31(3):167–175, 2003
work page 2003
-
[15]
Dimitri P Bertsekas.Constrained optimization and Lagrange multiplier methods. Academic press, 2014
work page 2014
-
[16]
Julius R Blum. Approximation methods which converge with probability one.The Annals of Mathematical Statistics, pages 382–386, 1954
work page 1954
-
[17]
Digvijay Boob and Mohammad Khalafi. Optimal primal-dual algorithm with last iterate convergence guarantees for stochastic convex optimization problems.arXiv preprint arXiv:2410.18513, 2024
-
[18]
Variance reduction.Wiley statsRef: Statistics reference online, pages 1–6, 2017
Zdravko Botev and Ad Ridder. Variance reduction.Wiley statsRef: Statistics reference online, pages 1–6, 2017
work page 2017
-
[19]
Optimization methods for large-scale learning.SIAM Review, 60(2):223–311, 2018
Léon Bottou, Frank E Curtis, and Jorge Nocedal. Optimization methods for large-scale learning.SIAM Review, 60(2):223–311, 2018. 10
work page 2018
-
[20]
Oxford University Press, Oxford, UK, 2013
Stéphane Boucheron, Gábor Lugosi, and Pascal Massart.Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford, UK, 2013
work page 2013
-
[21]
Haim Brezis and Haim Brézis.Functional analysis, Sobolev spaces and partial differential equations, volume 2. Springer, 2011
work page 2011
-
[22]
Regina Sandra Burachik and Susana Scheimberg. A proximal point method for the variational inequality problem in banach spaces.SIAM Journal on Control and Optimization, 39(5):1633–1649, 2000
work page 2000
-
[23]
Yair Carmon and Oliver Hinder. Making sgd parameter-free. InConference on Learning Theory, pages 2360–2389. PMLR, 2022
work page 2022
-
[24]
Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging.Journal of mathematical imaging and vision, 40:120–145, 2011
work page 2011
-
[25]
Pushpita Chatterjee, Debashis Das, and Danda B Rawat. Federated learning empowered recommendation model for financial consumer services.IEEE Transactions on Consumer Electronics, 70(1):2508–2516, 2023
work page 2023
-
[26]
Vishnu Pandi Chellapandi, Antesh Upadhyay, Abolfazl Hashemi, and Stanislaw H ˙Zak. Fednmut–federated noisy model update tracking convergence analysis.arXiv preprint arXiv:2403.13247, 2024
-
[27]
Yiyue Chen, Abolfazl Hashemi, and Haris Vikalo. Communication-efficient variance-reduced decentralized stochastic optimization over time-varying directed graphs.IEEE Transactions on Automatic Control, 2021
work page 2021
- [28]
- [29]
-
[30]
Shisheng Cui and Uday V Shanbhag. On the analysis of variance-reduced and randomized projection variants of single projection schemes for monotone stochastic variational inequality problems.Set-Valued and Variational Analysis, 29:453–499, 2021
work page 2021
-
[31]
Momentum improves normalized sgd
Ashok Cutkosky and Harsh Mehta. Momentum improves normalized sgd. InInternational conference on machine learning, pages 2260–2268. PMLR, 2020
work page 2020
-
[32]
Ashok Cutkosky and Harsh Mehta. High-probability bounds for non-convex stochastic optimization with heavy tails.Advances in Neural Information Processing Systems, 34:4883–4895, 2021
work page 2021
-
[33]
Ashok Cutkosky and Francesco Orabona. Momentum-based variance reduction in non-convex sgd.Advances in neural information processing systems, 32, 2019
work page 2019
-
[34]
Faster non-convex federated learning via global and local momentum
Rudrajit Das, Anish Acharya, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S Dhillon, and Ufuk Topcu. Faster non-convex federated learning via global and local momentum. InUncertainty in Artificial Intelligence, pages 496–506. PMLR, 2022
work page 2022
-
[35]
Stochastic model-based minimization of weakly convex functions
Damek Davis and Dmitriy Drusvyatskiy. Stochastic model-based minimization of weakly convex functions. SIAM Journal on Optimization, 29(1):207–239, 2019
work page 2019
-
[36]
Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. Saga: A fast incremental gradient method with support for non-strongly convex composite objectives.Advances in neural information processing systems, 27, 2014
work page 2014
-
[37]
Learning-rate-free learning by d-adaptation
Aaron Defazio and Konstantin Mishchenko. Learning-rate-free learning by d-adaptation. InInternational Conference on Machine Learning, pages 7449–7479. PMLR, 2023
work page 2023
-
[38]
Provable convergence guarantees for black-box variational inference
Justin Domke, Robert Gower, and Guillaume Garrigos. Provable convergence guarantees for black-box variational inference. InAdvances in Neural Information Processing Systems, volume 36, pages 66289–66327, 2023. 11
work page 2023
-
[39]
Yoel Drori and Ohad Shamir. The complexity of finding stationary points with stochastic gradient descent.arXiv preprint arXiv:1910.01845, 2019
-
[40]
Dmitriy Drusvyatskiy and Courtney Paquette. Efficiency of minimizing compositions of convex functions and smooth maps.Mathematical Programming, 178(1):503–558, 2019
work page 2019
-
[41]
John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization.Journal of machine learning research, 12(7), 2011
work page 2011
-
[42]
Ivar Ekeland and Roger Temam.Convex analysis and variational problems. SIAM, 1999
work page 1999
-
[43]
Cong Fang, Chris Junchi Li, Zhouchen Lin, and Tong Zhang. Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator.Advances in neural information processing systems, 31, 2018
work page 2018
-
[44]
Sharp analysis for nonconvex SGD escaping from saddle points
Cong Fang, Zhouchen Lin, and Tong Zhang. Sharp analysis for nonconvex SGD escaping from saddle points. In Conference on Learning Theory, pages 1192–1234. PMLR, 2019
work page 2019
-
[45]
Lower Bounds and Proximally Anchored SGD for Non-Convex Minimization Under Unbounded Variance
Arda Fazla, Ege C. Kaya, Antesh Upadhyay, and Abolfazl Hashemi. Lower bounds and proximally anchored SGD for non-convex minimization under unbounded variance.arXiv preprint arXiv:2604.16620, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[46]
Saeed Ghadimi and Guanghui Lan. Stochastic first-and zeroth-order methods for nonconvex stochastic program- ming.SIAM journal on optimization, 23(4):2341–2368, 2013
work page 2013
-
[47]
Saeed Ghadimi, Guanghui Lan, and Hongchao Zhang. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization.Mathematical Programming, 155(1):267–305, 2016
work page 2016
-
[48]
On stochastic approximation.Theory of Probability & Its Applications, 10(2):275–278, 1965
EG Gladyshev. On stochastic approximation.Theory of Probability & Its Applications, 10(2):275–278, 1965
work page 1965
-
[49]
Eduard Gorbunov, Marina Danilova, Innokentiy Shibaev, Pavel Dvurechensky, and Alexander Gasnikov. High- probability complexity bounds for non-smooth stochastic convex optimization with heavy-tailed noise.Journal of Optimization Theory and Applications, 203(3):2679–2738, 2024
work page 2024
-
[50]
A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent
Eduard Gorbunov, Filip Hanzely, and Peter Richtárik. A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent. InInternational Conference on Artificial Intelligence and Statistics, pages 680–690. PMLR, 2020
work page 2020
-
[51]
Eduard Gorbunov, Abdurakhmon Sadiev, Marina Danilova, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, and Peter Richtárik. High-probability convergence for composite and distributed stochastic minimization and variational inequalities with heavy-tailed noise.arXiv preprint arXiv:2310.01860, 2023
-
[52]
Benjamin Grimmer. Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity.SIAM Journal on Optimization, 29(2):1350–1365, 2019
work page 2019
-
[53]
Luyao Guo, Xinli Shi, Shaofu Yang, and Jinde Cao. Disa: A dual inexact splitting algorithm for distributed convex composite optimization.IEEE Transactions on Automatic Control, 69(5):2995–3010, 2023
work page 2023
-
[54]
The heavy-tail phenomenon in SGD
Mert Gurbuzbalaban, Umut Simsekli, and Lingjiong Zhu. The heavy-tail phenomenon in SGD. InInternational Conference on Machine Learning, pages 3964–3975. PMLR, 2021
work page 2021
-
[55]
Saad Hikmat Haji and Adnan Mohsin Abdulazeez. Comparison of optimization techniques based on gradient descent algorithm: A review.PalArch’s Journal of Archaeology of Egypt/Egyptology, 18(4):2715–2743, 2021
work page 2021
-
[56]
Erfan Yazdandoost Hamedani and Necdet Serhat Aybat. A primal-dual algorithm with line search for general convex-concave saddle point problems.SIAM Journal on Optimization, 31(2):1299–1329, 2021
work page 2021
-
[57]
J. M. Hammersley and J. G. Mauldon. General principles of antithetic variates.Mathematical Proceedings of the Cambridge Philosophical Society, 52(3):476–481, July 1956
work page 1956
-
[58]
J. M. Hammersley and K. W. Morton. A new Monte Carlo technique: Antithetic variates.Mathematical Proceedings of the Cambridge Philosophical Society, 52(3):449–475, July 1956. 12
work page 1956
-
[59]
Tight analyses for non-smooth stochastic gradient descent
Nicholas JA Harvey, Christopher Liaw, Yaniv Plan, and Sikander Randhawa. Tight analyses for non-smooth stochastic gradient descent. InConference on Learning Theory, pages 1579–1613. PMLR, 2019
work page 2019
-
[60]
A. Hashemi, D. Lee, and A. Makur. Strong antithetic variates: Theory and applications. Preprint, pp. 1–13, 2025
work page 2025
-
[61]
Abolfazl Hashemi. A unified model for large-scale inexact fixed-point iteration: A stochastic optimization perspective.IEEE Transactions on Automatic Control, 70(4):2435–2449, 2024
work page 2024
-
[62]
RAMPAGE: RAndomized Mid-Point for debiAsed Gradient Extrapolation
Abolfazl Hashemi. Rampage: Randomized mid-point for debiased gradient extrapolation.arXiv preprint arXiv:2603.22155, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[63]
Strong antithetic variance reduction inequalities
Abolfazl Hashemi, Dogmin Lee, and Anuran Makur. Strong antithetic variance reduction inequalities. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), pages 1–6, Ann Arbor, MI, USA, June 22–27 2025
work page 2025
-
[64]
Ignacio Hounie, Alejandro Ribeiro, and Luiz FO Chamon. Resilient constrained learning.Advances in Neural Information Processing Systems, 36:71767–71798, 2023
work page 2023
-
[65]
Yankun Huang and Qihang Lin. Oracle complexity of single-loop switching subgradient methods for non- smooth weakly convex functional constrained optimization.Advances in Neural Information Processing Systems, 36:61327–61340, 2023
work page 2023
-
[66]
Yi Huang, Shisheng Cui, Jian Sun, and Ziyang Meng. Distributed proximal algorithms for nonsmooth optimiza- tion: Unified convergence analysis.IEEE Transactions on Automatic Control, 2025
work page 2025
-
[67]
Yi Huang, Ziyang Meng, Jian Sun, and Wei Ren. Distributed multiproximal algorithm for nonsmooth convex optimization with coupled inequality constraints.IEEE Transactions on Automatic Control, 68(12):8126–8133, 2023
work page 2023
-
[68]
Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, and Tianjiao Li. Accelerated stochastic approximation with state-dependent noise.arXiv preprint arXiv:2307.01497, 2023
-
[69]
Safe-ef: Error feedback for non-smooth constrained optimiza- tion
Rustem Islamov, Yarden As, and Ilyas Fatkhullin. Safe-ef: Error feedback for non-smooth constrained optimiza- tion. InForty-second International Conference on Machine Learning, 2025
work page 2025
-
[70]
Safe-EF: Error feedback for non-smooth constrained optimiza- tion
Rustem Islamov, Yarden As, and Ilyas Fatkhullin. Safe-EF: Error feedback for non-smooth constrained optimiza- tion. InForty-second International Conference on Machine Learning, 2025
work page 2025
-
[71]
Unconstrained online learning with unbounded losses
Andrew Jacobsen and Ashok Cutkosky. Unconstrained online learning with unbounded losses. InInternational Conference on Machine Learning, pages 14590–14630. PMLR, 2023
work page 2023
-
[72]
A. Jadbabaie, A. Makur, and A. Reisizadeh. Adaptive low-rank gradient descent. InProceedings of the 62nd IEEE Conference on Decision and Control (CDC), pages 3315–3320, Singapore, December 13–15 2023
work page 2023
-
[73]
A. Jadbabaie, A. Makur, and D. Shah. Federated optimization of smooth loss functions.IEEE Transactions on Information Theory, 69(12):7836–7866, December 2023
work page 2023
-
[74]
A. Jadbabaie, A. Makur, and D. Shah. Gradient-based empirical risk minimization using local polynomial regression.Stochastic Systems, 14(4):363–402, December 2024
work page 2024
-
[75]
Prateek Jain, Sham M Kakade, Rahul Kidambi, Praneeth Netrapalli, and Aaron Sidford. Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification.Journal of machine learning research, 18(223):1–42, 2018
work page 2018
-
[76]
Zhichao Jia and Benjamin Grimmer. First-order methods for nonsmooth nonconvex functional constrained optimization with or without slater points.SIAM Journal on Optimization, 35(2):1300–1329, 2025
work page 2025
-
[77]
Accelerating stochastic gradient descent using predictive variance reduction
Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems, 26, 2013. 13
work page 2013
-
[78]
Anatoli Juditsky and Arkadii S Nemirovski. Large deviations of vector-valued martingales in 2-smooth normed spaces.arXiv preprint arXiv:0809.0813, 2008
-
[79]
H. Kahn and A. W. Marshall. Methods of reducing sample size in Monte Carlo computations.Journal of the Operations Research Society of America, 1(5):263–278, November 1953
work page 1953
-
[80]
Better theory for SGD in the nonconvex world.Transactions on Machine Learning Research, 2023
Ahmed Khaled and Peter Richtárik. Better theory for SGD in the nonconvex world.Transactions on Machine Learning Research, 2023
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.