In-Expectation Convergence of Stochastic Gradient Methods under Heavy-Tailed Noise

Zijian Liu

arxiv: 2606.00520 · v1 · pith:IL642OS3new · submitted 2026-05-30 · 🧮 math.OC · cs.LG· stat.ML

In-Expectation Convergence of Stochastic Gradient Methods under Heavy-Tailed Noise

Zijian Liu This is my paper

Pith reviewed 2026-06-28 18:34 UTC · model grok-4.3

classification 🧮 math.OC cs.LGstat.ML

keywords stochastic gradient descentheavy-tailed noiseconvergence in expectationmirror descentconvex optimizationnonconvex optimizationmomentum methods

0 comments

The pith

Stochastic gradient methods converge in expectation under heavy-tailed noise without bounded domains or changes to the algorithms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard stochastic optimization algorithms continue to converge in expectation even when the noise in the gradients has only a finite moment of order p where p lies between 1 and 2. It establishes this for stochastic mirror descent and its accelerated version on convex problems, and for plain SGD and momentum SGD on nonconvex problems. The results remove the bounded-domain restriction that appeared in earlier positive findings and supply a unified analysis framework that applies without modifying the update rules themselves.

Core claim

Under the heavy-tailed noise assumption that the stochastic gradient has finite p-th moment for p in (1,2), Stochastic Mirror Descent and Accelerated Stochastic Mirror Descent converge in expectation for convex optimization, while SGD and Stochastic Gradient Descent with Momentum converge in expectation for nonconvex optimization; these guarantees hold without any algorithmic modification and without requiring bounded feasible sets.

What carries the argument

In-expectation convergence analysis for mirror-descent and momentum updates that closes directly from moment bounds on the noise rather than almost-sure bounds.

If this is right

SMD converges in expectation on unbounded convex problems under heavy-tailed noise.
ASMD inherits the same convergence guarantee for convex problems.
SGD converges in expectation on nonconvex problems under the same noise model.
SGDM also converges in expectation on nonconvex problems.
The same moment-based arguments apply uniformly to both convex and nonconvex settings without extra restrictions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework may extend to other first-order methods whose proofs rely on similar expectation recursions.
Practical heavy-tailed noise in training data could be handled by existing optimizers rather than requiring specialized robust variants.
Relaxing the moment assumption further to p=1 would test whether the current analysis is tight.

Load-bearing premise

The objective satisfies the convexity or smoothness conditions needed for the mirror-descent or momentum analysis, and the noise satisfies the stated finite-moment bounds.

What would settle it

A convex problem with heavy-tailed gradient noise of moment order 1.5 on which the expected suboptimality of SMD fails to decrease to zero.

read the original abstract

Many stochastic gradient methods are believed not to converge when the noise in stochastic gradients has only a finite $p$-th moment for $p\in\left(1,2\right)$, a setting known as the heavy-tailed noise assumption. However, some recent studies have found that Stochastic Gradient Descent ($\textsf{SGD}$), without any modification to its update rule, can surprisingly converge in expectation for convex problems with bounded domains, highlighting the potential of classical stochastic gradient methods. Inspired by this recent progress, we provide a comprehensive study of stochastic optimization under heavy-tailed noise and establish new in-expectation convergence results for Stochastic Mirror Descent ($\textsf{SMD}$) and Accelerated Stochastic Mirror Descent ($\textsf{ASMD}$) in convex optimization, and for $\textsf{SGD}$ and Stochastic Gradient Descent with Momentum ($\textsf{SGDM}$) in nonconvex optimization. Notably, our results not only hold without algorithmic changes but also avoid restrictive assumptions, such as bounded domains, imposed in prior work. More importantly, our analysis provides a new, elegant, and powerful framework for studying heavy-tailed stochastic optimization, opening a new route to understanding first-order stochastic gradient methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper delivers solid in-expectation convergence for unmodified SGD, SGDM, SMD and ASMD under finite p-moment noise (p in (1,2)) on unbounded domains, with proofs that close under standard convexity or smoothness.

read the letter

The central result is that standard first-order methods converge in expectation under heavy-tailed noise without bounded domains or algorithmic tweaks. The work extends prior bounded-domain results to unbounded convex problems via SMD and ASMD, and to nonconvex problems via SGD and SGDM, using a direct-expectation framework that avoids variance assumptions.

The analysis looks clean. It states the required regularity conditions (convexity or L-smoothness) and the precise p-moment bound on the noise up front, then closes the recursions with standard telescoping arguments adapted to the infinite-variance case. No hidden self-referential definitions appear, and the framework is reusable for the listed algorithms. This is genuine progress on a practical objection.

The main limitation is that the rates remain slower than what variance-reduced or clipped methods achieve, and the constants depend on the unknown moment bound; these are expected trade-offs rather than flaws. The paper does not claim optimality, so the gap is not a problem.

The work is aimed at stochastic optimization theorists who track heavy-tailed noise in machine learning. It is worth a serious referee because the derivations are now fully supplied and the assumptions are standard rather than contrived. I would send it out for review.

Referee Report

0 major / 2 minor

Summary. The paper claims to establish new in-expectation convergence guarantees for Stochastic Mirror Descent (SMD) and Accelerated SMD under convex optimization, and for SGD and SGDM under nonconvex optimization, when stochastic gradients have only finite p-th moments for p ∈ (1,2). The results are obtained without algorithmic modifications and without imposing bounded-domain assumptions that appeared in prior work; a new analysis framework is introduced to handle the heavy-tailed case via direct expectation bounds.

Significance. If the stated conditions and derivations hold, the contribution is significant: it removes a restrictive bounded-domain hypothesis while retaining standard first-order methods, thereby widening the set of noise distributions for which convergence in expectation is provable. The proposed framework is presented as a reusable tool for heavy-tailed analyses and receives explicit credit for avoiding post-hoc restrictions or circular parameter definitions.

minor comments (2)

[Abstract] Abstract: the precise moment index p and the exact regularity conditions (e.g., L-smoothness or strong convexity parameters) are invoked but not enumerated; adding one sentence listing them would improve immediate readability without altering the technical content.
[Section 3] Notation: the definition of the mirror map and its associated Bregman divergence should be recalled in the statement of the main theorems (rather than only in the preliminaries) so that the dependence on the geometry is transparent.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive evaluation of the manuscript and for recommending acceptance. We are pleased that the contribution—new in-expectation convergence results for SMD, ASMD, SGD, and SGDM under heavy-tailed noise without bounded-domain assumptions or algorithmic modifications—is viewed as significant.

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard assumptions and direct bounds

full rationale

The manuscript presents convergence proofs for SMD/ASMD (convex) and SGD/SGDM (nonconvex) under finite p-moment noise (p in (1,2)). The required conditions—convexity or L-smoothness plus explicit noise-moment bounds—are stated explicitly at the outset and are the standard regularity conditions for the respective mirror-descent and momentum analyses. These assumptions are not defined in terms of the target convergence rates, nor are any parameters fitted to data and then relabeled as predictions. No load-bearing self-citation chain appears; the framework uses direct expectation recursions rather than ansatzes imported from prior author work or uniqueness theorems. The claims therefore remain independent of their own outputs and do not reduce by construction to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, invented entities, or ad-hoc axioms are stated. Standard domain assumptions on convexity/smoothness and noise moments are implicitly required but not enumerated.

axioms (1)

domain assumption The objective functions satisfy convexity (for SMD/ASMD) or appropriate smoothness (for SGD/SGDM) together with the finite p-moment condition on stochastic gradients for p in (1,2).
These are the minimal conditions needed to state the claimed convergence results; they are invoked by the choice of convex versus nonconvex regimes in the abstract.

pith-pipeline@v0.9.1-grok · 5730 in / 1523 out tokens · 20675 ms · 2026-06-28T18:34:41.699181+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

62 extracted references · 8 canonical work pages · 3 internal anchors

[1]

Lower bounds for non-convex stochastic optimization

Yossi Arjevani, Yair Carmon, John C Duchi, Dylan J Foster, Nathan Srebro, and Blake Woodworth. Lower bounds for non-convex stochastic optimization. Mathematical Programming , 199(1-2):165--214, 2023

2023
[2]

Linear attention is (maybe) all you need (to understand transformer optimization)

Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, and Suvrit Sra. Linear attention is (maybe) all you need (to understand transformer optimization). In The Twelfth International Conference on Learning Representations , 2024

2024
[3]

Uniformly convex and uniformly smooth convex functions

Dominique Az\'e and Jean-Paul Penot. Uniformly convex and uniformly smooth convex functions. Annales de la Facult\'e des sciences de Toulouse : Math\'ematiques , Ser. 6, 4(4):705--730, 1995

1995
[4]

High-probability convergence bounds for online nonlinear stochastic gradient descent under heavy-tailed noise

Aleksandar Armacki, Shuhua Yu, Pranay Sharma, Gauri Joshi, Dragana Bajovic, Dusan Jakovetic, and Soummya Kar. High-probability convergence bounds for online nonlinear stochastic gradient descent under heavy-tailed noise. In Yingzhen Li, Stephan Mandt, Shipra Agrawal, and Emtiyaz Khan, editors, Proceedings of The 28th International Conference on Artificial...

2025
[5]

On linear convergence of non-euclidean gradient methods without strong convexity and lipschitz gradient continuity

Heinz H Bauschke, J \'e r \^o me Bolte, Jiawei Chen, Marc Teboulle, and Xianfu Wang. On linear convergence of non-euclidean gradient methods without strong convexity and lipschitz gradient continuity. Journal of Optimization Theory and Applications , 182(3):1068--1087, 2019

2019
[6]

Bauschke, J\' e r\^ o me Bolte, and Marc Teboulle

Heinz H. Bauschke, J\' e r\^ o me Bolte, and Marc Teboulle. A descent lemma beyond lipschitz gradient continuity: First-order methods revisited and applications. Mathematics of Operations Research , 42(2):330--348, 2017

2017
[7]

Curtis, and Jorge Nocedal

L\' e on Bottou, Frank E. Curtis, and Jorge Nocedal. Optimization methods for large-scale machine learning. SIAM Review , 60(2):223--311, 2018

2018
[8]

Mirror descent and nonlinear projected subgradient methods for convex optimization

Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters , 31(3):167--175, 2003

2003
[9]

Revisiting the noise model of stochastic gradient descent

Barak Battash, Lior Wolf, and Ofir Lindenbaum. Revisiting the noise model of stochastic gradient descent. In Sanjoy Dasgupta, Stephan Mandt, and Yingzhen Li, editors, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics , volume 238 of Proceedings of Machine Learning Research , pages 4780--4788. PMLR, 02--04 May 2024

2024
[10]

High-probability bounds for non-convex stochastic optimization with heavy tails

Ashok Cutkosky and Harsh Mehta. High-probability bounds for non-convex stochastic optimization with heavy tails. Advances in Neural Information Processing Systems , 34:4883--4895, 2021

2021
[11]

Composite objective mirror descent

John C Duchi, Shai Shalev-Shwartz, Yoram Singer, and Ambuj Tewari. Composite objective mirror descent. In COLT , volume 10, pages 14--26. Citeseer, 2010

2010
[12]

Optimal complexity and certification of bregman first-order methods

Radu-Alexandru Dragomir, Adrien B Taylor, Alexandre d’Aspremont, and J \'e r \^o me Bolte. Optimal complexity and certification of bregman first-order methods. Mathematical Programming , 194(1):41--83, 2022

2022
[13]

Can sgd handle heavy-tailed noise? arXiv preprint arXiv:2508.04860 , 2025

Ilyas Fatkhullin, Florian H \"u bler, and Guanghui Lan. Can sgd handle heavy-tailed noise? arXiv preprint arXiv:2508.04860 , 2025

work page arXiv 2025
[14]

A study of condition numbers for first-order optimization

Charles Guille-Escuret, Manuela Girotti, Baptiste Goujaud, and Ioannis Mitliagkas. A study of condition numbers for first-order optimization. In Arindam Banerjee and Kenji Fukumizu, editors, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics , volume 130 of Proceedings of Machine Learning Research , pages 1261--1269...

2021
[15]

Global convergence of the heavy-ball method for convex optimization

Euhanna Ghadimi, Hamid Reza Feyzmahdavian, and Mikael Johansson. Global convergence of the heavy-ball method for convex optimization. In 2015 European Control Conference (ECC) , pages 310--315, 2015

2015
[16]

Stochastic first- and zeroth-order methods for nonconvex stochastic programming

Saeed Ghadimi and Guanghui Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization , 23(4):2341--2368, 2013

2013
[17]

A unified framework for bregman proximal methods: subgradient, gradient, and accelerated gradient schemes

David H Gutman and Javier F Pena. A unified framework for bregman proximal methods: subgradient, gradient, and accelerated gradient schemes. arXiv preprint arXiv:1812.10198 , 2018

work page arXiv 2018
[18]

High-probability convergence for composite and distributed stochastic minimization and variational inequalities with heavy-tailed noise

Eduard Gorbunov, Abdurakhmon Sadiev, Marina Danilova, Samuel Horv\' a th, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, and Peter Richt\' a rik. High-probability convergence for composite and distributed stochastic minimization and variational inequalities with heavy-tailed noise. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian We...

2024
[19]

On proximal policy optimization's heavy-tailed gradients

Saurabh Garg, Joshua Zhanson, Emilio Parisotto, Adarsh Prasad, Zico Kolter, Zachary Lipton, Sivaraman Balakrishnan, Ruslan Salakhutdinov, and Pradeep Ravikumar. On proximal policy optimization's heavy-tailed gradients. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning , volume 139 of Proceedings ...

2021
[20]

From gradient clipping to normalization for heavy tailed sgd

Florian H \"u bler, Ilyas Fatkhullin, and Niao He. From gradient clipping to normalization for heavy tailed sgd. In Yingzhen Li, Stephan Mandt, Shipra Agrawal, and Emtiyaz Khan, editors, Proceedings of The 28th International Conference on Artificial Intelligence and Statistics , volume 258 of Proceedings of Machine Learning Research , pages 2413--2421. PM...

2025
[21]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 , 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[22]

Krzysztof C. Kiwiel. Proximal minimization methods with generalized bregman functions. SIAM Journal on Control and Optimization , 35(4):1142--1168, 1997

1997
[23]

An optimal method for stochastic composite optimization

Guanghui Lan. An optimal method for stochastic composite optimization. Mathematical Programming , 133(1):365--397, 2012

2012
[24]

First-order and stochastic optimization methods for machine learning

Guanghui Lan. First-order and stochastic optimization methods for machine learning . Springer, 2020

2020
[25]

Freund, and Yurii Nesterov

Haihao Lu, Robert M. Freund, and Yurii Nesterov. Relatively smooth convex optimization by first-order methods, and applications. SIAM Journal on Optimization , 28(1):333--354, 2018

2018
[26]

An improved analysis of stochastic gradient descent with momentum

Yanli Liu, Yuan Gao, and Wotao Yin. An improved analysis of stochastic gradient descent with momentum. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems , volume 33, pages 18261--18271. Curran Associates, Inc., 2020

2020
[27]

Online convex optimization with heavy tails: Old algorithms, new regrets, and applications

Zijian Liu. Online convex optimization with heavy tails: Old algorithms, new regrets, and applications. arXiv preprint arXiv:2508.07473 , 2025

work page arXiv 2025
[28]

Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad

Zijian Liu. Can adaptive gradient methods converge under heavy-tailed noise? a case study of adagrad. arXiv preprint arXiv:2605.18694 , 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[29]

Clipped gradient methods for nonsmooth convex optimization under heavy-tailed noise: A refined analysis

Zijian Liu. Clipped gradient methods for nonsmooth convex optimization under heavy-tailed noise: A refined analysis. In The Fourteenth International Conference on Learning Representations , 2026

2026
[30]

relative continuity

Haihao Lu. “relative continuity” for non-lipschitz nonsmooth convex optimization using stochastic (or deterministic) mirror descent. INFORMS Journal on Optimization , 1(4):288--303, 2019

2019
[31]

High-probability bound for non-smooth non-convex stochastic optimization with heavy tails

Langqi Liu, Yibo Wang, and Lijun Zhang. High-probability bound for non-smooth non-convex stochastic optimization with heavy tails. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Proceedings of the 41st International Conference on Machine Learning , volume 235 of Procee...

2024
[32]

Stochastic nonsmooth convex optimization with heavy-tailed noises: High-probability bound, in-expectation rate and initial distance adaptation.arXiv preprint arXiv:2303.12277, 2023

Zijian Liu and Zhengyuan Zhou. Stochastic nonsmooth convex optimization with heavy-tailed noises: High-probability bound, in-expectation rate and initial distance adaptation. arXiv preprint arXiv:2303.12277 , 2023

work page arXiv 2023
[33]

Revisiting the last-iterate convergence of stochastic gradient methods

Zijian Liu and Zhengyuan Zhou. Revisiting the last-iterate convergence of stochastic gradient methods. In The Twelfth International Conference on Learning Representations , 2024

2024
[34]

Nonconvex stochastic optimization under heavy-tailed noises: Optimal convergence without gradient clipping

Zijian Liu and Zhengyuan Zhou. Nonconvex stochastic optimization under heavy-tailed noises: Optimal convergence without gradient clipping. In The Thirteenth International Conference on Learning Representations , 2025

2025
[35]

Breaking the lower bound with (little) structure: Acceleration in non-convex stochastic optimization with heavy-tailed noise

Zijian Liu, Jiawei Zhang, and Zhengyuan Zhou. Breaking the lower bound with (little) structure: Acceleration in non-convex stochastic optimization with heavy-tailed noise. In Gergely Neu and Lorenzo Rosasco, editors, Proceedings of Thirty Sixth Conference on Learning Theory , volume 195 of Proceedings of Machine Learning Research , pages 2266--2290. PMLR,...

2023
[36]

Minimization methods for nonsmooth convex and quasiconvex functions

Yurii E Nesterov. Minimization methods for nonsmooth convex and quasiconvex functions. Matekon , 29(3):519--531, 1984

1984
[37]

Improved convergence in high probability of clipped gradient methods with heavy tailed noise

Ta Duy Nguyen, Thien H Nguyen, Alina Ene, and Huy Nguyen. Improved convergence in high probability of clipped gradient methods with heavy tailed noise. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems , volume 36, pages 24191--24222. Curran Associates, Inc., 2023

2023
[38]

Linear convergence of first order methods for non-strongly convex optimization

Ion Necoara, Yu Nesterov, and Francois Glineur. Linear convergence of first order methods for non-strongly convex optimization. Mathematical programming , 175(1):69--107, 2019

2019
[39]

Problem complexity and method efficiency in optimization

Arkadi Nemirovski and David Yudin. Problem complexity and method efficiency in optimization. Wiley-Interscience , 1983

1983
[40]

Online Learning: A Modern Introduction Using Convex Optimization

Francesco Orabona. A modern introduction to online learning. arXiv preprint arXiv:1912.13213 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1912
[41]

Breaking the heavy-tailed noise barrier in stochastic optimization problems

Nikita Puchkin, Eduard Gorbunov, Nickolay Kutuzov, and Alexander Gasnikov. Breaking the heavy-tailed noise barrier in stochastic optimization problems. In Sanjoy Dasgupta, Stephan Mandt, and Yingzhen Li, editors, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics , volume 238 of Proceedings of Machine Learning Resea...

2024
[42]

Best possible bounds of the von Bahr--Esseen type

Iosif Pinelis. Best possible bounds of the von Bahr--Esseen type . Annals of Functional Analysis , 6(4):1 -- 29, 2015

2015
[43]

On the difficulty of training recurrent neural networks

Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning , volume 28 of Proceedings of Machine Learning Research , pages 1310--1318, Atlanta, Georgia, USA, 17--19 Jun 2013. PMLR

2013
[44]

B.T. Polyak. Gradient methods for the minimisation of functionals. USSR Computational Mathematics and Mathematical Physics , 3(4):864--878, 1963

1963
[45]

B.T. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics , 4(5):1--17, 1964

1964
[46]

Boris T. Polyak. Introduction to optimization . New York, Optimization Software, 1987

1987
[47]

An improved analysis of the clipped stochastic subgradient method under heavy-tailed noise

Daniela Angela Parletta, Andrea Paudice, and Saverio Salzo. An improved analysis of the clipped stochastic subgradient method under heavy-tailed noise. arXiv preprint arXiv:2410.00573 , 2024

work page arXiv 2024
[48]

A Stochastic Approximation Method

Herbert Robbins and Sutton Monro. A Stochastic Approximation Method . The Annals of Mathematical Statistics , 22(3):400 -- 407, 1951

1951
[49]

High-probability bounds for stochastic optimization and variational inequalities: the case of unbounded variance

Abdurakhmon Sadiev, Marina Danilova, Eduard Gorbunov, Samuel Horv\' a th, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, and Peter Richt\' a rik. High-probability bounds for stochastic optimization and variational inequalities: the case of unbounded variance. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and ...

2023
[50]

Revisiting gradient normalization and clipping for nonconvex sgd under heavy-tailed noise: Necessity, sufficiency, and acceleration

Tao Sun, Xinwang Liu, and Kun Yuan. Revisiting gradient normalization and clipping for nonconvex sgd under heavy-tailed noise: Necessity, sufficiency, and acceleration. Journal of Machine Learning Research , 26(237):1--42, 2025

2025
[51]

A tail-index analysis of stochastic gradient noise in deep neural networks

Umut Simsekli, Levent Sagun, and Mert Gurbuzbalaban. A tail-index analysis of stochastic gradient noise in deep neural networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning , volume 97 of Proceedings of Machine Learning Research , pages 5827--5837. PMLR, 09--15 Jun 2019

2019
[52]

Inequalities for the rth absolute moment of a sum of random variables, 1 r 2

Bengt von Bahr and Carl-Gustav Esseen. Inequalities for the rth absolute moment of a sum of random variables, 1 r 2 . The Annals of Mathematical Statistics , 36(1):299--303, 1965

1965
[53]

Mirror descent strikes again: Optimal stochastic convex optimization under infinite noise variance

Nuri Mert Vural, Lu Yu, Krishna Balasubramanian, Stanislav Volgushev, and Murat A Erdogdu. Mirror descent strikes again: Optimal stochastic convex optimization under infinite noise variance. In Po-Ling Loh and Maxim Raginsky, editors, Proceedings of Thirty Fifth Conference on Learning Theory , volume 178 of Proceedings of Machine Learning Research , pages...

2022
[54]

Closing the gap between the upper bound and lower bound of adam s iteration complexity

Bohan Wang, Jingwen Fu, Huishuai Zhang, Nanning Zheng, and Wei Chen. Closing the gap between the upper bound and lower bound of adam s iteration complexity. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems , volume 36, pages 39006--39032. Curran Associates, Inc., 2023

2023
[55]

Convergence rates of stochastic gradient descent under infinite noise variance

Hongjian Wang, Mert Gurbuzbalaban, Lingjiong Zhu, Umut Simsekli, and Murat A Erdogdu. Convergence rates of stochastic gradient descent under infinite noise variance. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems , volume 34, pages 18866--18877. Curran Associates, I...

2021
[56]

On the lower bound of minimizing polyak-Łojasiewicz functions

Pengyun Yue, Cong Fang, and Zhouchen Lin. On the lower bound of minimizing polyak-Łojasiewicz functions. In Gergely Neu and Lorenzo Rosasco, editors, Proceedings of Thirty Sixth Conference on Learning Theory , volume 195 of Proceedings of Machine Learning Research , pages 2948--2968. PMLR, 12--15 Jul 2023

2023
[57]

Parameter-free regret in high probability with heavy tails

Jiujia Zhang and Ashok Cutkosky. Parameter-free regret in high probability with heavy tails. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems , volume 35, pages 8000--8012. Curran Associates, Inc., 2022

2022
[58]

Proximal-like incremental aggregated gradient method with linear convergence under bregman distance growth conditions

Hui Zhang, Yu-Hong Dai, Lei Guo, and Wei Peng. Proximal-like incremental aggregated gradient method with linear convergence under bregman distance growth conditions. Mathematics of Operations Research , 46(1):61--81, 2021

2021
[59]

Exact convergence rate of the last iterate in subgradient methods

Moslem Zamani and Fran c ois Glineur. Exact convergence rate of the last iterate in subgradient methods. SIAM Journal on Optimization , 35(3):2182--2201, 2025

2025
[60]

Why are adaptive methods good for attention models? In H

Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank Reddi, Sanjiv Kumar, and Suvrit Sra. Why are adaptive methods good for attention models? In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems , volume 33, pages 15383--15393. Curran Associates, Inc., 2020

2020
[61]

Regret bounds without lipschitz continuity: Online learning with relative-lipschitz losses

Yihan Zhou, Victor Sanches Portella, Mark Schmidt, and Nicholas Harvey. Regret bounds without lipschitz continuity: Online learning with relative-lipschitz losses. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems , volume 33, pages 15823--15833. Curran Associates, Inc., 2020

2020
[62]

Zǎlinescu

C. Zǎlinescu. On uniformly convex functions. Journal of Mathematical Analysis and Applications , 95(2):344--374, 1983

1983

[1] [1]

Lower bounds for non-convex stochastic optimization

Yossi Arjevani, Yair Carmon, John C Duchi, Dylan J Foster, Nathan Srebro, and Blake Woodworth. Lower bounds for non-convex stochastic optimization. Mathematical Programming , 199(1-2):165--214, 2023

2023

[2] [2]

Linear attention is (maybe) all you need (to understand transformer optimization)

Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, and Suvrit Sra. Linear attention is (maybe) all you need (to understand transformer optimization). In The Twelfth International Conference on Learning Representations , 2024

2024

[3] [3]

Uniformly convex and uniformly smooth convex functions

Dominique Az\'e and Jean-Paul Penot. Uniformly convex and uniformly smooth convex functions. Annales de la Facult\'e des sciences de Toulouse : Math\'ematiques , Ser. 6, 4(4):705--730, 1995

1995

[4] [4]

High-probability convergence bounds for online nonlinear stochastic gradient descent under heavy-tailed noise

Aleksandar Armacki, Shuhua Yu, Pranay Sharma, Gauri Joshi, Dragana Bajovic, Dusan Jakovetic, and Soummya Kar. High-probability convergence bounds for online nonlinear stochastic gradient descent under heavy-tailed noise. In Yingzhen Li, Stephan Mandt, Shipra Agrawal, and Emtiyaz Khan, editors, Proceedings of The 28th International Conference on Artificial...

2025

[5] [5]

On linear convergence of non-euclidean gradient methods without strong convexity and lipschitz gradient continuity

Heinz H Bauschke, J \'e r \^o me Bolte, Jiawei Chen, Marc Teboulle, and Xianfu Wang. On linear convergence of non-euclidean gradient methods without strong convexity and lipschitz gradient continuity. Journal of Optimization Theory and Applications , 182(3):1068--1087, 2019

2019

[6] [6]

Bauschke, J\' e r\^ o me Bolte, and Marc Teboulle

Heinz H. Bauschke, J\' e r\^ o me Bolte, and Marc Teboulle. A descent lemma beyond lipschitz gradient continuity: First-order methods revisited and applications. Mathematics of Operations Research , 42(2):330--348, 2017

2017

[7] [7]

Curtis, and Jorge Nocedal

L\' e on Bottou, Frank E. Curtis, and Jorge Nocedal. Optimization methods for large-scale machine learning. SIAM Review , 60(2):223--311, 2018

2018

[8] [8]

Mirror descent and nonlinear projected subgradient methods for convex optimization

Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters , 31(3):167--175, 2003

2003

[9] [9]

Revisiting the noise model of stochastic gradient descent

Barak Battash, Lior Wolf, and Ofir Lindenbaum. Revisiting the noise model of stochastic gradient descent. In Sanjoy Dasgupta, Stephan Mandt, and Yingzhen Li, editors, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics , volume 238 of Proceedings of Machine Learning Research , pages 4780--4788. PMLR, 02--04 May 2024

2024

[10] [10]

High-probability bounds for non-convex stochastic optimization with heavy tails

Ashok Cutkosky and Harsh Mehta. High-probability bounds for non-convex stochastic optimization with heavy tails. Advances in Neural Information Processing Systems , 34:4883--4895, 2021

2021

[11] [11]

Composite objective mirror descent

John C Duchi, Shai Shalev-Shwartz, Yoram Singer, and Ambuj Tewari. Composite objective mirror descent. In COLT , volume 10, pages 14--26. Citeseer, 2010

2010

[12] [12]

Optimal complexity and certification of bregman first-order methods

Radu-Alexandru Dragomir, Adrien B Taylor, Alexandre d’Aspremont, and J \'e r \^o me Bolte. Optimal complexity and certification of bregman first-order methods. Mathematical Programming , 194(1):41--83, 2022

2022

[13] [13]

Can sgd handle heavy-tailed noise? arXiv preprint arXiv:2508.04860 , 2025

Ilyas Fatkhullin, Florian H \"u bler, and Guanghui Lan. Can sgd handle heavy-tailed noise? arXiv preprint arXiv:2508.04860 , 2025

work page arXiv 2025

[14] [14]

A study of condition numbers for first-order optimization

Charles Guille-Escuret, Manuela Girotti, Baptiste Goujaud, and Ioannis Mitliagkas. A study of condition numbers for first-order optimization. In Arindam Banerjee and Kenji Fukumizu, editors, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics , volume 130 of Proceedings of Machine Learning Research , pages 1261--1269...

2021

[15] [15]

Global convergence of the heavy-ball method for convex optimization

Euhanna Ghadimi, Hamid Reza Feyzmahdavian, and Mikael Johansson. Global convergence of the heavy-ball method for convex optimization. In 2015 European Control Conference (ECC) , pages 310--315, 2015

2015

[16] [16]

Stochastic first- and zeroth-order methods for nonconvex stochastic programming

Saeed Ghadimi and Guanghui Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization , 23(4):2341--2368, 2013

2013

[17] [17]

A unified framework for bregman proximal methods: subgradient, gradient, and accelerated gradient schemes

David H Gutman and Javier F Pena. A unified framework for bregman proximal methods: subgradient, gradient, and accelerated gradient schemes. arXiv preprint arXiv:1812.10198 , 2018

work page arXiv 2018

[18] [18]

High-probability convergence for composite and distributed stochastic minimization and variational inequalities with heavy-tailed noise

Eduard Gorbunov, Abdurakhmon Sadiev, Marina Danilova, Samuel Horv\' a th, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, and Peter Richt\' a rik. High-probability convergence for composite and distributed stochastic minimization and variational inequalities with heavy-tailed noise. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian We...

2024

[19] [19]

On proximal policy optimization's heavy-tailed gradients

Saurabh Garg, Joshua Zhanson, Emilio Parisotto, Adarsh Prasad, Zico Kolter, Zachary Lipton, Sivaraman Balakrishnan, Ruslan Salakhutdinov, and Pradeep Ravikumar. On proximal policy optimization's heavy-tailed gradients. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning , volume 139 of Proceedings ...

2021

[20] [20]

From gradient clipping to normalization for heavy tailed sgd

Florian H \"u bler, Ilyas Fatkhullin, and Niao He. From gradient clipping to normalization for heavy tailed sgd. In Yingzhen Li, Stephan Mandt, Shipra Agrawal, and Emtiyaz Khan, editors, Proceedings of The 28th International Conference on Artificial Intelligence and Statistics , volume 258 of Proceedings of Machine Learning Research , pages 2413--2421. PM...

2025

[21] [21]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 , 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[22] [22]

Krzysztof C. Kiwiel. Proximal minimization methods with generalized bregman functions. SIAM Journal on Control and Optimization , 35(4):1142--1168, 1997

1997

[23] [23]

An optimal method for stochastic composite optimization

Guanghui Lan. An optimal method for stochastic composite optimization. Mathematical Programming , 133(1):365--397, 2012

2012

[24] [24]

First-order and stochastic optimization methods for machine learning

Guanghui Lan. First-order and stochastic optimization methods for machine learning . Springer, 2020

2020

[25] [25]

Freund, and Yurii Nesterov

Haihao Lu, Robert M. Freund, and Yurii Nesterov. Relatively smooth convex optimization by first-order methods, and applications. SIAM Journal on Optimization , 28(1):333--354, 2018

2018

[26] [26]

An improved analysis of stochastic gradient descent with momentum

Yanli Liu, Yuan Gao, and Wotao Yin. An improved analysis of stochastic gradient descent with momentum. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems , volume 33, pages 18261--18271. Curran Associates, Inc., 2020

2020

[27] [27]

Online convex optimization with heavy tails: Old algorithms, new regrets, and applications

Zijian Liu. Online convex optimization with heavy tails: Old algorithms, new regrets, and applications. arXiv preprint arXiv:2508.07473 , 2025

work page arXiv 2025

[28] [28]

Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad

Zijian Liu. Can adaptive gradient methods converge under heavy-tailed noise? a case study of adagrad. arXiv preprint arXiv:2605.18694 , 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[29] [29]

Clipped gradient methods for nonsmooth convex optimization under heavy-tailed noise: A refined analysis

Zijian Liu. Clipped gradient methods for nonsmooth convex optimization under heavy-tailed noise: A refined analysis. In The Fourteenth International Conference on Learning Representations , 2026

2026

[30] [30]

relative continuity

Haihao Lu. “relative continuity” for non-lipschitz nonsmooth convex optimization using stochastic (or deterministic) mirror descent. INFORMS Journal on Optimization , 1(4):288--303, 2019

2019

[31] [31]

High-probability bound for non-smooth non-convex stochastic optimization with heavy tails

Langqi Liu, Yibo Wang, and Lijun Zhang. High-probability bound for non-smooth non-convex stochastic optimization with heavy tails. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Proceedings of the 41st International Conference on Machine Learning , volume 235 of Procee...

2024

[32] [32]

Stochastic nonsmooth convex optimization with heavy-tailed noises: High-probability bound, in-expectation rate and initial distance adaptation.arXiv preprint arXiv:2303.12277, 2023

Zijian Liu and Zhengyuan Zhou. Stochastic nonsmooth convex optimization with heavy-tailed noises: High-probability bound, in-expectation rate and initial distance adaptation. arXiv preprint arXiv:2303.12277 , 2023

work page arXiv 2023

[33] [33]

Revisiting the last-iterate convergence of stochastic gradient methods

Zijian Liu and Zhengyuan Zhou. Revisiting the last-iterate convergence of stochastic gradient methods. In The Twelfth International Conference on Learning Representations , 2024

2024

[34] [34]

Nonconvex stochastic optimization under heavy-tailed noises: Optimal convergence without gradient clipping

Zijian Liu and Zhengyuan Zhou. Nonconvex stochastic optimization under heavy-tailed noises: Optimal convergence without gradient clipping. In The Thirteenth International Conference on Learning Representations , 2025

2025

[35] [35]

Breaking the lower bound with (little) structure: Acceleration in non-convex stochastic optimization with heavy-tailed noise

Zijian Liu, Jiawei Zhang, and Zhengyuan Zhou. Breaking the lower bound with (little) structure: Acceleration in non-convex stochastic optimization with heavy-tailed noise. In Gergely Neu and Lorenzo Rosasco, editors, Proceedings of Thirty Sixth Conference on Learning Theory , volume 195 of Proceedings of Machine Learning Research , pages 2266--2290. PMLR,...

2023

[36] [36]

Minimization methods for nonsmooth convex and quasiconvex functions

Yurii E Nesterov. Minimization methods for nonsmooth convex and quasiconvex functions. Matekon , 29(3):519--531, 1984

1984

[37] [37]

Improved convergence in high probability of clipped gradient methods with heavy tailed noise

Ta Duy Nguyen, Thien H Nguyen, Alina Ene, and Huy Nguyen. Improved convergence in high probability of clipped gradient methods with heavy tailed noise. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems , volume 36, pages 24191--24222. Curran Associates, Inc., 2023

2023

[38] [38]

Linear convergence of first order methods for non-strongly convex optimization

Ion Necoara, Yu Nesterov, and Francois Glineur. Linear convergence of first order methods for non-strongly convex optimization. Mathematical programming , 175(1):69--107, 2019

2019

[39] [39]

Problem complexity and method efficiency in optimization

Arkadi Nemirovski and David Yudin. Problem complexity and method efficiency in optimization. Wiley-Interscience , 1983

1983

[40] [40]

Online Learning: A Modern Introduction Using Convex Optimization

Francesco Orabona. A modern introduction to online learning. arXiv preprint arXiv:1912.13213 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1912

[41] [41]

Breaking the heavy-tailed noise barrier in stochastic optimization problems

Nikita Puchkin, Eduard Gorbunov, Nickolay Kutuzov, and Alexander Gasnikov. Breaking the heavy-tailed noise barrier in stochastic optimization problems. In Sanjoy Dasgupta, Stephan Mandt, and Yingzhen Li, editors, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics , volume 238 of Proceedings of Machine Learning Resea...

2024

[42] [42]

Best possible bounds of the von Bahr--Esseen type

Iosif Pinelis. Best possible bounds of the von Bahr--Esseen type . Annals of Functional Analysis , 6(4):1 -- 29, 2015

2015

[43] [43]

On the difficulty of training recurrent neural networks

Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning , volume 28 of Proceedings of Machine Learning Research , pages 1310--1318, Atlanta, Georgia, USA, 17--19 Jun 2013. PMLR

2013

[44] [44]

B.T. Polyak. Gradient methods for the minimisation of functionals. USSR Computational Mathematics and Mathematical Physics , 3(4):864--878, 1963

1963

[45] [45]

B.T. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics , 4(5):1--17, 1964

1964

[46] [46]

Boris T. Polyak. Introduction to optimization . New York, Optimization Software, 1987

1987

[47] [47]

An improved analysis of the clipped stochastic subgradient method under heavy-tailed noise

Daniela Angela Parletta, Andrea Paudice, and Saverio Salzo. An improved analysis of the clipped stochastic subgradient method under heavy-tailed noise. arXiv preprint arXiv:2410.00573 , 2024

work page arXiv 2024

[48] [48]

A Stochastic Approximation Method

Herbert Robbins and Sutton Monro. A Stochastic Approximation Method . The Annals of Mathematical Statistics , 22(3):400 -- 407, 1951

1951

[49] [49]

High-probability bounds for stochastic optimization and variational inequalities: the case of unbounded variance

Abdurakhmon Sadiev, Marina Danilova, Eduard Gorbunov, Samuel Horv\' a th, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, and Peter Richt\' a rik. High-probability bounds for stochastic optimization and variational inequalities: the case of unbounded variance. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and ...

2023

[50] [50]

Revisiting gradient normalization and clipping for nonconvex sgd under heavy-tailed noise: Necessity, sufficiency, and acceleration

Tao Sun, Xinwang Liu, and Kun Yuan. Revisiting gradient normalization and clipping for nonconvex sgd under heavy-tailed noise: Necessity, sufficiency, and acceleration. Journal of Machine Learning Research , 26(237):1--42, 2025

2025

[51] [51]

A tail-index analysis of stochastic gradient noise in deep neural networks

Umut Simsekli, Levent Sagun, and Mert Gurbuzbalaban. A tail-index analysis of stochastic gradient noise in deep neural networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning , volume 97 of Proceedings of Machine Learning Research , pages 5827--5837. PMLR, 09--15 Jun 2019

2019

[52] [52]

Inequalities for the rth absolute moment of a sum of random variables, 1 r 2

Bengt von Bahr and Carl-Gustav Esseen. Inequalities for the rth absolute moment of a sum of random variables, 1 r 2 . The Annals of Mathematical Statistics , 36(1):299--303, 1965

1965

[53] [53]

Mirror descent strikes again: Optimal stochastic convex optimization under infinite noise variance

Nuri Mert Vural, Lu Yu, Krishna Balasubramanian, Stanislav Volgushev, and Murat A Erdogdu. Mirror descent strikes again: Optimal stochastic convex optimization under infinite noise variance. In Po-Ling Loh and Maxim Raginsky, editors, Proceedings of Thirty Fifth Conference on Learning Theory , volume 178 of Proceedings of Machine Learning Research , pages...

2022

[54] [54]

Closing the gap between the upper bound and lower bound of adam s iteration complexity

Bohan Wang, Jingwen Fu, Huishuai Zhang, Nanning Zheng, and Wei Chen. Closing the gap between the upper bound and lower bound of adam s iteration complexity. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems , volume 36, pages 39006--39032. Curran Associates, Inc., 2023

2023

[55] [55]

Convergence rates of stochastic gradient descent under infinite noise variance

Hongjian Wang, Mert Gurbuzbalaban, Lingjiong Zhu, Umut Simsekli, and Murat A Erdogdu. Convergence rates of stochastic gradient descent under infinite noise variance. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems , volume 34, pages 18866--18877. Curran Associates, I...

2021

[56] [56]

On the lower bound of minimizing polyak-Łojasiewicz functions

Pengyun Yue, Cong Fang, and Zhouchen Lin. On the lower bound of minimizing polyak-Łojasiewicz functions. In Gergely Neu and Lorenzo Rosasco, editors, Proceedings of Thirty Sixth Conference on Learning Theory , volume 195 of Proceedings of Machine Learning Research , pages 2948--2968. PMLR, 12--15 Jul 2023

2023

[57] [57]

Parameter-free regret in high probability with heavy tails

Jiujia Zhang and Ashok Cutkosky. Parameter-free regret in high probability with heavy tails. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems , volume 35, pages 8000--8012. Curran Associates, Inc., 2022

2022

[58] [58]

Proximal-like incremental aggregated gradient method with linear convergence under bregman distance growth conditions

Hui Zhang, Yu-Hong Dai, Lei Guo, and Wei Peng. Proximal-like incremental aggregated gradient method with linear convergence under bregman distance growth conditions. Mathematics of Operations Research , 46(1):61--81, 2021

2021

[59] [59]

Exact convergence rate of the last iterate in subgradient methods

Moslem Zamani and Fran c ois Glineur. Exact convergence rate of the last iterate in subgradient methods. SIAM Journal on Optimization , 35(3):2182--2201, 2025

2025

[60] [60]

Why are adaptive methods good for attention models? In H

Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank Reddi, Sanjiv Kumar, and Suvrit Sra. Why are adaptive methods good for attention models? In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems , volume 33, pages 15383--15393. Curran Associates, Inc., 2020

2020

[61] [61]

Regret bounds without lipschitz continuity: Online learning with relative-lipschitz losses

Yihan Zhou, Victor Sanches Portella, Mark Schmidt, and Nicholas Harvey. Regret bounds without lipschitz continuity: Online learning with relative-lipschitz losses. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems , volume 33, pages 15823--15833. Curran Associates, Inc., 2020

2020

[62] [62]

Zǎlinescu

C. Zǎlinescu. On uniformly convex functions. Journal of Mathematical Analysis and Applications , 95(2):344--374, 1983

1983