Algorithms of Robust Stochastic Optimization Based on Mirror Descent Method

Alexander Nazin; Alexandre Tsybakov; Anatoli Juditsky (LJK); Arkadi Nemirovsky

arxiv: 1907.02707 · v1 · pith:5BRX623Knew · submitted 2019-07-05 · 🧮 math.ST · math.OC· stat.TH

Algorithms of Robust Stochastic Optimization Based on Mirror Descent Method

Anatoli Juditsky (LJK) , Alexander Nazin , Arkadi Nemirovsky , Alexandre Tsybakov This is my paper

Pith reviewed 2026-05-25 02:15 UTC · model grok-4.3

classification 🧮 math.ST math.OCstat.TH

keywords robust stochastic optimizationmirror descentgradient truncationsub-Gaussian boundsconvex optimizationheavy-tailed noisecomposite objectives

0 comments

The pith

Truncating stochastic gradients produces sub-Gaussian error bounds for mirror descent in convex stochastic optimization under weak noise tails.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs robust non-Euclidean iterative algorithms for convex composite stochastic optimization by truncating stochastic gradients. This truncation step enables sub-Gaussian confidence bounds on solution accuracy when the noise satisfies only weak tail conditions, and the bounds hold in both convex and strongly convex regimes. The same truncation idea supplies robust accuracy estimates that apply to general stochastic algorithms. A reader would care because typical stochastic methods require strong moment assumptions on noise that often fail in practice, while these algorithms remain reliable under milder conditions.

Core claim

By truncating stochastic gradients, one can construct non-Euclidean iterative algorithms that deliver sub-Gaussian concentration of the optimization error in convex and strongly convex composite stochastic optimization, relying only on weak assumptions about the tails of the noise distribution.

What carries the argument

truncation of stochastic gradients inside mirror descent iterations

If this is right

Sub-Gaussian confidence bounds hold for the convex setting.
Sub-Gaussian confidence bounds hold for the strongly convex setting.
Robust accuracy estimates extend to general stochastic algorithms beyond the proposed mirror descent variants.
The algorithms remain applicable to composite objectives and non-Euclidean geometries.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The truncation approach may allow reliable optimization on data sets containing occasional large outliers without separate preprocessing steps.
Similar truncation could be tested inside variance-reduced stochastic methods to handle heavy tails while preserving faster convergence rates.
The weak-tail assumption might be relaxed further by combining truncation with other robust estimators such as coordinate-wise medians.

Load-bearing premise

The noise distribution has weak tail conditions that become sub-Gaussian after truncation.

What would settle it

A simulation or theoretical counterexample in which the optimization error after truncation fails to exhibit sub-Gaussian concentration for a noise distribution obeying the paper's stated weak tail conditions.

read the original abstract

We propose an approach to construction of robust non-Euclidean iterative algorithms for convex composite stochastic optimization based on truncation of stochastic gradients. For such algorithms, we establish sub-Gaussian confidence bounds under weak assumptions about the tails of the noise distribution in convex and strongly convex settings. Robust estimates of the accuracy of general stochastic algorithms are also proposed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Truncation inside mirror descent gives sub-Gaussian bounds under weak noise tails for convex and strongly convex problems.

read the letter

The one thing to know is that this paper develops robust versions of mirror descent by truncating the stochastic gradients, which produces sub-Gaussian confidence bounds on the solution accuracy under weak assumptions on the noise tails, covering both convex and strongly convex composite stochastic optimization. The actual novelty is in adapting the truncation technique to the non-Euclidean setting. Prior results on gradient truncation for robustness were mostly done in the Euclidean norm, so the work here shows how to handle the bias term when the mirror map is different and the analysis uses Bregman divergences instead of squared Euclidean distance. That makes the method applicable to a wider range of problems where the geometry matters. The paper does well by keeping the analysis straightforward: they control the truncation probability and the resulting bias using the tail conditions, then plug those into the usual mirror descent inequality to get the high-probability bound. The derivations appear to go through cleanly under the stated assumptions. A soft spot, though minor, is that the truncation level is a parameter that needs to be chosen based on the noise distribution properties. In applications where the tails are not known in advance, this could require some tuning or estimation step that is not addressed in the theoretical development. The central argument still holds under the stated assumptions. This paper is for specialists in stochastic optimization and robust learning who are already using or extending mirror descent methods. Someone looking for ways to relax sub-Gaussian noise assumptions while keeping high-probability guarantees will find the targeted construction useful. It deserves a serious referee. The contribution is focused, the math is grounded in standard techniques with a clear extension, and the claims are verifiable from the derivations. I would recommend sending it out for peer review.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes robust non-Euclidean iterative algorithms for convex composite stochastic optimization based on truncation of stochastic gradients within the mirror descent framework. It derives sub-Gaussian high-probability confidence bounds on the iterates under weak tail/moment conditions on the noise, for both convex and strongly convex settings, and additionally provides robust accuracy estimates for general stochastic algorithms.

Significance. If the central derivations hold, the work supplies a practical mechanism for obtaining sub-Gaussian deviation bounds in mirror-descent stochastic optimization without presupposing sub-Gaussian noise, by explicit truncation that controls bias and variance under mild moment assumptions. The explicit statement of assumptions, the bias-variance decomposition carried through the standard mirror-descent analysis, and the resulting high-probability bounds constitute a clear technical contribution to robust stochastic optimization.

minor comments (2)

[Abstract / §1] The abstract and introduction would benefit from a brief, explicit statement of the precise moment condition (e.g., the order of the moment and the truncation threshold) that is used to control both the truncation probability and the bias term, rather than referring only to 'weak assumptions about the tails.'
[§2] Notation for the composite objective (smooth + nonsmooth parts) and the mirror map is introduced gradually; a single consolidated table or paragraph collecting all standing assumptions and symbols would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our manuscript, the recognition of its technical contributions to robust stochastic optimization, and the recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper proposes truncation-based robust mirror descent algorithms for stochastic convex optimization and derives sub-Gaussian high-probability bounds directly from explicit weak tail assumptions on the noise (via bias-variance decomposition through the standard mirror-descent recurrence). No step reduces a claimed prediction or bound to a parameter fitted from the same data, nor does any load-bearing premise rest on a self-citation chain; the derivation is self-contained against the stated moment conditions and standard analysis tools.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard convexity of the composite objective and on the existence of weak tail conditions on the noise that truncation converts into sub-Gaussian behavior. No free parameters or invented entities are visible in the abstract.

axioms (2)

domain assumption The objective is convex (or strongly convex)
Explicitly required for the stated settings of convex and strongly convex stochastic optimization.
domain assumption Noise admits weak tail assumptions under which truncation yields sub-Gaussian concentration
This is the key modeling premise that enables the claimed bounds.

pith-pipeline@v0.9.0 · 5582 in / 1224 out tokens · 25185 ms · 2026-05-25T02:15:56.415573+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose an approach to construction of robust non-Euclidean iterative algorithms for convex composite stochastic optimization based on truncation of stochastic gradients... sub-Gaussian confidence bounds under weak assumptions about the tails
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

RSMD recursion xi = Prox... yi = G if ||G-g||* ≤ ... else g(¯x)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 9 canonical work pages · 7 internal anchors

[1]

Ro bust Stochastic Approximation Ap- proach to Stochastic Programming, SIAM J

Nemirovski, A., Juditsky, A., Lan, G., and Shapiro, A. Ro bust Stochastic Approximation Ap- proach to Stochastic Programming, SIAM J. Optim. , 2009, vol. 19, no. 4, pp. 1574–1609

2009
[2]

and Nesterov, Y., Deterministic and Stocha stic Primal-Dual Subgradient Algorithms for Uniformly Convex Minimization, Stoch

Juditsky, A. and Nesterov, Y., Deterministic and Stocha stic Primal-Dual Subgradient Algorithms for Uniformly Convex Minimization, Stoch. Syst. , 2014, vol. 4, no. 1, pp. 44–80

2014
[3]

and Lan, G., Optimal Stochastic Approximati on Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmi c Framework, SIAM J

Ghadimi, S. and Lan, G., Optimal Stochastic Approximati on Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmi c Framework, SIAM J. Optim. , 2012, vol. 22, no. 4, pp. 1469–1492

2012
[4]

(eds.) Contributions to Probability and Statistics: Essays in Hon or of Harold Hotelling, pp

Tukey, J.W., A Survey of Sampling from Contaminated Dist ributions, In: Olkin, I., et al. (eds.) Contributions to Probability and Statistics: Essays in Hon or of Harold Hotelling, pp. 448–485. Stanford University Press, Palo Alto, 1960

1960
[5]

J., Robust Estimation of a Location Parameter, Ann

Huber, P. J., Robust Estimation of a Location Parameter, Ann. Math. Statist. , 1964, vol. 35, no. 1, pp. 73–101

1964
[6]

J., Robust Statistics: A Review, Ann

Huber, P. J., Robust Statistics: A Review, Ann. Math. Statist. , 1972, vol. 43, no. 4, pp. 1041– 1067

1972
[7]

J., Robust Statistics , New York: John Wiley and Sons, 1981

Huber, P. J., Robust Statistics , New York: John Wiley and Sons, 1981

1981
[8]

and Masreliez, C., Robust Estimation via Stoc hastic Approximation, IEEE Trans

Martin, R. and Masreliez, C., Robust Estimation via Stoc hastic Approximation, IEEE Trans. Information Theory, 1975, vol. 21, no. 3, pp. 263–271

1975
[9]

and Tsypkin, Ya.Z., Adaptive Estimation Al gorithms: Convergence, Optimality, Stability, Autom

Polyak, B.T. and Tsypkin, Ya.Z., Adaptive Estimation Al gorithms: Convergence, Optimality, Stability, Autom. Remote Control , 1979, vol. 40, no. 3, pp. 378–389

1979
[10]

and Tsypkin, Ya.Z., Robust Pseudogradien t Adaptation Algorithms, Autom

Polyak, B.T. and Tsypkin, Ya.Z., Robust Pseudogradien t Adaptation Algorithms, Autom. Remote Control, 1981, vol. 41, no. 10, pp. 1404–1409

1981
[11]

and Tsypkin, J.Z., Robust Identiﬁcation, Automatica, 1980, vol

Polyak, B. and Tsypkin, J.Z., Robust Identiﬁcation, Automatica, 1980, vol. 16, no. 1, pp. 53–63

1980
[12]

and Vandelinde, V., Robust Estimation Using t he Robbins-Monro Stochastic Approxi- mation Algorithm, IEEE Trans

Price, E. and Vandelinde, V., Robust Estimation Using t he Robbins-Monro Stochastic Approxi- mation Algorithm, IEEE Trans. Information Theory , 1979, vol. 25, no. 6, pp. 698–704

1979
[13]

and Kovaˇ cevi´ c, B.D., Analysis of Robust Stochastic Approximation Algorithms for Process Identiﬁcation, Automatica, 1986, vol

Stankovi´ c, S.S. and Kovaˇ cevi´ c, B.D., Analysis of Robust Stochastic Approximation Algorithms for Process Identiﬁcation, Automatica, 1986, vol. 22, no. 4, pp. 483–488

1986
[14]

, Convergence and Robustness of the Robbins-Monro Algorithm Truncated at Randomly Varying Bounds, Stoch

Chen, H.-F., Guo, L., and Gao, A.J. , Convergence and Robustness of the Robbins-Monro Algorithm Truncated at Randomly Varying Bounds, Stoch. Proc. Appl. , 1987, vol. 27, pp. 217– 231

1987
[15]

and Gao, A.J., Robustness Analysis for Stoc hastic Approximation Algorithms, Stochast

Chen, H.-F. and Gao, A.J., Robustness Analysis for Stoc hastic Approximation Algorithms, Stochast. Stochast. Rep. , 1989, vol. 26, no. 1, pp. 3–20

1989
[16]

Information Theory , 1992, vol

Nazin, A.V., Polyak, B.T., and Tsybakov, A.B., Optimal and Robust Kernel Algorithms for Passive Stochastic Approximation, IEEE Trans. Information Theory , 1992, vol. 38, no. 5, pp. 1577–1583

1992
[17]

(In Rus- sian.) 20

Tsypkin, Ya.Z., Osnovy Informatsionnoy Teorii Identiﬁkatsii , Moscow: Nauka, 1984. (In Rus- sian.) 20

1984
[18]

(In Russian.)

Tsypkin, Ya.Z., Informatsionnaya Teoriya Identiﬁkatsii , Moscow: Nauka, 1995. (In Russian.)

1995
[19]

Kwon, J., Lecu´ e, G., and Lerasle, M., Median of Means Principle as a Divide-and-Conquer Procedure for Robustness, Sub-Sampling and Hyper-paramet ers Tuning , 2018, arXiv:1812.02435

work page internal anchor Pith review Pith/arXiv arXiv 2018
[20]

Chinot, G., Lecu´ e, G., and Lerasle, M., Statistical Le arning with Lipschitz and Convex Loss Functions, 2018, arXiv preprint arXiv:1810.01090

work page internal anchor Pith review Pith/arXiv arXiv 2018
[21]

Robust machine learning by median-of-means : theory and practice

Lecu´ e, G. and Lerasle, M. (2017). Robust machine learn ing by median-of-means: theory and practice, 2017, arXiv preprint arXiv:1711.10306v2. Annals of Stat. , to appear

work page internal anchor Pith review Pith/arXiv arXiv 2017
[22]

Robust classification via MOM minimization

Lecu´ e, G., Lerasle, M., and Mathieu, T. Robust Classiﬁ cation via MOM Minimization, 2018, arXiv preprint arXiv:1808.03106

work page internal anchor Pith review Pith/arXiv arXiv 2018
[23]

and Oliveira, R

Lerasle, M. and Oliveira, R. I., Robust empirical mean e stimators, 2011, arXiv preprint arXiv:1112.3914

work page arXiv 2011
[24]

Risk minimization by median-of-means tournaments

Lugosi, G. and Mendelson, S., Risk Minimization by Medi an-of-Means Tournaments, 2016, arXiv preprint arXiv:1608.00757

work page internal anchor Pith review Pith/arXiv arXiv 2016
[25]

Regularization, sparse recovery, and median-of-means tournaments

Lugosi, G. and Mendelson, S., Regularization, Sparse R ecovery, and Median-of-Means Tourna- ments, 2017, arXiv preprint arXiv:1701.04112

work page internal anchor Pith review Pith/arXiv arXiv 2017
[26]

Near-optimal mean estimators with respect to general norms

Lugosi, G. and Mendelson, S., Near-Optimal Mean Estima tors with Respect to General Norms, 2018, arXiv preprint arXiv:1806.06233

work page internal anchor Pith review Pith/arXiv arXiv 2018
[27]

and Sabato, S., Loss Minimization and Parameter Estimation with Heavy Tails, J

Hsu, D. and Sabato, S., Loss Minimization and Parameter Estimation with Heavy Tails, J. Machine Learning Research, 2016, vol. 17, no. 1, pp. 543–582

2016
[28]

Information Theory, 2013, vol

Bubeck, S., Cesa-Bianchi, N., and Lugosi, G., Bandits w ith Heavy Tail, IEEE Trans. Information Theory, 2013, vol. 59, no. 11, pp. 7711–7717

2013
[29]

Stat., 2016, vol

Devroye, L., Lerasle, M., Lugosi, G., and Oliveira, R.I ., Sub-Gaussian Mean Estimators, Ann. Stat., 2016, vol. 44, no. 6, pp. 2695–2725

2016
[30]

and Yudin, D.B., Slozhnost’ zadach i eﬀektivnost’ metodov optimizatsii , Moscow: Nauka, 1979

Nemirovskii, A.S. and Yudin, D.B., Slozhnost’ zadach i eﬀektivnost’ metodov optimizatsii , Moscow: Nauka, 1979. Translated under the title Problem Complexity and Method Eﬃciency in Optimization , Chichester: Wiley, 1983

1979
[31]

and Mendelson, S., Sub-Gaussian Estimators of the Mean of a Random Vector, Ann

Lugosi, G. and Mendelson, S., Sub-Gaussian Estimators of the Mean of a Random Vector, Ann. Stat., 2019, vol. 47, no. 2, pp. 783–794

2019
[32]

IHP: Probab

Catoni, O., Challenging the Empirical Mean and Empiric al Variance: a Deviation Study, Ann. IHP: Probab. Stat. , 2012, vol. 48, no. 4, pp. 1148–1185

2012
[33]

and Catoni, O., Robust Linear Least Squ ares Regression, Ann

Audibert, J.-Y. and Catoni, O., Robust Linear Least Squ ares Regression, Ann. Stat. , 2011, vol. 39, no. 5, pp. 2766–2794

2011
[34]

Minsker, S., Geometric Median and Robust Estimation in Banach Spaces, Bernoulli, 2015, vol. 21, no. 4, pp. 2308–2335

2015
[35]

and Minsker, S., Estimation of the Covariance St ructure of Heavy-Tailed Distributions, In Advances in Neural Information Processing Systems , pp

Wei, X. and Minsker, S., Estimation of the Covariance St ructure of Heavy-Tailed Distributions, In Advances in Neural Information Processing Systems , pp. 2859–2868, 2017. 21

2017
[36]

Chen, Y., Su, L., and Xu, J., Distributed Statistical Ma chine Learning in Adversarial Settings: Byzantine Gradient Descent, Proceedings of the ACM on Measurement and Analysis of Computin g Systems, vol. 1, iss. 2, article no. 44, 2017

2017
[37]

Yin, D., Chen, Y., Ramchandran, K., and Bartlett, P., By zantine-Robust Distributed Learning: Towards Optimal Statistical Rates, 2018, arXiv preprint arXiv:1803.01498

work page arXiv 2018
[38]

COMPSTAT’2010, pp

Cardot, H., C´ enac, P., and Chaouch, M., Stochastic App roximation for Multivariate and Func- tional Median, In Proc. COMPSTAT’2010, pp. 421–428. Springer, 2010

2010
[39]

Stat., 2017, vol

Cardot, H., C´ enac, P., and Godichon-Baggioni, A., Onl ine Estimation of the Geometric Median in Hilbert Spaces: Nonasymptotic Conﬁdence Balls, Ann. Stat., 2017, vol. 45, no. 2, pp. 591–614

2017
[40]

Program., 2012, vol

Lan, G., An Optimal Method for Stochastic Composite Opt imization, Math. Program., 2012, vol. 133, nos. 1-2, pp. 365–397

2012
[41]

Program., 2018, pp

Necoara, I., Nesterov, Y., and Glineur, F., Linear Conv ergence of First Order Methods for Non-Strongly Convex Optimization, Math. Program., 2018, pp. 1–39

2018
[42]

and Nemirovski, A., First Order Methods fo r Nonsmooth Convex Large-Scale Op- timization, I: General Purpose Methods, In: S

Juditsky, A. and Nemirovski, A., First Order Methods fo r Nonsmooth Convex Large-Scale Op- timization, I: General Purpose Methods, In: S. Sra, S. Nowoz in, and S. J. Wright (eds.), Opti- mization for Machine Learning , pp. 121–148. MIT Press, 2011

2011
[43]

Probab., 1975, vol

Freedman, D.A., On Tail Probabilities for Martingales , Ann. Probab., 1975, vol. 3, no. 1, pp. 100–118

1975
[44]

and Teboulle, M., Convergence Analysis of a Pro ximal-Like Minimization Algorithm Using Bregman Functions, SIAM J

Chen, G. and Teboulle, M., Convergence Analysis of a Pro ximal-Like Minimization Algorithm Using Bregman Functions, SIAM J. Optim. , 1993, vol. 3, no. 3, pp. 538–543. 22

1993

[1] [1]

Ro bust Stochastic Approximation Ap- proach to Stochastic Programming, SIAM J

Nemirovski, A., Juditsky, A., Lan, G., and Shapiro, A. Ro bust Stochastic Approximation Ap- proach to Stochastic Programming, SIAM J. Optim. , 2009, vol. 19, no. 4, pp. 1574–1609

2009

[2] [2]

and Nesterov, Y., Deterministic and Stocha stic Primal-Dual Subgradient Algorithms for Uniformly Convex Minimization, Stoch

Juditsky, A. and Nesterov, Y., Deterministic and Stocha stic Primal-Dual Subgradient Algorithms for Uniformly Convex Minimization, Stoch. Syst. , 2014, vol. 4, no. 1, pp. 44–80

2014

[3] [3]

and Lan, G., Optimal Stochastic Approximati on Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmi c Framework, SIAM J

Ghadimi, S. and Lan, G., Optimal Stochastic Approximati on Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmi c Framework, SIAM J. Optim. , 2012, vol. 22, no. 4, pp. 1469–1492

2012

[4] [4]

(eds.) Contributions to Probability and Statistics: Essays in Hon or of Harold Hotelling, pp

Tukey, J.W., A Survey of Sampling from Contaminated Dist ributions, In: Olkin, I., et al. (eds.) Contributions to Probability and Statistics: Essays in Hon or of Harold Hotelling, pp. 448–485. Stanford University Press, Palo Alto, 1960

1960

[5] [5]

J., Robust Estimation of a Location Parameter, Ann

Huber, P. J., Robust Estimation of a Location Parameter, Ann. Math. Statist. , 1964, vol. 35, no. 1, pp. 73–101

1964

[6] [6]

J., Robust Statistics: A Review, Ann

Huber, P. J., Robust Statistics: A Review, Ann. Math. Statist. , 1972, vol. 43, no. 4, pp. 1041– 1067

1972

[7] [7]

J., Robust Statistics , New York: John Wiley and Sons, 1981

Huber, P. J., Robust Statistics , New York: John Wiley and Sons, 1981

1981

[8] [8]

and Masreliez, C., Robust Estimation via Stoc hastic Approximation, IEEE Trans

Martin, R. and Masreliez, C., Robust Estimation via Stoc hastic Approximation, IEEE Trans. Information Theory, 1975, vol. 21, no. 3, pp. 263–271

1975

[9] [9]

and Tsypkin, Ya.Z., Adaptive Estimation Al gorithms: Convergence, Optimality, Stability, Autom

Polyak, B.T. and Tsypkin, Ya.Z., Adaptive Estimation Al gorithms: Convergence, Optimality, Stability, Autom. Remote Control , 1979, vol. 40, no. 3, pp. 378–389

1979

[10] [10]

and Tsypkin, Ya.Z., Robust Pseudogradien t Adaptation Algorithms, Autom

Polyak, B.T. and Tsypkin, Ya.Z., Robust Pseudogradien t Adaptation Algorithms, Autom. Remote Control, 1981, vol. 41, no. 10, pp. 1404–1409

1981

[11] [11]

and Tsypkin, J.Z., Robust Identiﬁcation, Automatica, 1980, vol

Polyak, B. and Tsypkin, J.Z., Robust Identiﬁcation, Automatica, 1980, vol. 16, no. 1, pp. 53–63

1980

[12] [12]

and Vandelinde, V., Robust Estimation Using t he Robbins-Monro Stochastic Approxi- mation Algorithm, IEEE Trans

Price, E. and Vandelinde, V., Robust Estimation Using t he Robbins-Monro Stochastic Approxi- mation Algorithm, IEEE Trans. Information Theory , 1979, vol. 25, no. 6, pp. 698–704

1979

[13] [13]

and Kovaˇ cevi´ c, B.D., Analysis of Robust Stochastic Approximation Algorithms for Process Identiﬁcation, Automatica, 1986, vol

Stankovi´ c, S.S. and Kovaˇ cevi´ c, B.D., Analysis of Robust Stochastic Approximation Algorithms for Process Identiﬁcation, Automatica, 1986, vol. 22, no. 4, pp. 483–488

1986

[14] [14]

, Convergence and Robustness of the Robbins-Monro Algorithm Truncated at Randomly Varying Bounds, Stoch

Chen, H.-F., Guo, L., and Gao, A.J. , Convergence and Robustness of the Robbins-Monro Algorithm Truncated at Randomly Varying Bounds, Stoch. Proc. Appl. , 1987, vol. 27, pp. 217– 231

1987

[15] [15]

and Gao, A.J., Robustness Analysis for Stoc hastic Approximation Algorithms, Stochast

Chen, H.-F. and Gao, A.J., Robustness Analysis for Stoc hastic Approximation Algorithms, Stochast. Stochast. Rep. , 1989, vol. 26, no. 1, pp. 3–20

1989

[16] [16]

Information Theory , 1992, vol

Nazin, A.V., Polyak, B.T., and Tsybakov, A.B., Optimal and Robust Kernel Algorithms for Passive Stochastic Approximation, IEEE Trans. Information Theory , 1992, vol. 38, no. 5, pp. 1577–1583

1992

[17] [17]

(In Rus- sian.) 20

Tsypkin, Ya.Z., Osnovy Informatsionnoy Teorii Identiﬁkatsii , Moscow: Nauka, 1984. (In Rus- sian.) 20

1984

[18] [18]

(In Russian.)

Tsypkin, Ya.Z., Informatsionnaya Teoriya Identiﬁkatsii , Moscow: Nauka, 1995. (In Russian.)

1995

[19] [19]

Kwon, J., Lecu´ e, G., and Lerasle, M., Median of Means Principle as a Divide-and-Conquer Procedure for Robustness, Sub-Sampling and Hyper-paramet ers Tuning , 2018, arXiv:1812.02435

work page internal anchor Pith review Pith/arXiv arXiv 2018

[20] [20]

Chinot, G., Lecu´ e, G., and Lerasle, M., Statistical Le arning with Lipschitz and Convex Loss Functions, 2018, arXiv preprint arXiv:1810.01090

work page internal anchor Pith review Pith/arXiv arXiv 2018

[21] [21]

Robust machine learning by median-of-means : theory and practice

Lecu´ e, G. and Lerasle, M. (2017). Robust machine learn ing by median-of-means: theory and practice, 2017, arXiv preprint arXiv:1711.10306v2. Annals of Stat. , to appear

work page internal anchor Pith review Pith/arXiv arXiv 2017

[22] [22]

Robust classification via MOM minimization

Lecu´ e, G., Lerasle, M., and Mathieu, T. Robust Classiﬁ cation via MOM Minimization, 2018, arXiv preprint arXiv:1808.03106

work page internal anchor Pith review Pith/arXiv arXiv 2018

[23] [23]

and Oliveira, R

Lerasle, M. and Oliveira, R. I., Robust empirical mean e stimators, 2011, arXiv preprint arXiv:1112.3914

work page arXiv 2011

[24] [24]

Risk minimization by median-of-means tournaments

Lugosi, G. and Mendelson, S., Risk Minimization by Medi an-of-Means Tournaments, 2016, arXiv preprint arXiv:1608.00757

work page internal anchor Pith review Pith/arXiv arXiv 2016

[25] [25]

Regularization, sparse recovery, and median-of-means tournaments

Lugosi, G. and Mendelson, S., Regularization, Sparse R ecovery, and Median-of-Means Tourna- ments, 2017, arXiv preprint arXiv:1701.04112

work page internal anchor Pith review Pith/arXiv arXiv 2017

[26] [26]

Near-optimal mean estimators with respect to general norms

Lugosi, G. and Mendelson, S., Near-Optimal Mean Estima tors with Respect to General Norms, 2018, arXiv preprint arXiv:1806.06233

work page internal anchor Pith review Pith/arXiv arXiv 2018

[27] [27]

and Sabato, S., Loss Minimization and Parameter Estimation with Heavy Tails, J

Hsu, D. and Sabato, S., Loss Minimization and Parameter Estimation with Heavy Tails, J. Machine Learning Research, 2016, vol. 17, no. 1, pp. 543–582

2016

[28] [28]

Information Theory, 2013, vol

Bubeck, S., Cesa-Bianchi, N., and Lugosi, G., Bandits w ith Heavy Tail, IEEE Trans. Information Theory, 2013, vol. 59, no. 11, pp. 7711–7717

2013

[29] [29]

Stat., 2016, vol

Devroye, L., Lerasle, M., Lugosi, G., and Oliveira, R.I ., Sub-Gaussian Mean Estimators, Ann. Stat., 2016, vol. 44, no. 6, pp. 2695–2725

2016

[30] [30]

and Yudin, D.B., Slozhnost’ zadach i eﬀektivnost’ metodov optimizatsii , Moscow: Nauka, 1979

Nemirovskii, A.S. and Yudin, D.B., Slozhnost’ zadach i eﬀektivnost’ metodov optimizatsii , Moscow: Nauka, 1979. Translated under the title Problem Complexity and Method Eﬃciency in Optimization , Chichester: Wiley, 1983

1979

[31] [31]

and Mendelson, S., Sub-Gaussian Estimators of the Mean of a Random Vector, Ann

Lugosi, G. and Mendelson, S., Sub-Gaussian Estimators of the Mean of a Random Vector, Ann. Stat., 2019, vol. 47, no. 2, pp. 783–794

2019

[32] [32]

IHP: Probab

Catoni, O., Challenging the Empirical Mean and Empiric al Variance: a Deviation Study, Ann. IHP: Probab. Stat. , 2012, vol. 48, no. 4, pp. 1148–1185

2012

[33] [33]

and Catoni, O., Robust Linear Least Squ ares Regression, Ann

Audibert, J.-Y. and Catoni, O., Robust Linear Least Squ ares Regression, Ann. Stat. , 2011, vol. 39, no. 5, pp. 2766–2794

2011

[34] [34]

Minsker, S., Geometric Median and Robust Estimation in Banach Spaces, Bernoulli, 2015, vol. 21, no. 4, pp. 2308–2335

2015

[35] [35]

and Minsker, S., Estimation of the Covariance St ructure of Heavy-Tailed Distributions, In Advances in Neural Information Processing Systems , pp

Wei, X. and Minsker, S., Estimation of the Covariance St ructure of Heavy-Tailed Distributions, In Advances in Neural Information Processing Systems , pp. 2859–2868, 2017. 21

2017

[36] [36]

Chen, Y., Su, L., and Xu, J., Distributed Statistical Ma chine Learning in Adversarial Settings: Byzantine Gradient Descent, Proceedings of the ACM on Measurement and Analysis of Computin g Systems, vol. 1, iss. 2, article no. 44, 2017

2017

[37] [37]

Yin, D., Chen, Y., Ramchandran, K., and Bartlett, P., By zantine-Robust Distributed Learning: Towards Optimal Statistical Rates, 2018, arXiv preprint arXiv:1803.01498

work page arXiv 2018

[38] [38]

COMPSTAT’2010, pp

Cardot, H., C´ enac, P., and Chaouch, M., Stochastic App roximation for Multivariate and Func- tional Median, In Proc. COMPSTAT’2010, pp. 421–428. Springer, 2010

2010

[39] [39]

Stat., 2017, vol

Cardot, H., C´ enac, P., and Godichon-Baggioni, A., Onl ine Estimation of the Geometric Median in Hilbert Spaces: Nonasymptotic Conﬁdence Balls, Ann. Stat., 2017, vol. 45, no. 2, pp. 591–614

2017

[40] [40]

Program., 2012, vol

Lan, G., An Optimal Method for Stochastic Composite Opt imization, Math. Program., 2012, vol. 133, nos. 1-2, pp. 365–397

2012

[41] [41]

Program., 2018, pp

Necoara, I., Nesterov, Y., and Glineur, F., Linear Conv ergence of First Order Methods for Non-Strongly Convex Optimization, Math. Program., 2018, pp. 1–39

2018

[42] [42]

and Nemirovski, A., First Order Methods fo r Nonsmooth Convex Large-Scale Op- timization, I: General Purpose Methods, In: S

Juditsky, A. and Nemirovski, A., First Order Methods fo r Nonsmooth Convex Large-Scale Op- timization, I: General Purpose Methods, In: S. Sra, S. Nowoz in, and S. J. Wright (eds.), Opti- mization for Machine Learning , pp. 121–148. MIT Press, 2011

2011

[43] [43]

Probab., 1975, vol

Freedman, D.A., On Tail Probabilities for Martingales , Ann. Probab., 1975, vol. 3, no. 1, pp. 100–118

1975

[44] [44]

and Teboulle, M., Convergence Analysis of a Pro ximal-Like Minimization Algorithm Using Bregman Functions, SIAM J

Chen, G. and Teboulle, M., Convergence Analysis of a Pro ximal-Like Minimization Algorithm Using Bregman Functions, SIAM J. Optim. , 1993, vol. 3, no. 3, pp. 538–543. 22

1993