pith. sign in

arxiv: 1907.02707 · v1 · pith:5BRX623Knew · submitted 2019-07-05 · 🧮 math.ST · math.OC· stat.TH

Algorithms of Robust Stochastic Optimization Based on Mirror Descent Method

Pith reviewed 2026-05-25 02:15 UTC · model grok-4.3

classification 🧮 math.ST math.OCstat.TH
keywords robust stochastic optimizationmirror descentgradient truncationsub-Gaussian boundsconvex optimizationheavy-tailed noisecomposite objectives
0
0 comments X

The pith

Truncating stochastic gradients produces sub-Gaussian error bounds for mirror descent in convex stochastic optimization under weak noise tails.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs robust non-Euclidean iterative algorithms for convex composite stochastic optimization by truncating stochastic gradients. This truncation step enables sub-Gaussian confidence bounds on solution accuracy when the noise satisfies only weak tail conditions, and the bounds hold in both convex and strongly convex regimes. The same truncation idea supplies robust accuracy estimates that apply to general stochastic algorithms. A reader would care because typical stochastic methods require strong moment assumptions on noise that often fail in practice, while these algorithms remain reliable under milder conditions.

Core claim

By truncating stochastic gradients, one can construct non-Euclidean iterative algorithms that deliver sub-Gaussian concentration of the optimization error in convex and strongly convex composite stochastic optimization, relying only on weak assumptions about the tails of the noise distribution.

What carries the argument

truncation of stochastic gradients inside mirror descent iterations

If this is right

  • Sub-Gaussian confidence bounds hold for the convex setting.
  • Sub-Gaussian confidence bounds hold for the strongly convex setting.
  • Robust accuracy estimates extend to general stochastic algorithms beyond the proposed mirror descent variants.
  • The algorithms remain applicable to composite objectives and non-Euclidean geometries.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The truncation approach may allow reliable optimization on data sets containing occasional large outliers without separate preprocessing steps.
  • Similar truncation could be tested inside variance-reduced stochastic methods to handle heavy tails while preserving faster convergence rates.
  • The weak-tail assumption might be relaxed further by combining truncation with other robust estimators such as coordinate-wise medians.

Load-bearing premise

The noise distribution has weak tail conditions that become sub-Gaussian after truncation.

What would settle it

A simulation or theoretical counterexample in which the optimization error after truncation fails to exhibit sub-Gaussian concentration for a noise distribution obeying the paper's stated weak tail conditions.

read the original abstract

We propose an approach to construction of robust non-Euclidean iterative algorithms for convex composite stochastic optimization based on truncation of stochastic gradients. For such algorithms, we establish sub-Gaussian confidence bounds under weak assumptions about the tails of the noise distribution in convex and strongly convex settings. Robust estimates of the accuracy of general stochastic algorithms are also proposed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes robust non-Euclidean iterative algorithms for convex composite stochastic optimization based on truncation of stochastic gradients within the mirror descent framework. It derives sub-Gaussian high-probability confidence bounds on the iterates under weak tail/moment conditions on the noise, for both convex and strongly convex settings, and additionally provides robust accuracy estimates for general stochastic algorithms.

Significance. If the central derivations hold, the work supplies a practical mechanism for obtaining sub-Gaussian deviation bounds in mirror-descent stochastic optimization without presupposing sub-Gaussian noise, by explicit truncation that controls bias and variance under mild moment assumptions. The explicit statement of assumptions, the bias-variance decomposition carried through the standard mirror-descent analysis, and the resulting high-probability bounds constitute a clear technical contribution to robust stochastic optimization.

minor comments (2)
  1. [Abstract / §1] The abstract and introduction would benefit from a brief, explicit statement of the precise moment condition (e.g., the order of the moment and the truncation threshold) that is used to control both the truncation probability and the bias term, rather than referring only to 'weak assumptions about the tails.'
  2. [§2] Notation for the composite objective (smooth + nonsmooth parts) and the mirror map is introduced gradually; a single consolidated table or paragraph collecting all standing assumptions and symbols would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our manuscript, the recognition of its technical contributions to robust stochastic optimization, and the recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper proposes truncation-based robust mirror descent algorithms for stochastic convex optimization and derives sub-Gaussian high-probability bounds directly from explicit weak tail assumptions on the noise (via bias-variance decomposition through the standard mirror-descent recurrence). No step reduces a claimed prediction or bound to a parameter fitted from the same data, nor does any load-bearing premise rest on a self-citation chain; the derivation is self-contained against the stated moment conditions and standard analysis tools.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard convexity of the composite objective and on the existence of weak tail conditions on the noise that truncation converts into sub-Gaussian behavior. No free parameters or invented entities are visible in the abstract.

axioms (2)
  • domain assumption The objective is convex (or strongly convex)
    Explicitly required for the stated settings of convex and strongly convex stochastic optimization.
  • domain assumption Noise admits weak tail assumptions under which truncation yields sub-Gaussian concentration
    This is the key modeling premise that enables the claimed bounds.

pith-pipeline@v0.9.0 · 5582 in / 1224 out tokens · 25185 ms · 2026-05-25T02:15:56.415573+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 9 canonical work pages · 7 internal anchors

  1. [1]

    Ro bust Stochastic Approximation Ap- proach to Stochastic Programming, SIAM J

    Nemirovski, A., Juditsky, A., Lan, G., and Shapiro, A. Ro bust Stochastic Approximation Ap- proach to Stochastic Programming, SIAM J. Optim. , 2009, vol. 19, no. 4, pp. 1574–1609

  2. [2]

    and Nesterov, Y., Deterministic and Stocha stic Primal-Dual Subgradient Algorithms for Uniformly Convex Minimization, Stoch

    Juditsky, A. and Nesterov, Y., Deterministic and Stocha stic Primal-Dual Subgradient Algorithms for Uniformly Convex Minimization, Stoch. Syst. , 2014, vol. 4, no. 1, pp. 44–80

  3. [3]

    and Lan, G., Optimal Stochastic Approximati on Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmi c Framework, SIAM J

    Ghadimi, S. and Lan, G., Optimal Stochastic Approximati on Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmi c Framework, SIAM J. Optim. , 2012, vol. 22, no. 4, pp. 1469–1492

  4. [4]

    (eds.) Contributions to Probability and Statistics: Essays in Hon or of Harold Hotelling, pp

    Tukey, J.W., A Survey of Sampling from Contaminated Dist ributions, In: Olkin, I., et al. (eds.) Contributions to Probability and Statistics: Essays in Hon or of Harold Hotelling, pp. 448–485. Stanford University Press, Palo Alto, 1960

  5. [5]

    J., Robust Estimation of a Location Parameter, Ann

    Huber, P. J., Robust Estimation of a Location Parameter, Ann. Math. Statist. , 1964, vol. 35, no. 1, pp. 73–101

  6. [6]

    J., Robust Statistics: A Review, Ann

    Huber, P. J., Robust Statistics: A Review, Ann. Math. Statist. , 1972, vol. 43, no. 4, pp. 1041– 1067

  7. [7]

    J., Robust Statistics , New York: John Wiley and Sons, 1981

    Huber, P. J., Robust Statistics , New York: John Wiley and Sons, 1981

  8. [8]

    and Masreliez, C., Robust Estimation via Stoc hastic Approximation, IEEE Trans

    Martin, R. and Masreliez, C., Robust Estimation via Stoc hastic Approximation, IEEE Trans. Information Theory, 1975, vol. 21, no. 3, pp. 263–271

  9. [9]

    and Tsypkin, Ya.Z., Adaptive Estimation Al gorithms: Convergence, Optimality, Stability, Autom

    Polyak, B.T. and Tsypkin, Ya.Z., Adaptive Estimation Al gorithms: Convergence, Optimality, Stability, Autom. Remote Control , 1979, vol. 40, no. 3, pp. 378–389

  10. [10]

    and Tsypkin, Ya.Z., Robust Pseudogradien t Adaptation Algorithms, Autom

    Polyak, B.T. and Tsypkin, Ya.Z., Robust Pseudogradien t Adaptation Algorithms, Autom. Remote Control, 1981, vol. 41, no. 10, pp. 1404–1409

  11. [11]

    and Tsypkin, J.Z., Robust Identification, Automatica, 1980, vol

    Polyak, B. and Tsypkin, J.Z., Robust Identification, Automatica, 1980, vol. 16, no. 1, pp. 53–63

  12. [12]

    and Vandelinde, V., Robust Estimation Using t he Robbins-Monro Stochastic Approxi- mation Algorithm, IEEE Trans

    Price, E. and Vandelinde, V., Robust Estimation Using t he Robbins-Monro Stochastic Approxi- mation Algorithm, IEEE Trans. Information Theory , 1979, vol. 25, no. 6, pp. 698–704

  13. [13]

    and Kovaˇ cevi´ c, B.D., Analysis of Robust Stochastic Approximation Algorithms for Process Identification, Automatica, 1986, vol

    Stankovi´ c, S.S. and Kovaˇ cevi´ c, B.D., Analysis of Robust Stochastic Approximation Algorithms for Process Identification, Automatica, 1986, vol. 22, no. 4, pp. 483–488

  14. [14]

    , Convergence and Robustness of the Robbins-Monro Algorithm Truncated at Randomly Varying Bounds, Stoch

    Chen, H.-F., Guo, L., and Gao, A.J. , Convergence and Robustness of the Robbins-Monro Algorithm Truncated at Randomly Varying Bounds, Stoch. Proc. Appl. , 1987, vol. 27, pp. 217– 231

  15. [15]

    and Gao, A.J., Robustness Analysis for Stoc hastic Approximation Algorithms, Stochast

    Chen, H.-F. and Gao, A.J., Robustness Analysis for Stoc hastic Approximation Algorithms, Stochast. Stochast. Rep. , 1989, vol. 26, no. 1, pp. 3–20

  16. [16]

    Information Theory , 1992, vol

    Nazin, A.V., Polyak, B.T., and Tsybakov, A.B., Optimal and Robust Kernel Algorithms for Passive Stochastic Approximation, IEEE Trans. Information Theory , 1992, vol. 38, no. 5, pp. 1577–1583

  17. [17]

    (In Rus- sian.) 20

    Tsypkin, Ya.Z., Osnovy Informatsionnoy Teorii Identifikatsii , Moscow: Nauka, 1984. (In Rus- sian.) 20

  18. [18]

    (In Russian.)

    Tsypkin, Ya.Z., Informatsionnaya Teoriya Identifikatsii , Moscow: Nauka, 1995. (In Russian.)

  19. [19]

    Kwon, J., Lecu´ e, G., and Lerasle, M., Median of Means Principle as a Divide-and-Conquer Procedure for Robustness, Sub-Sampling and Hyper-paramet ers Tuning , 2018, arXiv:1812.02435

  20. [20]

    Chinot, G., Lecu´ e, G., and Lerasle, M., Statistical Le arning with Lipschitz and Convex Loss Functions, 2018, arXiv preprint arXiv:1810.01090

  21. [21]

    Robust machine learning by median-of-means : theory and practice

    Lecu´ e, G. and Lerasle, M. (2017). Robust machine learn ing by median-of-means: theory and practice, 2017, arXiv preprint arXiv:1711.10306v2. Annals of Stat. , to appear

  22. [22]

    Robust classification via MOM minimization

    Lecu´ e, G., Lerasle, M., and Mathieu, T. Robust Classifi cation via MOM Minimization, 2018, arXiv preprint arXiv:1808.03106

  23. [23]

    and Oliveira, R

    Lerasle, M. and Oliveira, R. I., Robust empirical mean e stimators, 2011, arXiv preprint arXiv:1112.3914

  24. [24]

    Risk minimization by median-of-means tournaments

    Lugosi, G. and Mendelson, S., Risk Minimization by Medi an-of-Means Tournaments, 2016, arXiv preprint arXiv:1608.00757

  25. [25]

    Regularization, sparse recovery, and median-of-means tournaments

    Lugosi, G. and Mendelson, S., Regularization, Sparse R ecovery, and Median-of-Means Tourna- ments, 2017, arXiv preprint arXiv:1701.04112

  26. [26]

    Near-optimal mean estimators with respect to general norms

    Lugosi, G. and Mendelson, S., Near-Optimal Mean Estima tors with Respect to General Norms, 2018, arXiv preprint arXiv:1806.06233

  27. [27]

    and Sabato, S., Loss Minimization and Parameter Estimation with Heavy Tails, J

    Hsu, D. and Sabato, S., Loss Minimization and Parameter Estimation with Heavy Tails, J. Machine Learning Research, 2016, vol. 17, no. 1, pp. 543–582

  28. [28]

    Information Theory, 2013, vol

    Bubeck, S., Cesa-Bianchi, N., and Lugosi, G., Bandits w ith Heavy Tail, IEEE Trans. Information Theory, 2013, vol. 59, no. 11, pp. 7711–7717

  29. [29]

    Stat., 2016, vol

    Devroye, L., Lerasle, M., Lugosi, G., and Oliveira, R.I ., Sub-Gaussian Mean Estimators, Ann. Stat., 2016, vol. 44, no. 6, pp. 2695–2725

  30. [30]

    and Yudin, D.B., Slozhnost’ zadach i effektivnost’ metodov optimizatsii , Moscow: Nauka, 1979

    Nemirovskii, A.S. and Yudin, D.B., Slozhnost’ zadach i effektivnost’ metodov optimizatsii , Moscow: Nauka, 1979. Translated under the title Problem Complexity and Method Efficiency in Optimization , Chichester: Wiley, 1983

  31. [31]

    and Mendelson, S., Sub-Gaussian Estimators of the Mean of a Random Vector, Ann

    Lugosi, G. and Mendelson, S., Sub-Gaussian Estimators of the Mean of a Random Vector, Ann. Stat., 2019, vol. 47, no. 2, pp. 783–794

  32. [32]

    IHP: Probab

    Catoni, O., Challenging the Empirical Mean and Empiric al Variance: a Deviation Study, Ann. IHP: Probab. Stat. , 2012, vol. 48, no. 4, pp. 1148–1185

  33. [33]

    and Catoni, O., Robust Linear Least Squ ares Regression, Ann

    Audibert, J.-Y. and Catoni, O., Robust Linear Least Squ ares Regression, Ann. Stat. , 2011, vol. 39, no. 5, pp. 2766–2794

  34. [34]

    Minsker, S., Geometric Median and Robust Estimation in Banach Spaces, Bernoulli, 2015, vol. 21, no. 4, pp. 2308–2335

  35. [35]

    and Minsker, S., Estimation of the Covariance St ructure of Heavy-Tailed Distributions, In Advances in Neural Information Processing Systems , pp

    Wei, X. and Minsker, S., Estimation of the Covariance St ructure of Heavy-Tailed Distributions, In Advances in Neural Information Processing Systems , pp. 2859–2868, 2017. 21

  36. [36]

    Chen, Y., Su, L., and Xu, J., Distributed Statistical Ma chine Learning in Adversarial Settings: Byzantine Gradient Descent, Proceedings of the ACM on Measurement and Analysis of Computin g Systems, vol. 1, iss. 2, article no. 44, 2017

  37. [37]

    Yin, D., Chen, Y., Ramchandran, K., and Bartlett, P., By zantine-Robust Distributed Learning: Towards Optimal Statistical Rates, 2018, arXiv preprint arXiv:1803.01498

  38. [38]

    COMPSTAT’2010, pp

    Cardot, H., C´ enac, P., and Chaouch, M., Stochastic App roximation for Multivariate and Func- tional Median, In Proc. COMPSTAT’2010, pp. 421–428. Springer, 2010

  39. [39]

    Stat., 2017, vol

    Cardot, H., C´ enac, P., and Godichon-Baggioni, A., Onl ine Estimation of the Geometric Median in Hilbert Spaces: Nonasymptotic Confidence Balls, Ann. Stat., 2017, vol. 45, no. 2, pp. 591–614

  40. [40]

    Program., 2012, vol

    Lan, G., An Optimal Method for Stochastic Composite Opt imization, Math. Program., 2012, vol. 133, nos. 1-2, pp. 365–397

  41. [41]

    Program., 2018, pp

    Necoara, I., Nesterov, Y., and Glineur, F., Linear Conv ergence of First Order Methods for Non-Strongly Convex Optimization, Math. Program., 2018, pp. 1–39

  42. [42]

    and Nemirovski, A., First Order Methods fo r Nonsmooth Convex Large-Scale Op- timization, I: General Purpose Methods, In: S

    Juditsky, A. and Nemirovski, A., First Order Methods fo r Nonsmooth Convex Large-Scale Op- timization, I: General Purpose Methods, In: S. Sra, S. Nowoz in, and S. J. Wright (eds.), Opti- mization for Machine Learning , pp. 121–148. MIT Press, 2011

  43. [43]

    Probab., 1975, vol

    Freedman, D.A., On Tail Probabilities for Martingales , Ann. Probab., 1975, vol. 3, no. 1, pp. 100–118

  44. [44]

    and Teboulle, M., Convergence Analysis of a Pro ximal-Like Minimization Algorithm Using Bregman Functions, SIAM J

    Chen, G. and Teboulle, M., Convergence Analysis of a Pro ximal-Like Minimization Algorithm Using Bregman Functions, SIAM J. Optim. , 1993, vol. 3, no. 3, pp. 538–543. 22