pith. sign in

arxiv: 2606.24879 · v1 · pith:5R5VCZSCnew · submitted 2026-06-23 · 🧮 math.OC · cs.LG

New Bounds for the Last Iterate of the Stochastic subGradient Method

Pith reviewed 2026-06-25 22:41 UTC · model grok-4.3

classification 🧮 math.OC cs.LG
keywords stochastic subgradient methodlast iterate convergenceconvex optimizationstochastic optimizationone-dimensional optimizationoptimization error boundsi.i.d. noise
0
0 comments X

The pith

Under i.i.d. additive subgradient noise with bounded variance, the last iterate of stochastic subgradient descent reaches order 1/√n error in one dimension.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the final point generated by the stochastic subgradient method on one-dimensional convex Lipschitz problems. With the usual fixed stepsize schedule of order 1/√n, it shows that additive i.i.d. noise of uniformly bounded variance produces an optimization error of order 1/√n for the last iterate. This removes the extra logarithmic factor that appears in earlier generic analyses. The same setting without the i.i.d. requirement allows the error to reach order (log n)/√n, giving a negative answer to an open question on whether the last iterate can be optimal under only bounded-variance noise.

Core claim

For fixed horizon n and stepsize η = Θ(1/√n), when subgradient noise is additive, i.i.d., and has uniformly bounded variance, the optimization error of the last iterate is of order 1/√n for one-dimensional convex Lipschitz objectives. Dropping the i.i.d. assumption permits error of order (log n)/√n, showing that the last iterate is suboptimal under bounded variance alone.

What carries the argument

Last-iterate error analysis that exploits one-dimensional convexity together with the additive i.i.d. structure of the noise.

If this is right

  • The last iterate matches the known rate of the averaged iterate when the noise satisfies the i.i.d. condition.
  • Existing generic bounds that carry an extra log n factor are not tight under the i.i.d. assumption.
  • The log n factor reappears as soon as the noise process loses the i.i.d. property, even in dimension one.
  • The result separates the effect of noise correlation from the effect of dimension.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Analyses of other first-order stochastic methods may improve when they explicitly use i.i.d. structure rather than only variance bounds.
  • The distinction between i.i.d. and merely bounded-variance noise could matter for last-iterate guarantees in non-convex or higher-dimensional settings.
  • Practical implementations that average iterates might be replaceable by the raw last iterate if the noise is known to be i.i.d.

Load-bearing premise

The subgradient noise must be additive, independent across steps, identically distributed, and have uniformly bounded variance.

What would settle it

An explicit one-dimensional convex Lipschitz function and a sequence of i.i.d. bounded-variance noises for which the last iterate error exceeds order 1/√n would falsify the positive bound.

read the original abstract

We study the last iterate of the stochastic subgradient method for one-dimensional convex Lipschitz objectives. For a fixed horizon $n$, we consider the standard fixed stepsizes $\eta =\Theta(1/\sqrt n)$. We prove that, for such stepsize policies, under additive i.i.d. subgradient noise with uniformly bounded variance, the last iterate features an optimization error of order $1/\sqrt n$, thereby removing the extra $(\log n)$ factor present in existing generic bounds. On the other hand, we show that without the i.i.d. assumption, the optimization error can be of order $(\log n)/\sqrt n$. Thus, under the uniformly bounded variance assumption alone, the last iterate of SsGM is suboptimal even in dimension one, resolving negatively an open problem posed in Koren and Segal, COLT, 2020.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper studies the last iterate of the stochastic subgradient method for one-dimensional convex Lipschitz objectives. With fixed stepsizes η=Θ(1/√n), it proves that additive i.i.d. subgradient noise with uniformly bounded variance yields last-iterate optimization error of order 1/√n (removing the extra log n factor from generic bounds). Without the i.i.d. assumption the error can reach (log n)/√n, negatively resolving an open problem from Koren and Segal (COLT 2020).

Significance. If the derivations hold, the result is significant for clarifying the precise role of the i.i.d. assumption in last-iterate rates for stochastic subgradient methods, even in dimension one. It supplies both a positive rate under the stronger noise model and an explicit counter-example construction under uniform bounded variance alone, directly addressing a stated open question.

minor comments (2)
  1. The abstract states the positive and negative results cleanly but does not indicate the theorem numbers or sections where the i.i.d. positive bound and the non-i.i.d. construction appear; adding these cross-references would aid readers.
  2. Notation for the noise process (additive i.i.d. with bounded variance) is introduced in the abstract; a short dedicated paragraph in §2 or §3 defining the precise probability space and moment assumptions would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, accurate summary of our results, and recommendation of minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation proceeds from the standard subgradient recursion under the paper's explicit assumptions (additive i.i.d. noise with uniformly bounded variance for the positive O(1/√n) result, and the same variance bound without i.i.d. for the negative (log n)/√n counter-example). No step reduces the claimed rate to a quantity defined by the same paper, no fitted parameter is relabeled as a prediction, and no load-bearing premise rests on a self-citation chain. The separation of results follows directly from the differing noise conditions applied to the same recursion, making the analysis self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Claims rest on standard domain assumptions of one-dimensional convex Lipschitz functions and additive noise; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption Objective is convex and Lipschitz continuous in one dimension
    Invoked to guarantee existence of subgradients and to control the recursion.
  • domain assumption Subgradient noise is additive with uniformly bounded variance
    Central modeling choice that enables the 1/sqrt n rate; contrasted with non-i.i.d. case.

pith-pipeline@v0.9.1-grok · 5683 in / 1348 out tokens · 23379 ms · 2026-06-25T22:41:18.059016+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

22 extracted references · 2 linked inside Pith

  1. [1]

    Bartlett, Pradeep Ravikumar, and Martin J

    Alekh Agarwal, Peter L. Bartlett, Pradeep Ravikumar, and Martin J. Wainwright. Information- theoretic lower bounds on the oracle complexity of stochastic convex optimization.IEEE Transactions on Information Theory, 58(5):3235–3249, 2012

  2. [2]

    MIT press, 2024

    Francis Bach.Learning theory from first principles. MIT press, 2024

  3. [3]

    General tail bounds for non-smooth stochastic mirror descent

    Khaled Eldowa and Andrea Paudice. General tail bounds for non-smooth stochastic mirror descent. InProceedings of the 27-th International Conference on Artificial Intelligence and Statistics, pages 3205–3213, 2024

  4. [4]

    Yu. M. Ermol’ev. On the method of generalized stochastic gradients and quasi-Féjer sequences. Cybernetics, 5:208–220, 1969

  5. [5]

    Nicholas J. A. Harvey, Chris Liaw, and Sikander Randhawa. Tight analyses for subgradient descent I: lower bounds.Open J. Math. Optim., 5:1–17, 2024

  6. [6]

    Harvey, Christopher Liaw, Yaniv Plan, and Sikander Randhawa

    Nicholas J.A. Harvey, Christopher Liaw, Yaniv Plan, and Sikander Randhawa. Tight analyses for non-smooth stochastic gradient descent. InProceedinds of the 32nd International Conference on Computational Learning Theory, pages 1579–1613, 2019. 20

  7. [7]

    Nagaraj, and Praneeth Netrapalli

    Prateek Jain, Dheeraj M. Nagaraj, and Praneeth Netrapalli. Making the last iterate of sgd information theoretically optimal.SIAM Journal on Optimization, 31(2):1108–1130, 2021

  8. [8]

    Springer, 3 edition, 2021

    Olav Kallenberg.Foundations of Modern Probability, volume 99 ofProbability Theory and Stochastic Modelling. Springer, 3 edition, 2021

  9. [9]

    Open problem: Tight convergence of SGD in constant dimension

    Tomer Koren and Shahar Segal. Open problem: Tight convergence of SGD in constant dimension. InCOLT 2020, volume 125 ofProceedings of Machine Learning Research, pages 3847–3851, 2020

  10. [10]

    Gradient descent’s last iterate is often (slightly) suboptimal

    Guy Kornowski and Ohad Shamir. Gradient descent’s last iterate is often (slightly) suboptimal. Preprint at arXiv:2604.13870, April 2026

  11. [11]

    The convergence rate of SGD’s final iterate: Analysis on dimension dependence, 2021

    Daogao Liu and Zhou Lu. The convergence rate of SGD’s final iterate: Analysis on dimension dependence, 2021. Preprint arXiv:2106.14588

  12. [12]

    Stochastic nonsmooth convex optimization with heavy-tailed noises: High-probability bound, in-expectation rate and initial distance adaptation.arXiv preprint arXiv:2303.12277, 2023

    Zijian Liu and Zhengyuan Zhou. Stochastic nonsmooth convex optimization with heavy-tailed noises: High-probability bound, in-expectation rate and initial distance adaptation.arXiv preprint arXiv:2303.12277, 2023

  13. [13]

    Revisiting the last-iterate convergence of stochastic gradient methods

    Zijian Liu and Zhengyuan Zhou. Revisiting the last-iterate convergence of stochastic gradient methods. InProceedings of the 12-th International Conference on Learning Representations (to appear), 2024

  14. [14]

    Meyn and Richard L

    Sean P. Meyn and Richard L. Tweedie.Markov Chains and Stochastic Stability. Cambridge University Press, 2 edition, 2009

  15. [15]

    Nemirovski, A

    A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming.SIAM Journal on Optimization, 19(4):1574–1609, 2009

  16. [16]

    Wiley-Interscience, 1983

    Arkadij Semenovič Nemirovskij and David Borisovich Yudin.Problem complexity and method efficiency in optimization. Wiley-Interscience, 1983

  17. [17]

    An improved analysis of the clipped stochastic subgradient method under heavy-tailed noise, 2025

    Daniela Angela Parletta, Andrea Paudice, and Saverio Salzo. An improved analysis of the clipped stochastic subgradient method under heavy-tailed noise, 2025

  18. [18]

    Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes

    Ohad Shamir and Tong Zhang. Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. InProceedings of the 30th International Conference on Machine Learning, 2013

  19. [19]

    Serdar Yüksel and Sean P. Meyn. Random-time, state-dependent stochastic drift for Markov chains and application to stochastic stabilization over erasure channels, 2012. Preprint at arXiv:1010.4820

  20. [20]

    Exact convergence rate of the last iterate in subgradient methods.SIAM Journal on Optimization, 35(3):2182–2201, 2025

    Moslem Zamani and François Glineur. Exact convergence rate of the last iterate in subgradient methods.SIAM Journal on Optimization, 35(3):2182–2201, 2025. 21 A Proofs of the Deferred Claims A.1 Proof of Claim 7 Proof. We first prove thatP satisfies the Feller property. Recall that a transition probability kernel Pon the compact metric spaceIis Feller if φ...

  21. [21]

    If α > a, then everyx∈ [α, α+ ρ) ∩ X lies strictly to the left ofS, and every relative subgradient g∈∂ X f(x)satisfiesg≤0

  22. [22]

    If β < b , then every x∈ (β−ρ, β ] ∩ X lies strictly to the right of S, and every relative subgradientg∈∂ X f(x)satisfiesg≥0. Proof. Since S is a non-empty closed convex subset of the intervalX, it is a closed interval inX. Write u= infS, v= supS. If α > a , then the left endpoint of the enlarged set is not created by the boundary ofX, hence α = u−ρ . The...