Fast last-iterate convergence of SGD in the smooth interpolation regime
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers
- Gradient Descent's Last Iterate is Often (slightly) Suboptimal
  Proves it is impossible to achieve optimal last-iterate rates for GD and SGD without knowing the horizon T in advance, incurring an unavoidable poly-log factor penalty even in the deterministic case.
- Perfect Parallelization in Mini-Batch SGD with Classical Momentum Acceleration
  Classical momentum acceleration in mini-batch SGD for quadratics is proportional to batch size up to saturation, enabling perfect parallelization under minimal noise assumptions.
- Last-Iterate Convergence of Randomized Kaczmarz and SGD with Greedy Step Size
  SGD with greedy step size on smooth quadratics in the interpolation regime attains O(1/t^{3/4}) last-iterate convergence.