Large-scale machine learning with stochastic gradient descent
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: math.OC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Gradient Descent's Last Iterate is Often (slightly) Suboptimal
Proves that GD and SGD cannot achieve optimal last-iterate rates without knowing the horizon T in advance; the unavoidable penalty is a poly-logarithmic factor, even in the deterministic case.