Stochastic gradient flow dynamics of test risk and its exact solution for weak features (https://arxiv.org/abs/2402.07626)
2 Pith papers cite this work. Polarity classification is still indexing, so neither citing paper has a verdict yet.
Citing papers:
- What is the long-run distribution of stochastic gradient descent? A large deviations analysis
  SGD's stationary distribution is Boltzmann-Gibbs with temperature equal to the step size, concentrating exponentially on minimum-energy critical points.
- SGD at the Edge of Stability: Stochastic Stabilization with Large Learning Rates
  SGD on multiclass cross-entropy loss alternates between curvature-driven oscillations and stable regimes but self-stabilizes to enable best-iterate convergence with large learning rates for linear and two-layer models.
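As a toy illustration of the first citing paper's one-line summary (not a reproduction of that paper's analysis), the sketch below computes Boltzmann-Gibbs weights proportional to exp(-E/η) over a hypothetical finite set of critical points, with the step size η acting as the temperature. The energy values and the helper name `gibbs_weights` are assumptions made for illustration; in the paper the energy comes from a large deviations analysis and is not assumed here to equal the training loss. Because the weight of a point relative to the minimum-energy point is exp(-(E - E_min)/η), it shrinks exponentially in 1/η.

```python
import math


def gibbs_weights(energies, step_size):
    """Normalized Boltzmann-Gibbs weights exp(-E / T) with temperature T = step_size.

    Hypothetical helper for illustration only.
    """
    # Shift by the minimum energy before exponentiating so the largest
    # term is exp(0) = 1; this avoids underflow of every term at small T.
    e_min = min(energies)
    unnormalized = [math.exp(-(e - e_min) / step_size) for e in energies]
    z = sum(unnormalized)
    return [w / z for w in unnormalized]


# Hypothetical energies of three critical points (illustrative values only).
energies = [0.0, 0.5, 1.0]

# Smaller step sizes (lower temperatures) push the mass onto the minimum.
for eta in (1.0, 0.1, 0.01):
    weights = gibbs_weights(energies, eta)
    print(f"eta={eta:<5} weights={[round(w, 4) for w in weights]}")
```

Shrinking η from 1.0 to 0.01 moves almost all of the probability mass onto the zero-energy point, which is the sense in which such a distribution concentrates exponentially on minimum-energy critical points.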