Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD

Arseniy Andreyev, Pierfrancesco Beneventano · 2024 · arXiv 2412.20553

5 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: extension (1)

citation-polarity summary: extend (1)

representative citing papers

Zeroth-Order Optimization at the Edge of Stability

cs.LG · 2026-04-16 · unverdicted · novelty 7.0

Zeroth-order methods achieve mean-square stability when the step size satisfies a condition involving the entire Hessian spectrum, with full-batch ZO optimizers operating at the edge of stability and large steps regularizing the Hessian trace.

Momentum Further Constrains Sharpness at the Edge of Stochastic Stability

cs.LG · 2026-04-15 · unverdicted · novelty 7.0

Momentum SGD exhibits two distinct edge-of-stochastic-stability (EoSS) regimes for batch sharpness: it stabilizes at 2(1-β)/η for small batches and at 2(1+β)/η for large batches, in line with linear stability thresholds.

Large Spikes in Stochastic Gradient Descent: A Large-Deviations View

cs.LG · 2026-03-10 · unverdicted · novelty 7.0

Large loss spikes in SGD are polynomially likely and serve as the dominant mechanism for escaping sharp minima toward flatter solutions in the NTK regime.

Does Weight Decay Enhance Training Stability?

cs.LG · 2026-05-15 · conditional · novelty 6.0

Weight decay slows progressive sharpening at the edge of stability, inducing damped oscillations in CNNs and a phase transition to sub-2/η sharpness in MLPs driven by parameter-sharpness gradient alignment, yielding more stable NTK dynamics.

Generalization at the Edge of Stability

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

Training at the edge of stability causes neural network optimizers to converge on fractal attractors whose effective dimension, measured via a new sharpness dimension from the Hessian spectrum, bounds generalization error in a way not captured by prior trace or norm measures.

citing papers explorer

Showing 5 of 5 citing papers.

Zeroth-Order Optimization at the Edge of Stability · cs.LG · 2026-04-16 · unverdicted · none · ref 1
Momentum Further Constrains Sharpness at the Edge of Stochastic Stability · cs.LG · 2026-04-15 · unverdicted · none · ref 4
Large Spikes in Stochastic Gradient Descent: A Large-Deviations View · cs.LG · 2026-03-10 · unverdicted · none · ref 2
Does Weight Decay Enhance Training Stability? · cs.LG · 2026-05-15 · conditional · none · ref 20
Generalization at the Edge of Stability · cs.LG · 2026-04-21 · unverdicted · none · ref 5