Understanding gradient descent on the edge of stability in deep learning

Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi · 2022

2 Pith papers cite this work.

representative citing papers

Large Spikes in Stochastic Gradient Descent: A Large-Deviations View

cs.LG · 2026-03-10 · unverdicted · novelty 7.0

Large loss spikes in SGD are polynomially likely and serve as the dominant mechanism for escaping sharp minima toward flatter solutions in the NTK regime.

Layer-wise Geometric Approximation Rates for Deep Networks

cs.LG · 2026-04-22

citing papers explorer

Large Spikes in Stochastic Gradient Descent: A Large-Deviations View cs.LG · 2026-03-10 · unverdicted · none · ref 3
Layer-wise Geometric Approximation Rates for Deep Networks cs.LG · 2026-04-22 · unreviewed · ref 1