Understanding gradient descent on the edge of stability in deep learning

Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi · 2022

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Geometric Layer-wise Approximation Rates for Deep Networks

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

A shared mixed-activation network of width 2dN+d+2 yields layer-wise L^p approximation rates bounded by the modulus of continuity at geometric scale N^{-ℓ}, reducing to (2d+1)N^{-ℓ} for 1-Lipschitz targets.

Large Spikes in Stochastic Gradient Descent: A Large-Deviations View

cs.LG · 2026-03-10 · unverdicted · novelty 7.0

Large loss spikes in SGD are polynomially likely and serve as the dominant mechanism for escaping sharp minima toward flatter solutions in the NTK regime.

citing papers explorer

Showing 2 of 2 citing papers.

Geometric Layer-wise Approximation Rates for Deep Networks cs.LG · 2026-04-22 · unverdicted · none · ref 1
Large Spikes in Stochastic Gradient Descent: A Large-Deviations View cs.LG · 2026-03-10 · unverdicted · none · ref 3
