Catapults in SGD: Spikes in the training loss and their impact on generalization through feature learning
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: math.OC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
High-dimensional Limit of SGD for Diagonal Linear Networks
In the high-dimensional regime, SGD on diagonal linear networks is approximated by an SDE and a deterministic PDE that together give an explicit non-asymptotic description of convergence to zero risk.