Catapults in SGD: Spikes in the training loss and their impact on generalization through feature learning

Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin · 2024

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

High-dimensional Limit of SGD for Diagonal Linear Networks

math.OC · 2026-05-16 · unverdicted · novelty 6.0 · ref 60

In the high-dimensional regime, SGD on diagonal linear networks is approximated by an SDE and a deterministic PDE that together give an explicit non-asymptotic description of convergence to zero risk.
