A convergence theory for deep learning via over-parameterization
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Conservation Law Breaking at the Edge of Stability: A Spectral Theory of Non-Convex Neural Network Optimization
Discrete gradient descent breaks the L-1 conservation laws of ReLU networks, with a drift that scales as eta^alpha. The drift decomposes exactly as eta^2 times a spectral sum S(eta), whose mode coefficients are proportional to the squared initial error times the Hessian eigenvalues.
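The eta^2 prefactor can be illustrated on a toy case. The following is a minimal sketch, assuming a two-layer ReLU network with squared loss; this setup is an assumption for illustration and is not taken from the citing paper. Under gradient flow, the per-neuron balancedness ||w_j||^2 - a_j^2 is conserved, while a single gradient-descent step with step size eta changes it by exactly eta^2 (||grad_w_j||^2 - grad_a_j^2). The names `grads`, `balancedness`, and `run` are hypothetical, and the sketch does not reproduce the paper's spectral sum S(eta).

```python
import numpy as np

def grads(W, a, X, y):
    """Gradients of 0.5 * mean((a . relu(W x) - y)^2) w.r.t. W and a."""
    pre = X @ W.T                      # (n, h) pre-activations
    act = np.maximum(pre, 0.0)         # ReLU
    r = act @ a - y                    # (n,) residuals
    n = X.shape[0]
    grad_a = act.T @ r / n
    grad_W = ((r[:, None] * a[None, :]) * (pre > 0)).T @ X / n
    return grad_W, grad_a

def balancedness(W, a):
    """Per-neuron quantity conserved by gradient flow (toy assumption)."""
    return np.sum(W**2, axis=1) - a**2

def run(eta, steps=50, seed=0):
    """Run plain gradient descent; return measured and predicted drift."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(32, 5))
    y = rng.normal(size=32)
    W = 0.5 * rng.normal(size=(8, 5))
    a = 0.5 * rng.normal(size=8)
    q0 = balancedness(W, a)
    predicted = np.zeros_like(q0)
    for _ in range(steps):
        gW, ga = grads(W, a, X, y)
        # The first-order term cancels by homogeneity of ReLU,
        # leaving only the second-order contribution per step.
        predicted += eta**2 * (np.sum(gW**2, axis=1) - ga**2)
        W = W - eta * gW
        a = a - eta * ga
    drift = balancedness(W, a) - q0
    return drift, predicted

drift, predicted = run(eta=0.05)
print(np.max(np.abs(drift - predicted)))
```

The printed value is the largest gap between the measured drift and the accumulated eta^2 terms; in this toy setup it should sit at floating-point round-off level, while the drift itself is nonzero.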