Gradient descent on two-layer nets: Margin maximization and simplicity bias
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: stat.ML (1)
Years: 2022 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
For orthogonal inputs, gradient flow on shallow ReLU networks with MSE loss and small initialization converges to zero loss and exhibits a minimum-variation-norm implicit bias, initial alignment, and saddle-to-saddle dynamics.