Batch gradient descent achieves linear convergence to zero MSE with high probability for sufficiently wide shallow NNs with non-affine piecewise affine activations and distinct inputs.
A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gr adient descent dynamics
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2021 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Convergence rates for gradient descent in the training of overparameterized artificial neural networks with piecewise affine activation
Batch gradient descent achieves linear convergence to zero MSE with high probability for sufficiently wide shallow NNs with non-affine piecewise affine activations and distinct inputs.