Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
1 Pith paper cites this work (field: cs.LG; year: 2021; verdict: unverdicted).

Representative citing paper:
Convergence rates for gradient descent in the training of overparameterized artificial neural networks with piecewise affine activation
Batch gradient descent achieves linear convergence to zero MSE with high probability for sufficiently wide shallow NNs with non-affine piecewise affine activations and distinct inputs.
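As an illustration only, the sketch below trains a wide one-hidden-layer ReLU network with full-batch gradient descent on three distinct scalar inputs and records the MSE at every step. ReLU is one example of a non-affine piecewise affine activation. "Linear convergence" means the loss is bounded by C·ρ^t for some ρ < 1, so the recorded losses should shrink roughly geometrically. The setup is an assumption and is not taken from the cited paper. The function name `train_shallow_relu`, the Gaussian initialization, the fixed output weights ±1/√m, training only the hidden weights and biases, the step size, and the data are all hypothetical choices.

```python
import math
import random


def train_shallow_relu(xs, ys, width=1000, lr=1.0, steps=300, seed=0):
    """Full-batch gradient descent on f(x) = sum_j a_j * relu(w_j * x + b_j).

    Hypothetical setup for illustration: scalar inputs, output weights a_j fixed
    at +/- 1/sqrt(width), only w_j and b_j trained, and
    loss = (1 / (2n)) * sum_i (f(x_i) - y_i)^2.
    Returns the loss before each step plus the final loss (steps + 1 values).
    """
    rng = random.Random(seed)
    n = len(xs)
    w = [rng.gauss(0.0, 1.0) for _ in range(width)]
    b = [rng.gauss(0.0, 1.0) for _ in range(width)]
    a = [rng.choice((-1.0, 1.0)) / math.sqrt(width) for _ in range(width)]

    def predict(x):
        total = 0.0
        for j in range(width):
            z = w[j] * x + b[j]
            if z > 0.0:  # ReLU: zero on the inactive side, identity otherwise
                total += a[j] * z
        return total

    def residuals_and_loss():
        res = [predict(x) - y for x, y in zip(xs, ys)]
        return res, sum(r * r for r in res) / (2 * n)

    losses = []
    for _ in range(steps):
        residuals, loss = residuals_and_loss()
        losses.append(loss)
        # Each unit's gradient depends only on its own (w_j, b_j) and on the
        # residuals computed above, so in-place updates are still full-batch GD.
        for j in range(width):
            gw = gb = 0.0
            for x, r in zip(xs, residuals):
                if w[j] * x + b[j] > 0.0:  # ReLU derivative taken as 0 at the kink
                    gw += r * a[j] * x
                    gb += r * a[j]
            w[j] -= lr * gw / n
            b[j] -= lr * gb / n
    losses.append(residuals_and_loss()[1])
    return losses


if __name__ == "__main__":
    history = train_shallow_relu([-1.0, 0.0, 1.0], [0.5, -0.3, 0.8])
    print(f"initial MSE/2: {history[0]:.3e}, final MSE/2: {history[-1]:.3e}")
```

Distinct inputs matter here: if two inputs coincided but had different targets, no network could reach zero MSE. The width of 1000 is chosen so that the network stays close to its initialization during training, which is the regime where this kind of geometric decrease is typically observed.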