Batch gradient descent achieves linear convergence to zero MSE with high probability for sufficiently wide shallow NNs with non-affine piecewise affine activations and distinct inputs.
Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Lang evin algorithms
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2021 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Convergence rates for gradient descent in the training of overparameterized artificial neural networks with piecewise affine activation
Batch gradient descent achieves linear convergence to zero MSE with high probability for sufficiently wide shallow NNs with non-affine piecewise affine activations and distinct inputs.