Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2021 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Deep neural network approximation theory for high-dimensional functions
  DNNs approximate sequences of functions constructed via finite compositions of locally Lipschitz continuous functions, maxima, and products with polynomial parameter growth in d and 1/ε.
- Convergence rates for gradient descent in the training of overparameterized artificial neural networks with piecewise affine activation
  Batch gradient descent achieves linear convergence to zero MSE with high probability for sufficiently wide shallow NNs with non-affine piecewise affine activations and distinct inputs.