No bad local minima: Data independent training error guarantees for multilayer neural networks

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for an MNN with one hidden layer, the training error is zero at every differentiable local minimum, for almost every dataset and dropout-like noise realization. We then extend these results to the case of more than one hidden layer. Our theoretical guarantees assume essentially nothing about the training data, and are verified numerically. These results suggest why the highly non-convex loss of such MNNs can be easily optimized using local updates (e.g., stochastic gradient descent), as observed empirically.
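The setting described in the abstract can be illustrated with a small numerical sketch. This is not the paper's experiment: the leaky-ReLU slope, the layer sizes, the random Gaussian data, the learning rate, and the function name `train_one_hidden_layer` are all assumptions chosen for illustration, and the dropout-like noise is omitted. It trains a one-hidden-layer network with a piecewise linear activation, quadratic loss, and a single output using plain gradient descent, with more parameters than training samples.

```python
import numpy as np


def leaky_relu(z, slope):
    # Piecewise linear activation: identity for z > 0, slope * z otherwise.
    return np.where(z > 0, z, slope * z)


def train_one_hidden_layer(n_samples=8, d_in=4, d_hidden=16,
                           steps=2000, lr=0.01, slope=0.1, seed=0):
    """Full-batch gradient descent on the mean squared error.

    All sizes and hyperparameters are illustrative assumptions.
    Returns the list of training losses, one per step.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, d_in))   # arbitrary inputs
    y = rng.standard_normal(n_samples)           # arbitrary targets
    W = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
    v = rng.standard_normal(d_hidden) / np.sqrt(d_hidden)

    losses = []
    for _ in range(steps):
        Z = X @ W.T                  # pre-activations, shape (n, d_hidden)
        H = leaky_relu(Z, slope)     # hidden activations
        err = H @ v - y              # residuals of the single output
        losses.append(float(np.mean(err ** 2)))

        # Gradients of the mean squared error w.r.t. v and W.
        grad_v = 2.0 * H.T @ err / n_samples
        dZ = np.outer(err, v) * np.where(Z > 0, 1.0, slope)
        grad_W = 2.0 * dZ.T @ X / n_samples

        v -= lr * grad_v
        W -= lr * grad_W
    return losses


if __name__ == "__main__":
    losses = train_one_hidden_layer()
    print(f"initial loss: {losses[0]:.4f}, final loss: {losses[-1]:.6f}")
```

Gradient descent is not guaranteed to stop at a differentiable local minimum in a finite number of steps, so a run like this can only suggest the behaviour the theorem describes, not confirm it.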

fields

cs.LG 5

representative citing papers

Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks

cs.LG · 2019-06-30 · unverdicted · novelty 6.0

Presents an active-sampling method that approximates the weight subspace from Hessian finite differences, recovers the rank-1 tensors by robust nonlinear programming, and attributes layers with gradient descent, yielding stable recovery under a-posteriori verifiable conditions.
