pith. machine review for the scientific record.

arXiv: 1901.01672 · v2 · submitted 2019-01-07 · 💻 cs.LG · cs.AI · stat.ML

Recognition: unknown

Generalization in Deep Networks: The Role of Distance from Initialization

classification 💻 cs.LG cs.AI stat.ML
keywords initialization, capacity, deep, distance, generalization, model, networks, network

Why does training deep neural networks using stochastic gradient descent (SGD) result in a generalization error that does not worsen with the number of parameters in the network? To answer this question, we advocate a notion of effective model capacity that depends on a given random initialization of the network, and not just on the training algorithm and the data distribution. We provide empirical evidence that the model capacity of SGD-trained deep networks is in fact restricted through implicit regularization of the $\ell_2$ distance from the initialization. We also provide theoretical arguments that further highlight the need for initialization-dependent notions of model capacity. We leave as open questions how and why distance from initialization is regularized, and whether it is sufficient to explain generalization.
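
As an illustration of the quantity the abstract refers to, the following is a minimal sketch of measuring the $\ell_2$ distance between trained weights and their initialization. It is not taken from the paper: the function name `l2_distance_from_init` and the representation of parameters as a list of flattened per-layer weight lists are assumptions made for this example, and the toy values are invented.

```python
import math


def l2_distance_from_init(init_params, trained_params):
    """Return the l2 norm of (trained - init), taken over all parameters.

    Hypothetical helper: each argument is a list of layers, and each layer
    is a flat list of floats. Both must have identical shapes.
    """
    if len(init_params) != len(trained_params):
        raise ValueError("both parameter sets must have the same number of layers")

    squared_sum = 0.0
    for layer_init, layer_trained in zip(init_params, trained_params):
        if len(layer_init) != len(layer_trained):
            raise ValueError("corresponding layers must have the same size")
        # Accumulate squared differences across every weight in the layer.
        squared_sum += sum((w - w0) ** 2 for w0, w in zip(layer_init, layer_trained))

    return math.sqrt(squared_sum)


# Toy example (invented values): two "layers" of flattened weights.
init = [[0.0, 0.0], [1.0]]
trained = [[3.0, 0.0], [5.0]]
distance = l2_distance_from_init(init, trained)  # sqrt(3**2 + 4**2) = 5.0
```

In a real experiment one would presumably snapshot the weights right after random initialization and compare them with the weights at the end of training; the abstract's claim is that this distance stays implicitly regularized under SGD.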

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Towards Initialization-dependent and Non-vacuous Generalization Bounds for Overparameterized Shallow Neural Networks

    cs.LG · 2026-04 · unverdicted · novelty 6.0

    Path-norm initialization-dependent bounds with a new peeling technique give non-vacuous generalization guarantees for overparameterized shallow networks with Lipschitz activations.