Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

2018 · cs.LG · arXiv 1809.08587

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

We study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar parameters $w_1,\ldots,w_k$), which arise in the context of training depth-$k$ linear neural networks. We prove that for standard random initializations, and under mild assumptions on $f$, the number of iterations required for convergence scales exponentially with the depth $k$. We also show empirically that this phenomenon can occur in higher dimensions, where each $w_i$ is a matrix. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where $k$ is large.
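The setting in the abstract can be explored numerically with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the choice $f(x) = \tfrac{1}{2}(x-1)^2$, the step size, the stopping tolerance, and the helper names `loss_grad`, `gd_iterations`, and `random_init`. The sketch only counts gradient descent steps for a given starting point; it does not reproduce the paper's proof or its experiments.

```python
import random


def loss_grad(ws, target=1.0):
    """Return prod(ws) and the gradient of f(prod(ws)).

    Assumption: f(x) = 0.5 * (x - target)**2, chosen only for illustration.
    """
    k = len(ws)
    # prefix[i] = w_0 * ... * w_{i-1}; suffix[i] = w_i * ... * w_{k-1}
    prefix = [1.0] * (k + 1)
    suffix = [1.0] * (k + 1)
    for i in range(k):
        prefix[i + 1] = prefix[i] * ws[i]
    for i in range(k - 1, -1, -1):
        suffix[i] = suffix[i + 1] * ws[i]
    prod = prefix[k]
    # Chain rule: d/dw_j f(prod) = f'(prod) * (product of all w_i with i != j).
    grads = [(prod - target) * prefix[j] * suffix[j + 1] for j in range(k)]
    return prod, grads


def gd_iterations(ws, lr=0.05, tol=1e-3, max_iters=100_000):
    """Count plain gradient descent steps until |prod(ws) - 1| < tol.

    Returns max_iters if the tolerance is never reached.
    """
    ws = list(ws)
    for t in range(max_iters):
        prod, grads = loss_grad(ws)
        if abs(prod - 1.0) < tol:
            return t
        ws = [w - lr * g for w, g in zip(ws, grads)]
    return max_iters


def random_init(k, seed=0):
    """Hypothetical init: each w_i drawn i.i.d. from N(0, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(k)]
```

Calling `gd_iterations([0.5] * k)` for increasing `k` gives a rough feel for how depth affects the step count when the initial product is small, because each partial derivative is scaled by a product of the other $k-1$ weights. Passing `random_init(k)` instead tries a random start; with a fixed step size some random starts may fail to converge, in which case the function returns `max_iters`.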

representative citing papers

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization

cs.LG · 2019-07-24 · unverdicted · novelty 4.0

Provides Hessian-based theoretical characterizations of SGD dynamics and a scale-invariant generalization bound for deep nets, backed by experiments on synthetic data, MNIST, and CIFAR-10.

