Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

Ohad Shamir

arxiv: 1809.08587 · v4 · pith:MBNTRUP4new · submitted 2018-09-23 · 💻 cs.LG · cs.NE· math.OC· stat.ML

Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

Ohad Shamir This is my paper

classification 💻 cs.LG cs.NEmath.OCstat.ML

keywords convergencelinearnetworksneuraldeepdescentgradientarise

0 comments

read the original abstract

We study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar parameters $w_1,\ldots,w_k$), which arise in the context of training depth-$k$ linear neural networks. We prove that for standard random initializations, and under mild assumptions on $f$, the number of iterations required for convergence scales exponentially with the depth $k$. We also show empirically that this phenomenon can occur in higher dimensions, where each $w_i$ is a matrix. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where $k$ is large.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
cs.LG 2019-07 unverdicted novelty 4.0

Provides Hessian-based theoretical characterizations of SGD dynamics and a scale-invariant generalization bound for deep nets, backed by experiments on synthetic data, MNIST, and CIFAR-10.