
arxiv: 1603.04379 · v2 · pith:KKQ6M4HYnew · submitted 2016-03-14 · 🧮 math.OC

On Data Dependence in Distributed Stochastic Optimization

keywords: convergence, data, spectral, distributed, amount, data-dependent, matrix, norm

We study a distributed consensus-based stochastic gradient descent (SGD) algorithm and show that the rate of convergence involves the spectral properties of two matrices: the standard spectral gap of a weight matrix from the network topology and a new term depending on the spectral norm of the sample covariance matrix of the data. This data-dependent convergence rate shows that distributed SGD algorithms perform better on datasets whose sample covariance matrix has a small spectral norm. Our analysis method also allows us to find data-dependent convergence rates as we limit the amount of communication. Spreading a fixed amount of data across more nodes slows convergence; for asymptotically growing datasets we show that adding more machines can help when minimizing twice-differentiable losses.
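To make the two spectral quantities in the abstract concrete, the following is a minimal sketch in plain Python. It is not the paper's algorithm or code. It rests on several assumptions that the abstract does not state: the weight matrix `W` is symmetric and doubly stochastic, the spectral gap is taken as one minus the spectral norm of `W` with the averaging component removed, the sample covariance is the uncentered second-moment matrix, and the update mixes neighbors' iterates before taking a local gradient step. All function names are hypothetical.

```python
import math
import random


def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def spectral_norm_symmetric(A, iters=500, seed=0):
    """Largest eigenvalue magnitude of a symmetric matrix, by power iteration."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in A]
    norm = 0.0
    for _ in range(iters):
        y = matvec(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            return 0.0
        x = [v / norm for v in y]
    return norm


def sample_covariance(X):
    """Uncentered sample covariance (1/N) * sum_i x_i x_i^T (an assumption)."""
    n, d = len(X), len(X[0])
    return [[sum(r[i] * r[j] for r in X) / n for j in range(d)] for i in range(d)]


def spectral_gap(W):
    """1 - ||W - (1/n) 1 1^T||_2 for a symmetric doubly stochastic W (assumed)."""
    n = len(W)
    deflated = [[W[i][j] - 1.0 / n for j in range(n)] for i in range(n)]
    return 1.0 - spectral_norm_symmetric(deflated)


def consensus_sgd_step(W, thetas, grads, step_size):
    """One generic consensus step: mix neighbors' iterates, then descend locally."""
    n, d = len(thetas), len(thetas[0])
    mixed = [[sum(W[i][j] * thetas[j][k] for j in range(n)) for k in range(d)]
             for i in range(n)]
    return [[mixed[i][k] - step_size * grads[i][k] for k in range(d)]
            for i in range(n)]


# Illustrative inputs: a 3-node path graph and a tiny 2-dimensional dataset.
W_path = [[2 / 3, 1 / 3, 0.0],
          [1 / 3, 1 / 3, 1 / 3],
          [0.0, 1 / 3, 2 / 3]]
X_toy = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]

gap = spectral_gap(W_path)
data_norm = spectral_norm_symmetric(sample_covariance(X_toy))
print(f"spectral gap = {gap:.4f}, covariance spectral norm = {data_norm:.4f}")
```

In this toy setup a better-connected network would give a larger `gap`, and rescaling the data so that `data_norm` shrinks corresponds to the regime in which the abstract says distributed SGD performs better. The sketch only computes the quantities and does not reproduce the convergence bound itself.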

