The True Cost of Stochastic Gradient Langevin Dynamics

Andrew B. Duncan; Konstantinos Zygalakis; Leonard Hasenclever; Lukasz Szpruch; Sebastian J. Vollmer; Tigran Nagapetyan

arxiv: 1706.02692 · v1 · pith:CBUECF5A · submitted 2017-06-08 · 📊 stat.ME · math.NA

keywords: methods, stochastic, cost, data, dynamics, gradient, Langevin, accuracy

Abstract

The problem of posterior inference is central to Bayesian statistics, and a wealth of Markov Chain Monte Carlo (MCMC) methods have been proposed to obtain asymptotically correct samples from the posterior. As datasets in applications grow ever larger, scalability has emerged as a central problem for MCMC methods. Stochastic Gradient Langevin Dynamics (SGLD) and related stochastic gradient MCMC methods offer scalability by using stochastic gradients in each step of the simulated dynamics. While these methods are asymptotically unbiased if the step sizes are reduced in an appropriate fashion, in practice constant step sizes are used. This introduces a bias that is often ignored. In this paper we study the mean squared error of Lipschitz functionals in strongly log-concave models with i.i.d. data of growing data set size, and show that, for a given batch size, controlling the bias of SGLD requires a step size so small that the computational cost of reaching a target accuracy is roughly the same for all batch sizes. Using a control variate approach, the cost can be reduced dramatically. The analysis treats the algorithms as noisy discretisations of the Langevin SDE, which reduce to the Euler method when the full data set is used. An important observation is that the scale of the step size is determined by the stability criterion if the accuracy is required for consistent credible intervals. Experimental results confirm our theoretical findings.
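
The abstract contrasts plain SGLD with a control-variate variant and views both as noisy Euler discretisations of the Langevin SDE dθ = -∇U(θ) dt + √2 dW. The sketch below illustrates that contrast on a toy strongly log-concave model. Everything concrete in it is an assumption for illustration, not the authors' code or experimental setup: the 1-D Bayesian logistic regression model, the Gaussian prior variance, the step size h, the batch size, the with-replacement minibatching, and all function names are hypothetical. The control variate shown (exact full-data gradient at the posterior mode θ̂, plus a minibatch estimate of the difference g_i(θ) - g_i(θ̂)) is one common construction and is assumed here. The step size is chosen well inside the Euler stability bound h < 2/L, where L is roughly the curvature of U at the mode.

```python
import numpy as np

PRIOR_VAR = 10.0  # hypothetical Gaussian prior N(0, PRIOR_VAR) on a scalar parameter


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_data(N, theta_true=1.0, seed=0):
    """Synthetic i.i.d. data for a toy 1-D logistic regression (assumed model)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)
    y = (rng.random(N) < sigmoid(theta_true * x)).astype(float)
    return x, y


def lik_grads(theta, x, y):
    """Per-datum gradients g_i(theta) of -log p(y_i | x_i, theta)."""
    return (sigmoid(theta * x) - y) * x


def full_grad(theta, x, y):
    """Gradient of the potential U(theta) = -log posterior (up to a constant)."""
    return theta / PRIOR_VAR + lik_grads(theta, x, y).sum()


def find_mode(x, y, iters=50):
    """Newton's method for the posterior mode, used as the control-variate anchor."""
    theta = 0.0
    for _ in range(iters):
        p = sigmoid(theta * x)
        hess = 1.0 / PRIOR_VAR + np.sum(p * (1.0 - p) * x**2)
        theta -= full_grad(theta, x, y) / hess
    return theta


def sgld_grad(theta, x, y, idx):
    """Plain SGLD estimator: prior term plus rescaled minibatch likelihood sum."""
    scale = len(x) / len(idx)
    return theta / PRIOR_VAR + scale * lik_grads(theta, x[idx], y[idx]).sum()


def cv_grad(theta, x, y, idx, theta_hat, full_at_hat):
    """Control-variate estimator: exact likelihood gradient at theta_hat plus a
    rescaled minibatch estimate of sum_i [g_i(theta) - g_i(theta_hat)]."""
    scale = len(x) / len(idx)
    diff = lik_grads(theta, x[idx], y[idx]) - lik_grads(theta_hat, x[idx], y[idx])
    return theta / PRIOR_VAR + full_at_hat + scale * diff.sum()


def run_chain(grad_fn, theta0, h, n_steps, batch, N, rng):
    """theta_{k+1} = theta_k - h * g(theta_k) + sqrt(2h) * xi_k, xi_k ~ N(0, 1).
    If g were the full gradient, this would be the Euler scheme for the SDE."""
    theta, out = theta0, np.empty(n_steps)
    for k in range(n_steps):
        idx = rng.integers(0, N, size=batch)  # minibatch drawn with replacement
        theta = theta - h * grad_fn(theta, idx) + np.sqrt(2.0 * h) * rng.normal()
        out[k] = theta
    return out


# Hypothetical settings; h * L is about 0.15 here, well below the bound 2.
N, batch, h, n_steps = 1000, 10, 1e-3, 5000
x, y = make_data(N)
theta_hat = find_mode(x, y)
full_at_hat = lik_grads(theta_hat, x, y).sum()  # computed once, reused every step

plain = run_chain(lambda t, i: sgld_grad(t, x, y, i), theta_hat, h, n_steps,
                  batch, N, np.random.default_rng(1))
cv = run_chain(lambda t, i: cv_grad(t, x, y, i, theta_hat, full_at_hat), theta_hat,
               h, n_steps, batch, N, np.random.default_rng(1))

print("mode            ", theta_hat)
print("SGLD    mean/std", plain[1000:].mean(), plain[1000:].std())
print("SGLD-CV mean/std", cv[1000:].mean(), cv[1000:].std())
```

With a constant step size and a small batch, the plain SGLD chain's spread is inflated by the minibatch gradient noise, while the control-variate chain stays close to the spread of the exact Euler chain. This is a qualitative illustration of the bias discussed above, not a reproduction of the paper's cost analysis or experiments.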

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Piecewise Deterministic Markov Processes for Bayesian Neural Networks
stat.ML 2023-02 unverdicted novelty 6.0

Introduces an adaptive thinning scheme to make PDMP-based MCMC feasible for Bayesian inference in neural networks by handling model-specific IPPs efficiently.