arxiv: 1512.07962 · v3 · pith:X62TVW7Cnew · submitted 2015-12-25 · 📊 stat.ML · cs.LG

Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization

keywords: stochastic, optimization, methods, adaptive, momentum, annealing, element-wise, gradient

Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs of popular stochastic optimization methods; however, this connection is not well studied. We explore this relationship by applying simulated annealing to an SG-MCMC algorithm. Furthermore, we extend recent SG-MCMC methods with two key components: i) adaptive preconditioners (as in AdaGrad or RMSprop), and ii) adaptive element-wise momentum weights. The zero-temperature limit gives a novel stochastic optimization method with adaptive element-wise momentum weights, whereas conventional optimization methods have only a shared, static momentum weight. Under certain assumptions, our theoretical analysis suggests that the proposed simulated annealing approach converges close to the global optima. Experiments on several deep neural network models show state-of-the-art results compared to related stochastic optimization algorithms.
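The link between annealed sampling and optimization described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified, momentum-free Langevin update with an RMSprop-style preconditioner, and it omits the correction term that an exact preconditioned sampler would need. The function name `annealed_preconditioned_langevin`, the polynomial schedule `beta = t ** anneal_rate`, and all default hyperparameters are assumptions chosen for illustration. As the inverse temperature `beta` grows, the injected noise shrinks and the update approaches a plain RMSprop step, which is the zero-temperature limit the abstract refers to.

```python
import numpy as np


def annealed_preconditioned_langevin(grad_fn, theta0, n_steps=2000, lr=1e-2,
                                     decay=0.99, eps=1e-5, anneal_rate=1.0,
                                     seed=0):
    """Illustrative annealed Langevin update with an RMSprop-style preconditioner.

    Hypothetical helper, not the method proposed in the paper.
    grad_fn(theta) returns the (possibly stochastic) gradient of the objective.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    v = np.zeros_like(theta)  # running average of squared gradients

    for t in range(1, n_steps + 1):
        g = grad_fn(theta)

        # Element-wise adaptive preconditioner, as in RMSprop.
        v = decay * v + (1.0 - decay) * g * g
        precond = 1.0 / (np.sqrt(v) + eps)

        # Assumed annealing schedule: inverse temperature grows with t,
        # so the noise term below vanishes over time.
        beta = float(t) ** anneal_rate

        noise = rng.standard_normal(theta.shape)
        theta = (theta
                 - lr * precond * g
                 + np.sqrt(2.0 * lr * precond / beta) * noise)

    return theta


if __name__ == "__main__":
    # Toy quadratic objective 0.5 * ||theta - target||^2 with a known minimizer.
    target = np.array([3.0, -1.0])
    estimate = annealed_preconditioned_langevin(lambda th: th - target,
                                                np.zeros(2))
    print(estimate)
```

On the toy quadratic, the iterate should wander early on, while the temperature is high, and then settle near `target` as the noise is annealed away. Setting `anneal_rate=0.0` keeps the temperature fixed, so the loop behaves like a sampler rather than an optimizer.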
