Big Batch SGD: Automated Inference using Adaptive Batch Sizes

Soham De; Abhay Yadav; David Jacobs; Tom Goldstein

arXiv: 1610.05792 · v4 · pith:JKWIXINX · submitted 2016-10-18 · cs.LG · cs.NA · math.NA · math.OC · stat.ML

keywords: batch, automated, gradient, methods, adaptive, classical, gradients, require

Abstract

Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients make it difficult to use them for adaptive stepsize selection and automatic stopping. We propose alternative "big batch" SGD schemes that adaptively grow the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation. The resulting methods have convergence rates similar to those of classical SGD and do not require convexity of the objective. The high-fidelity gradients enable automated learning rate selection and do not require stepsize decay. Big batch methods are thus easily automated and can run with little or no oversight.
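The core loop described above (grow the batch whenever gradient noise swamps the signal, then choose the step size automatically from the now-reliable gradient) can be illustrated with a small sketch. It assumes a least-squares objective, a variance test of the form Var(mean batch gradient) ≤ θ²‖mean batch gradient‖² as one reading of "nearly constant signal-to-noise ratio", batch doubling, and an Armijo backtracking line search as the automated step-size rule. These choices, and all names (`make_data`, `batch_stats`, `big_batch_sgd`, `theta`, `lr0`), are hypothetical illustrations, not the paper's exact algorithm.

```python
import random


def make_data(n, x_true, noise, seed=0):
    """Synthetic least-squares data: b_i = a_i . x_true + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = [rng.gauss(0.0, 1.0) for _ in x_true]
        b = sum(ai * xi for ai, xi in zip(a, x_true)) + rng.gauss(0.0, noise)
        data.append((a, b))
    return data


def residual(x, a, b):
    return sum(ai * xi for ai, xi in zip(a, x)) - b


def batch_loss(x, batch):
    """Mean of 0.5 * (a . x - b)^2 over the batch."""
    return sum(0.5 * residual(x, a, b) ** 2 for a, b in batch) / len(batch)


def batch_stats(x, batch):
    """Mean batch gradient and an estimate of that mean's variance (trace)."""
    grads = [[residual(x, a, b) * ai for ai in a] for a, b in batch]
    m, d = len(grads), len(x)
    mean = [sum(g[j] for g in grads) / m for j in range(d)]
    spread = sum((g[j] - mean[j]) ** 2 for g in grads for j in range(d))
    # Sample variance of per-example gradients, divided by m (requires m >= 2).
    return mean, spread / ((m - 1) * m)


def big_batch_sgd(data, x0, theta=0.5, batch_size=8, iters=100, lr0=1.0, seed=1):
    """Hypothetical big-batch SGD sketch; 2 <= batch_size <= len(data)."""
    rng = random.Random(seed)
    x, sizes = list(x0), []
    for _ in range(iters):
        # batch_size persists across iterations, so it only ever grows.
        while True:
            batch = rng.sample(data, batch_size)
            g, var = batch_stats(x, batch)
            g_sq = sum(gj * gj for gj in g)
            # Assumed test: accept once noise <= theta^2 * signal, else double the batch.
            if var <= theta ** 2 * g_sq or batch_size == len(data):
                break
            batch_size = min(2 * batch_size, len(data))
        sizes.append(batch_size)
        # Armijo backtracking on the batch loss; x is left unchanged if no step is accepted.
        lr, f0 = lr0, batch_loss(x, batch)
        for _ in range(50):
            trial = [xj - lr * gj for xj, gj in zip(x, g)]
            if batch_loss(trial, batch) <= f0 - 0.5 * lr * g_sq:
                x = trial
                break
            lr *= 0.5
    return x, sizes
```

Calling `big_batch_sgd(make_data(256, [2.0, -1.0], noise=0.1), [0.0, 0.0])` returns the final iterate and the batch size used at each step. With noisy labels the per-example gradients stay noisy even as the mean gradient shrinks, so the batch size should climb toward the full dataset near the solution, at which point the line search is effectively driving full-gradient descent without any hand-tuned decay schedule.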

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Multi-Iteration Stochastic Optimizers
math.OC 2020-11 unverdicted novelty 7.0

MICE is a multi-iteration control variate estimator for stochastic gradients that exploits correlations between iterates to achieve O(tol^{-1}) complexity in smooth strongly convex problems, outperforming adaptive batch SGD.

Language Models (Mostly) Know What They Know
cs.CL 2022-07 unverdicted novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment
cs.CL 2021-12 conditional novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

Scaling Laws for Transfer
cs.LG 2021-02 unverdicted novelty 6.0

Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.