Biased Importance Sampling for Deep Neural Network Training

Angelos Katharopoulos; François Fleuret

arxiv: 1706.00043 · v2 · pith:5JG5WJ4Jnew · submitted 2017-05-31 · 💻 cs.LG

classification 💻 cs.LG

keywords deep, importance, method, sampling, when, biased, estimate, gradient

abstract

Importance sampling has been successfully used to accelerate stochastic optimization in many convex problems. However, the lack of an efficient way to calculate the importance still hinders its application to deep learning. In this paper, we show that the loss value can be used as an alternative importance metric, and we propose a way to approximate it efficiently for a deep model, using a small model trained for that purpose in parallel. In particular, this method makes it possible to use a biased gradient estimate that implicitly optimizes a soft max-loss and leads to better generalization performance. While such a method suffers from prohibitively high variance of the gradient estimate when used with a standard stochastic optimizer, we show that combining it with our sampling mechanism results in a reliable procedure. We showcase the generality of our method by testing it on both image classification and language modeling tasks using deep convolutional and recurrent neural networks. In particular, our method trains a CNN on CIFAR10 30% faster than uniform sampling does.
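The idea of using the loss as an importance metric can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `smoothing` term, and the exponent `k` are hypothetical choices made for illustration, and the `losses` array stands in for whatever per-sample loss estimate is available (the abstract describes obtaining it from a small auxiliary model). With `k = 1` the weights are the standard importance-sampling correction, which keeps the weighted average an unbiased estimate of the uniform mean; with `k < 1` the correction is only partial, so high-loss samples keep extra influence and the estimate is biased.

```python
import numpy as np


def loss_based_probs(losses, smoothing=0.0):
    """Sampling distribution proportional to the (smoothed) per-sample loss.

    Assumes the smoothed losses are non-negative and not all zero.
    """
    scores = np.asarray(losses, dtype=float) + smoothing
    return scores / scores.sum()


def sample_batch(losses, batch_size, k=1.0, smoothing=0.0, rng=None):
    """Draw a mini-batch with probability proportional to loss.

    Returns the sampled indices and one weight per sampled index.
    `k` is a hypothetical knob: k = 1 applies the full correction
    1 / (N * p_i); k = 0 applies no correction at all.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = loss_based_probs(losses, smoothing)
    n = len(p)
    # Sample with replacement so the draws are i.i.d. from p.
    idx = rng.choice(n, size=batch_size, replace=True, p=p)
    weights = (1.0 / (n * p[idx])) ** k
    return idx, weights


if __name__ == "__main__":
    losses = [1.0, 2.0, 3.0, 4.0]
    idx, w = sample_batch(losses, batch_size=8, rng=np.random.default_rng(0))
    print(idx, w)
```

In a training loop, the returned weights would multiply the per-sample losses of the selected examples before averaging and backpropagation.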

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Submodular Batch Selection for Training Deep Neural Networks
cs.LG · 2019-06 · unverdicted · novelty 5.0

A greedy submodular maximization method for mini-batch selection in DNN training yields better generalization than SGD on standard datasets.