GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training

Hailin Jin; Jianchao Yang; Thomas Huang; Thomas Paine; Zhe Lin

arxiv: 1312.6186 · v1 · pith:3ES5P3I5new · submitted 2013-12-21 · 💻 cs.CV · cs.DC · cs.LG · cs.NE




keywords a-sgd · parallelism · training · computer · networks · neural · vision · data



The ability to train large-scale neural networks has resulted in state-of-the-art performance in many areas of computer vision. These results have largely come from computational breakthroughs of two forms: model parallelism, e.g. GPU-accelerated training, which has seen quick adoption in computer vision circles, and data parallelism, e.g. A-SGD, whose large-scale use has been mostly in industry. We report early experiments with a system that makes use of both model parallelism and data parallelism, which we call GPU A-SGD. We show that, using GPU A-SGD, it is possible to speed up training of large convolutional neural networks useful for computer vision. We believe GPU A-SGD will make it possible to train larger networks on larger training sets in a reasonable amount of time.
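The abstract names asynchronous SGD (A-SGD) without describing how it works. The sketch below is one possible illustration of the general idea, not the authors' system. It is a single-process, CPU-only simulation with no GPUs and no real parallelism. Every name in it (`make_data`, `grad`, `async_sgd`) is hypothetical, and the one-parameter linear model stands in for a neural network. Each simulated worker computes a gradient from its own data shard using a possibly stale copy of the parameter, and the server applies each update as it arrives, without waiting for the other workers.

```python
import random


def make_data(n=200, true_w=3.0, seed=0):
    """Noiseless toy regression data: y = true_w * x."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_w * x) for x in xs]


def grad(w, batch):
    """Gradient of the mean of 0.5 * (w*x - y)^2 with respect to w."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def async_sgd(data, n_workers=4, steps=2000, lr=0.1, batch_size=8, seed=1):
    """Simulate asynchronous SGD with a central parameter copy.

    Assumption: worker arrival order is modeled by a random draw,
    which stands in for workers finishing at different times.
    """
    rng = random.Random(seed)
    # Data parallelism: each worker sees only its own shard.
    shards = [data[i::n_workers] for i in range(n_workers)]
    server_w = 0.0
    # Each worker holds a local copy that goes stale while others update.
    local_w = [server_w] * n_workers
    for _ in range(steps):
        k = rng.randrange(n_workers)        # the worker that finishes next
        batch = rng.sample(shards[k], batch_size)
        g = grad(local_w[k], batch)         # gradient from stale parameters
        server_w -= lr * g                  # applied at once, no barrier
        local_w[k] = server_w               # worker pulls fresh parameters
    return server_w


if __name__ == "__main__":
    print(async_sgd(make_data()))
```

In this toy setting the stale gradients still drive the parameter toward the true value of 3.0 because the learning rate is small relative to the staleness. Whether that holds for large convolutional networks is what the paper's experiments are meant to examine.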


