SWALP: Stochastic Weight Averaging in Low-Precision Training

Andrew Gordon Wilson; Christopher De Sa; Guandao Yang; Junwen Bai; Polina Kirichenko; Tianyi Zhang

arXiv: 1904.11943 · v2 · pith:4NTHSFGUnew · submitted 2019-04-26 · cs.LG · cs.AI · stat.ML

Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa

keywords: swalp, precision, low-precision, training, accumulators, additionally, approach, arbitrarily

Low-precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low-precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than that of low-precision SGD in strongly convex settings.
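
The idea in the abstract, averaging low-precision SGD iterates, can be illustrated with a small sketch on a toy quadratic objective. This is not the authors' implementation. The 8-bit fixed-point format, the use of stochastic rounding, the constant learning rate after warm-up, the averaging interval, and the names `quantize` and `swalp_quadratic` are all assumptions made for illustration. The running average is kept in ordinary floating point, which is also an assumption of this sketch.

```python
import numpy as np


def quantize(x, rng, word_bits=8, frac_bits=6):
    """Stochastically round x onto a signed fixed-point grid (hypothetical helper)."""
    delta = 2.0 ** -frac_bits                    # grid spacing
    lo = -(2.0 ** (word_bits - frac_bits - 1))   # smallest representable value
    hi = -lo - delta                             # largest representable value
    scaled = x / delta
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional part (unbiased rounding).
    rounded = floor + (rng.random(x.shape) < (scaled - floor))
    return np.clip(rounded * delta, lo, hi)


def swalp_quadratic(steps=4000, warmup=1000, cycle=10, lr=0.05, seed=0):
    """Low-precision SGD on 0.5 * ||w - w_star||^2 with iterate averaging."""
    rng = np.random.default_rng(seed)
    w_star = np.array([0.3, -0.7])   # optimum, deliberately off the 8-bit grid
    w = quantize(np.zeros(2), rng)   # weights are always stored quantized
    w_avg = np.zeros(2)              # running average, kept in float64 here
    n_avg = 0
    for t in range(steps):
        # Noisy gradient of the quadratic objective.
        grad = (w - w_star) + 0.1 * rng.standard_normal(2)
        # The updated weights are quantized back to 8 bits after every step.
        w = quantize(w - lr * grad, rng)
        # After warm-up, fold every `cycle`-th iterate into the average.
        if t >= warmup and (t - warmup) % cycle == 0:
            n_avg += 1
            w_avg += (w - w_avg) / n_avg
    return w, w_avg, w_star


w_last, w_avg, w_star = swalp_quadratic()
print("last iterate error:", np.linalg.norm(w_last - w_star))
print("averaged error:    ", np.linalg.norm(w_avg - w_star))
```

In this sketch the individual iterates can only sit on the fixed-point grid, so they keep jittering around the optimum, while the average of many such iterates is not restricted to the grid and can land closer to it. That matches the intuition behind the quadratic-objective result stated in the abstract, though the toy problem says nothing about the paper's actual bounds.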
