
arXiv: 1611.10176 · v1 · submitted 2016-11-30 · cs.LG · cs.CV

Recognition: unknown

Effective Quantization Methods for Recurrent Neural Networks

classification: cs.LG, cs.CV
keywords: methods, quantization, weights, activations, degradation, neural, performance, previous

Reducing the bit-widths of the weights, activations, and gradients of a neural network can shrink its storage size and memory usage, and can also allow faster training and inference by exploiting bitwise operations. However, previous attempts at quantizing RNNs show considerable performance degradation when using low bit-width weights and activations. In this paper, we propose methods to quantize the structure of gates and interlinks in LSTM and GRU cells. In addition, we propose balanced quantization methods for weights to further reduce performance degradation. Experiments on the PTB and IMDB datasets confirm the effectiveness of our methods: the performance of our models matches or surpasses the previous state of the art for quantized RNNs.
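
As a rough illustration of what reducing the bit-width of weights means, the sketch below applies plain uniform k-bit quantization to a weight matrix with NumPy. This is a generic technique shown under stated assumptions, not the balanced quantization the paper proposes; the function name `quantize_uniform`, the `bits` parameter, and the random test matrix are all hypothetical, chosen here for illustration.

```python
import numpy as np


def quantize_uniform(w, bits):
    """Snap each entry of w to one of 2**bits evenly spaced levels.

    Generic uniform quantization over [w.min(), w.max()]; this is an
    illustrative assumption, not the method from the paper.
    """
    levels = 2 ** bits - 1              # number of steps between min and max
    lo, hi = float(w.min()), float(w.max())
    if hi == lo:                        # constant input: nothing to quantize
        return w.copy()
    scale = (hi - lo) / levels          # width of one quantization step
    codes = np.round((w - lo) / scale)  # integer codes in [0, levels]
    return codes * scale + lo           # map codes back to real values


# Hypothetical weight matrix, used only to exercise the function.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
w_q = quantize_uniform(w, bits=2)
```

With `bits=2`, every entry of `w_q` takes one of at most four distinct values, so the matrix could be stored as 2-bit codes plus an offset and a step size. The rounding error per weight is at most half a step, which is the degradation that low bit-width schemes have to manage.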

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Mixed Precision Training

    cs.AI · 2017-10 · accept · novelty 7.0

    Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.