Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations

Liangzhen Lai; Naveen Suda; Vikas Chandra

arxiv: 1703.03073 · v1 · pith:33VHOD6Gnew · submitted 2017-03-08 · 💻 cs.LG · cs.CV

Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations

Liangzhen Lai , Naveen Suda , Vikas Chandra This is my paper

classification 💻 cs.LG cs.CV

keywords fixed-pointweightsactivationsfloating-pointnetworknumbersrepresentationacross

0 comments

read the original abstract

Deep convolutional neural network (CNN) inference requires significant amount of memory and computation, which limits its deployment on embedded devices. To alleviate these problems to some extent, prior research utilize low precision fixed-point numbers to represent the CNN weights and activations. However, the minimum required data precision of fixed-point weights varies across different networks and also across different layers of the same network. In this work, we propose using floating-point numbers for representing the weights and fixed-point numbers for representing the activations. We show that using floating-point representation for weights is more efficient than fixed-point representation for the same bit-width and demonstrate it on popular large-scale CNNs such as AlexNet, SqueezeNet, GoogLeNet and VGG-16. We also show that such a representation scheme enables compact hardware multiply-and-accumulate (MAC) unit design. Experimental results show that the proposed scheme reduces the weight storage by up to 36% and power consumption of the hardware multiplier by up to 50%.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT
stat.ML 2019-07 unverdicted novelty 6.0

NoNN partitions a teacher model into disjoint compressed students via network science for distributed IoT inference, matching teacher accuracy with far lower per-device memory and communication.