ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks

Denis A. Gudovskiy; Luca Rigazio

arxiv: 1706.02393 · v1 · pith:7FC4J5LR · submitted 2017-06-07 · 💻 cs.CV · cs.NE




keywords shiftcnn · architecture · convolutional · inference · cnns · fpgas · generalized · layers


Abstract

In this paper we introduce ShiftCNN, a generalized low-precision architecture for inference of multiplierless convolutional neural networks (CNNs). ShiftCNN is based on a power-of-two weight representation and, as a result, performs only shift and addition operations. Furthermore, ShiftCNN substantially reduces the computational cost of convolutional layers by precomputing convolution terms. This optimization can be applied to any CNN architecture with a relatively small codebook of weights and decreases the number of product operations by at least two orders of magnitude. The proposed architecture targets custom inference accelerators and can be realized on FPGAs or ASICs. Extensive evaluation on ImageNet shows that state-of-the-art CNNs can be converted into ShiftCNN without retraining, with less than a 1% drop in accuracy, when the proposed quantization algorithm is employed. RTL simulations targeting modern FPGAs show that the power consumption of convolutional layers is reduced by a factor of 4 compared to conventional 8-bit fixed-point architectures.
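The two ideas in the abstract, power-of-two weights evaluated with shifts and additions, and precomputing products so that each output only selects and adds, can be illustrated with a minimal Python sketch. It is not the authors' quantization algorithm or hardware design. The greedy residual quantizer, the codebook {0, ±2^0, ±2^-1, ..., ±2^-max_shift}, the assumption that weights are pre-scaled to [-1, 1], integer fixed-point activations, and all function names (`quantize_weight`, `dequantize`, `shift_mul`, `precompute`, `dot_from_tables`) are hypothetical choices made for illustration. A single dot product stands in for one output of a convolution window.

```python
# Hypothetical sketch of ShiftCNN-style arithmetic. Assumptions (not from the paper):
# weights pre-scaled to [-1, 1], a greedy residual quantizer, and the codebook
# {0, +-2^0, +-2^-1, ..., +-2^-max_shift}. Requires Python 3.9+.

Term = tuple[int, int]  # (sign, shift): value = sign * 2**-shift; sign 0 means a zero term


def quantize_weight(w: float, num_terms: int, max_shift: int) -> list[Term]:
    """Approximate w as a sum of num_terms signed powers of two, greedily
    picking the codebook entry closest to the remaining residual."""
    terms: list[Term] = []
    residual = w
    for _ in range(num_terms):
        best: Term = (0, 0)
        best_err = abs(residual)  # error if this term is zero
        for shift in range(max_shift + 1):
            for sign in (1, -1):
                err = abs(residual - sign * 2.0 ** -shift)
                if err < best_err:
                    best, best_err = (sign, shift), err
        terms.append(best)
        residual -= best[0] * 2.0 ** -best[1]
    return terms


def dequantize(terms: list[Term]) -> float:
    """Real value represented by a list of power-of-two terms."""
    return sum(sign * 2.0 ** -shift for sign, shift in terms)


def shift_mul(x: int, sign: int, shift: int) -> int:
    """x * sign * 2**-shift using only an arithmetic right shift and negation.
    The shift floors toward -inf when x is not divisible by 2**shift."""
    if sign == 0:
        return 0
    y = x >> shift
    return y if sign > 0 else -y


def precompute(xs: list[int], max_shift: int) -> list[dict[Term, int]]:
    """Compute every codebook product for each input once, so that all
    output channels can reuse these tables instead of recomputing them."""
    tables = []
    for x in xs:
        table = {(0, 0): 0}
        for shift in range(max_shift + 1):
            for sign in (1, -1):
                table[(sign, shift)] = shift_mul(x, sign, shift)
        tables.append(table)
    return tables


def dot_from_tables(tables: list[dict[Term, int]], weights: list[list[Term]]) -> int:
    """Dot product using table lookups and additions only (no multiplies)."""
    return sum(table[t] for table, terms in zip(tables, weights) for t in terms)


xs = [16, 32, -48]  # fixed-point activations, multiples of 16 so the shifts are exact
ws = [quantize_weight(w, num_terms=2, max_shift=4) for w in (0.625, 0.75, -0.3)]
print([dequantize(t) for t in ws])  # [0.625, 0.75, -0.3125]
print(dot_from_tables(precompute(xs, max_shift=4), ws))  # 49
```

Because the tables depend only on the inputs, a layer with many output channels would build them once and reuse them; in this sketch, that reuse is what replaces per-weight products with lookups and additions. Note that the right shift floors negative values, so the exact result above relies on activations divisible by 2**max_shift.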



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

PoTAcc: A Pipeline for End-to-End Acceleration of Power-of-Two Quantized DNNs
cs.AR · 2026-05 · unverdicted · novelty 5.0

PoTAcc provides an end-to-end pipeline and three shift-PE FPGA accelerators for PoT-quantized DNNs, achieving up to 3.6x speedup and 78% energy reduction versus CPU-only runs on PYNQ-Z2 and Kria boards.