DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference

Boyu Li; Chaofan Tao; Fengbin Tu; Hayden Kwok-Hay So; Jiajun Wu; Jiajun Zhou; Kwang-Ting Cheng; Ngai Wong; Yizhao Gao; Yuhao Ding

arxiv: 2302.12510 · v1 · pith:4IV4MQ5Onew · submitted 2023-02-24 · 💻 cs.LG



keywords: dybit, inference, accuracy, numbers, quantization, framework, low-bitwidth, neural


To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is to quantize DNN models into low-bitwidth numbers without significant accuracy degradation, especially at very low bitwidths (< 8 bits). This work proposes DyBit, an adaptive data representation with variable-length encoding. DyBit can dynamically adjust the precision and range of separate bit-fields to adapt to the distribution of DNN weights and activations. We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade off inference accuracy and speedup. Experimental results demonstrate that the inference accuracy with DyBit is 1.997% higher than the state of the art at 4-bit quantization, and the proposed framework can achieve up to 8.1x speedup compared with the original model.
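
The abstract does not spell out the DyBit encoding, so the following is not DyBit itself. It is a minimal sketch of the general idea the abstract relies on: at a fixed bitwidth, bits spent on range are bits not spent on precision, and the best split depends on the data distribution. The sketch uses plain signed fixed-point numbers as a stand-in format, and the function names (`quantize_fixed_point`, `best_frac_bits`), the synthetic weight and activation distributions, and the mean-squared-error criterion are all assumptions made for illustration.

```python
import numpy as np


def quantize_fixed_point(x, total_bits, frac_bits):
    """Round x onto a signed fixed-point grid (illustrative stand-in, not DyBit).

    The grid step is 2**-frac_bits; values outside the representable
    range are clipped to the nearest endpoint.
    """
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    codes = np.clip(np.round(x * scale), qmin, qmax)
    return codes / scale


def best_frac_bits(x, total_bits, candidates=range(-2, 10)):
    """Return the fractional-bit count with the lowest mean squared error on x."""
    errors = {
        f: float(np.mean((x - quantize_fixed_point(x, total_bits, f)) ** 2))
        for f in candidates
    }
    return min(errors, key=errors.get), errors


# Synthetic data (assumed): small, zero-centred weights and large,
# non-negative activations.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=10_000)
activations = np.abs(rng.normal(0.0, 4.0, size=10_000))

frac_w, _ = best_frac_bits(weights, total_bits=4)
frac_a, _ = best_frac_bits(activations, total_bits=4)
print("4-bit fractional bits chosen for weights:", frac_w)
print("4-bit fractional bits chosen for activations:", frac_a)
```

On this synthetic data the narrow weight distribution should favour more fractional bits (finer steps, smaller range) than the wide activation distribution, which is the kind of per-tensor adaptation a distribution-aware format aims to exploit.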
