Post-training 4-bit quantization of convolution networks for rapid-deployment

Ron Banner , Yury Nahshan , Elad Hoffer , Daniel Soudry

Authors on Pith no claims yet

classification 💻 cs.CV

keywords quantizationaccuracyapproachconvolutionalfullgithubintermediatemethods

read the original abstract

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of intermediate results, but it often requires the full datasets and time-consuming fine tuning to recover the accuracy lost after quantization. This paper introduces the first practical 4-bit post training quantization approach: it does not involve training the quantized model (fine-tuning), nor it requires the availability of the full dataset. We target the quantization of both activations and weights and suggest three complementary methods for minimizing quantization error at the tensor level, two of whom obtain a closed-form analytical solution. Combining these methods, our approach achieves accuracy that is just a few percents less the state-of-the-art baseline across a wide range of convolutional models. The source code to replicate all experiments is available on GitHub: \url{https://github.com/submission2019/cnn-quantization}.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale
cs.LG 2026-05 unverdicted novelty 6.0

LoKA enables practical FP8 use in numerically sensitive large recommendation models via profiling, model adaptations, and runtime kernel orchestration.
LoKA: Low-precision Kernel Applications for Recommendation Models At Scale
cs.LG 2026-05 unverdicted novelty 6.0

LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.
Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI
cs.CV 2026-04 unverdicted novelty 4.0

Deployment-aligned low-precision NAS recovers about two-thirds of the accuracy drop from post-training quantization, achieving 0.826 mIoU on-device for a 95k-parameter model on Intel Movidius Myriad X without added co...