Compressing Deep Convolutional Networks using Vector Quantization

Yunchao Gong, Liu Liu, Ming Yang, Lubomir Bourdev


classification: cs.CV, cs.LG, cs.NE

keywords: deep, quantization, classification, compressing, methods, model, storage, vector

abstract

Deep convolutional neural networks (CNNs) have become the most promising method for object recognition, repeatedly demonstrating record-breaking results for image classification and object detection in recent years. However, a very deep CNN generally involves many layers with millions of parameters, which makes the stored network model extremely large. This prohibits the use of deep CNNs on resource-limited hardware, especially cell phones and other embedded devices. In this paper, we tackle this model storage issue by investigating information-theoretic vector quantization methods for compressing the parameters of CNNs. In particular, we have found that, for compressing the most storage-demanding densely connected layers, vector quantization methods have a clear gain over existing matrix factorization methods. Simply applying k-means clustering to the weights or conducting product quantization can lead to a very good balance between model size and recognition accuracy. For the 1000-category classification task in the ImageNet challenge, we are able to achieve 16-24 times compression of the network with only 1% loss of classification accuracy using the state-of-the-art CNN.
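The two quantizers named in the abstract, k-means on individual weights and product quantization, can be sketched in a few lines. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the helper names (`kmeans`, `kmeans_scalar_quantize`, `product_quantize`, `storage_ratio`) are hypothetical, product quantization here splits the weight matrix column-wise into equal segments and clusters the row sub-vectors of each segment, and storage is counted as float32 weights versus ⌈log2 k⌉-bit indices plus float32 codebooks. The 4096×4096 layer in the demo is an illustrative size, not a figure taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 25,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Plain Lloyd's k-means; returns (codebook, code index per point)."""
    rng = random.Random(seed)
    # Initialise the codebook with k distinct data points.
    centroids = [points[i] for i in rng.sample(range(len(points)), k)]
    codes = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point takes the index of its nearest centroid.
        codes = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                 for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, code in zip(points, codes) if code == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, codes


def kmeans_scalar_quantize(weights: List[List[float]], k: int) -> List[List[float]]:
    """Cluster every scalar weight of a dense layer into k shared values."""
    n = len(weights[0])
    flat = [(w,) for row in weights for w in row]
    codebook, codes = kmeans(flat, k)
    return [[codebook[codes[i * n + j]][0] for j in range(n)]
            for i in range(len(weights))]


def product_quantize(weights: List[List[float]], num_segments: int,
                     k: int) -> List[List[float]]:
    """Assumed PQ layout: column segments, k-means on each segment's rows."""
    rows, cols = len(weights), len(weights[0])
    assert cols % num_segments == 0, "columns must split evenly"
    d = cols // num_segments
    recon = [[0.0] * cols for _ in range(rows)]
    for s in range(num_segments):
        sub = [tuple(row[s * d:(s + 1) * d]) for row in weights]
        codebook, codes = kmeans(sub, k, seed=s)
        # Replace every sub-vector with its nearest codeword.
        for i, code in enumerate(codes):
            recon[i][s * d:(s + 1) * d] = codebook[code]
    return recon


def storage_ratio(num_weights: int, num_codes: int, k: int,
                  codebook_floats: int, float_bits: int = 32) -> float:
    """Float storage divided by (index bits + float codebook bits)."""
    index_bits = num_codes * math.ceil(math.log2(k))
    return num_weights * float_bits / (index_bits + codebook_floats * float_bits)


if __name__ == "__main__":
    rows = cols = 4096  # illustrative fully connected layer size
    # Scalar k-means, k = 16: one 4-bit index per weight, 16 shared floats.
    print(round(storage_ratio(rows * cols, rows * cols, 16, 16), 2))  # 8.0
    # PQ with 8-dim sub-vectors, k = 256: 512 segments, each with 256 codewords.
    print(round(storage_ratio(rows * cols, rows * (cols // 8), 256,
                              256 * cols), 2))  # 10.67
```

With these illustrative settings, scalar k-means with k = 16 replaces each 32-bit weight by a 4-bit index (about 8× smaller), while product quantization with 8-dimensional sub-vectors and k = 256 spends 8 bits per sub-vector and reaches about 10.7× even after paying for its larger codebooks. These ratios come only from the storage formula above; accuracy trade-offs depend on the network and are what the paper measures.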



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
cs.CV 2015-10 conditional novelty 7.0

A pruning-quantization-Huffman pipeline compresses deep neural networks 35-49x without accuracy loss.
DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression
cs.LG 2026-05 unverdicted novelty 6.0

DiBA factors weight matrices into diagonal-binary-diagonal-binary-diagonal form to cut matrix-vector multiplies from mn to m+k+n operations and improves accuracy on DistilBERT and audio transformer tasks after replacement.
Energy-Efficient Plant Monitoring via Knowledge Distillation
cs.CV 2026-04 unverdicted novelty 4.0

Knowledge distillation allows smaller neural networks to match the accuracy of much larger models on plant species and disease recognition while using substantially less computation.