Weight Normalization based Quantization for Deep Neural Network Compression

Wen-Pu Cai; Wu-Jun Li

arXiv: 1907.00593 · v1 · pith:D36LPGMFnew · submitted 2019-07-01 · 💻 cs.LG · cs.CV · stat.ML


Pith reviewed 2026-05-25 11:42 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML

keywords: model compression · quantization · weight normalization · deep neural networks · CIFAR-100 · ImageNet


The pith

Weight normalization before quantization avoids long-tail weight distributions and lowers quantization error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces weight normalization based quantization (WNQ) as a way to compress deep neural network models. It claims that normalizing weights first prevents the long-tail distributions that inflate quantization error in existing methods. The authors argue that this yields more accurate compressed models suitable for mobile and embedded deployment, and they report experiments on CIFAR-100 and ImageNet showing gains over prior quantization baselines.

Core claim

WNQ adopts weight normalization to avoid the long-tail distribution of network weights and subsequently reduces the quantization error. Experiments on CIFAR-100 and ImageNet show that WNQ can outperform other baselines to achieve state-of-the-art performance.

What carries the argument

Weight normalization applied to network weights before quantization, intended to reshape their distribution and cut quantization error.
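The normalize-then-quantize step described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's formulation: it assumes per-layer standardization (subtract the mean, divide by the standard deviation), a hypothetical clip threshold `alpha`, and a plain uniform `bits`-bit quantizer. The helper names `uniform_quantize` and `normalize_then_quantize` are illustrative only.

```python
import numpy as np


def uniform_quantize(x, bits, lo, hi):
    """Snap x to 2**bits evenly spaced levels covering [lo, hi]; values outside are clipped first."""
    step = (hi - lo) / (2 ** bits - 1)
    return lo + np.round((np.clip(x, lo, hi) - lo) / step) * step


def normalize_then_quantize(w, bits=2, alpha=2.0, eps=1e-8):
    """Hypothetical normalize-then-quantize step for one layer (an assumption, not WNQ's exact rule).

    1. Standardize the weights to zero mean and unit standard deviation.
    2. Quantize the normalized weights uniformly on [-alpha, alpha].
    3. Undo the standardization so the result is on the original scale.
    """
    w = np.asarray(w, dtype=np.float64)
    mu, sigma = w.mean(), w.std()
    w_hat = (w - mu) / (sigma + eps)
    q_hat = uniform_quantize(w_hat, bits, -alpha, alpha)
    return q_hat * (sigma + eps) + mu


rng = np.random.default_rng(0)
w = 0.05 * rng.standard_normal((64, 3, 3, 3))  # synthetic conv-layer weights
q = normalize_then_quantize(w, bits=2)
print(np.unique(q).size)  # at most 2**2 = 4 distinct values
```

The clip threshold is what lets a few extreme weights stop dictating the quantization range; how the paper actually chooses its range and levels should be checked against its method section.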

If this is right

WNQ produces lower quantization error than standard quantization methods.
Compressed models retain higher accuracy on CIFAR-100 and ImageNet classification.
WNQ reaches state-of-the-art results among quantization-based compression techniques on the tested datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The normalization step could be tested as a preprocessing module inside other quantization or pruning pipelines.
Whether the same distribution shift helps quantization of non-vision models such as language or reinforcement-learning networks remains open.
If the benefit holds only for certain layer types, selective application per layer might improve results further.

Load-bearing premise

That weight normalization will avoid the long-tail distribution of network weights and thereby reduce quantization error.
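Why a long tail matters can be made concrete with the textbook high-resolution estimate for a uniform quantizer. The block below assumes a symmetric quantizer $Q$ with $2^b$ levels whose range $[-M, M]$ is set by the largest weight magnitude; it is a standard approximation used here for illustration, not a derivation taken from the paper, and at 2 bits the numbers are only order-of-magnitude figures.

```latex
% Symmetric uniform quantizer Q with 2^b levels spanning [-M, M]
\Delta = \frac{2M}{2^{b} - 1},
\qquad
\mathbb{E}\!\left[(w - Q(w))^{2}\right] \approx \frac{\Delta^{2}}{12}
  = \frac{M^{2}}{3\,(2^{b} - 1)^{2}}

% Example with b = 2 (so 2^b - 1 = 3) and weight standard deviation \sigma:
%   M = 6\sigma (range set by a tail outlier): \Delta = 4\sigma,   error \approx 4\sigma^2/3  \approx 1.33\sigma^2
%   M = 2\sigma (range after clipping):        \Delta = 4\sigma/3, error \approx 4\sigma^2/27 \approx 0.15\sigma^2
%   The clipped case additionally pays a clipping error on the mass beyond 2\sigma.
```

Because the error grows with $M^2$, a handful of outlying weights that stretch $M$ coarsens the grid for every other weight in the layer, which is the effect the premise relies on.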

What would settle it

Direct measurement of weight histograms after normalization that still show long tails, or a side-by-side quantization error comparison where WNQ does not reduce error relative to un-normalized quantization.
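The side-by-side comparison suggested above can be prototyped on synthetic data. This sketch makes several assumptions that are not from the paper: Student-t draws stand in for a long-tailed layer, the baseline is a min-max uniform quantizer, the "normalized" variant standardizes, clips at a hypothetical `alpha` standard deviations, quantizes, and rescales, and `relative_mse` uses the common definition ‖w − Q(w)‖² / ‖w‖² (the paper's own definition is in its Section 4.4 and may differ).

```python
import numpy as np


def uniform_quantize(x, bits, lo, hi):
    """Uniform quantizer with 2**bits levels on [lo, hi]; inputs are clipped to the range."""
    step = (hi - lo) / (2 ** bits - 1)
    return lo + np.round((np.clip(x, lo, hi) - lo) / step) * step


def relative_mse(w, q):
    """Assumed definition: squared quantization error relative to the squared weight norm."""
    return float(np.sum((w - q) ** 2) / np.sum(w ** 2))


def minmax_error(w, bits):
    """Baseline: levels span [min(w), max(w)], so the extreme weights set the step size."""
    return relative_mse(w, uniform_quantize(w, bits, w.min(), w.max()))


def normalized_error(w, bits, alpha=2.0, eps=1e-8):
    """Standardize, clip to [-alpha, alpha], quantize, then map back to the original scale."""
    mu, sigma = w.mean(), w.std()
    w_hat = (w - mu) / (sigma + eps)
    q = uniform_quantize(w_hat, bits, -alpha, alpha) * (sigma + eps) + mu
    return relative_mse(w, q)


rng = np.random.default_rng(0)
w = 0.05 * rng.standard_t(df=3, size=100_000)  # synthetic long-tailed weights, not real layer weights
for bits in (2, 3, 4):
    print(bits, round(minmax_error(w, bits), 3), round(normalized_error(w, bits), 3))
```

One design point worth noting: standardizing before a min-max quantizer and undoing it afterwards reproduces the plain min-max result exactly, because min-max quantization commutes with affine rescaling. In this sketch any error reduction therefore comes from the clipping step, which is why an ablation separating "normalize" from "clip" would be informative.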

Figures

Figures reproduced from arXiv: 1907.00593 by Wen-Pu Cai, Wu-Jun Li.

**Figure 3.** Distribution of float weights w on selected layers of ResNet20 on CIFAR-100 in the 2-bit setting. Top row: WNQ; bottom row: LQ-Net. Red dots on the x-axis are the average quantization levels in that layer. “mse” in each panel denotes the layer's relative mean-squared quantization error, defined in Section 4.4.
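Figure 3 compares layers by their weight histograms; the long-tail notion can also be summarized numerically. The sketch below uses three common indicators (excess kurtosis, the largest absolute z-score, and the fraction of weights beyond 3 standard deviations). These are illustrative choices made here, not measures reported in the paper, and the Gaussian and Student-t samples are synthetic stand-ins for real layers.

```python
import numpy as np


def tail_stats(w):
    """Summarize how long-tailed one layer's weights are (illustrative indicators)."""
    w = np.asarray(w, dtype=np.float64).ravel()
    z = (w - w.mean()) / w.std()                          # standardized weights
    return {
        "excess_kurtosis": float(np.mean(z ** 4) - 3.0),  # near 0 for Gaussian weights
        "max_abs_z": float(np.max(np.abs(z))),            # most extreme weight, in std units
        "frac_beyond_3std": float(np.mean(np.abs(z) > 3.0)),
    }


rng = np.random.default_rng(1)
gaussian = rng.standard_normal(100_000)
long_tailed = rng.standard_t(df=3, size=100_000)  # synthetic heavy-tailed stand-in
print(tail_stats(gaussian))
print(tail_stats(long_tailed))
```

These indicators are invariant to shifting and rescaling, so applying them to the same weights before and after a plain standardization would show no change. The informative comparison is between layers trained with and without the normalization step, as Figure 3 does visually for WNQ and LQ-Net.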

**Figure 5.** Top-1 accuracy of ResNet18 on ImageNet.

Original abstract

With the development of deep neural networks, the size of network models becomes larger and larger. Model compression has become an urgent need for deploying these network models to mobile or embedded devices. Model quantization is a representative model compression technique. Although a lot of quantization methods have been proposed, many of them suffer from a high quantization error caused by a long-tail distribution of network weights. In this paper, we propose a novel quantization method, called weight normalization based quantization (WNQ), for model compression. WNQ adopts weight normalization to avoid the long-tail distribution of network weights and subsequently reduces the quantization error. Experiments on CIFAR-100 and ImageNet show that WNQ can outperform other baselines to achieve state-of-the-art performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

WNQ claims normalization avoids long-tail weights to cut quantization error, but the paper gives no derivation or distribution evidence for that step.


The main thing here is that the paper states weight normalization will reshape the weight distribution away from long tails and thereby lower quantization error, yet it supplies no math, no histograms, and no direct error measurements to connect those dots. The experiments just report final accuracies on CIFAR-100 and ImageNet that beat some baselines. That is the central gap.

What the work actually contributes is a concrete procedure that first normalizes weights and then quantizes them, presented as a new combination for model compression. It tests the approach on two standard image datasets and claims state-of-the-art numbers, which at least gives practitioners something they can try to reproduce. The writing is direct about the practical goal of shrinking models for mobile hardware.

The soft spot is exactly the missing causal link. Without an ablation that isolates the distribution effect, or even a simple plot of weight histograms before and after normalization, the accuracy gains cannot be attributed to the stated mechanism rather than to other choices in the pipeline. The abstract also does not detail the baselines or report variance across runs, so the strength of the empirical claim is hard to judge from the given material.

This paper is for people who build or deploy compressed networks and are looking for quantization variants that are easy to code. A reader already working in that area might pick up the method and test it themselves. It is not a deep theoretical advance, but the topic is relevant enough that a serious editor should send it to referees who can check the implementation details and ask for the missing distribution analysis. I would not desk-reject it on the current evidence.

Referee Report

3 major / 0 minor

Summary. The manuscript proposes Weight Normalization based Quantization (WNQ) as a model compression technique. It asserts that weight normalization avoids long-tail weight distributions in DNNs and thereby reduces quantization error, and it reports experiments on CIFAR-100 and ImageNet in which WNQ outperforms baselines and reaches state-of-the-art results.

Significance. If the central mechanism were validated through distributional analysis and direct quantization-error measurements, the approach could supply a lightweight preprocessing step for existing quantizers. The reported accuracy numbers on standard benchmarks indicate possible practical value for embedded deployment, but the missing link between normalization and error reduction prevents the result from being assessed as a clear advance.

major comments (3)

[Abstract] Abstract: the claim that WNQ 'adopts weight normalization to avoid the long-tail distribution of network weights and subsequently reduces the quantization error' is stated without any derivation, cumulative-distribution analysis, or expected-error calculation showing how the normalization transform alters the weight statistics or lowers quantization error.
[Experiments] Experiments (CIFAR-100 and ImageNet results): accuracy improvements are reported, yet no before/after weight histograms, no measured reduction in quantization error (e.g., L2 or per-layer), and no ablation that isolates the distribution-normalization effect from other quantization choices are supplied, leaving the stated mechanism unsupported.
[Method] Method section: no equation or analysis demonstrates that the weight-normalization step changes the tail behavior in a manner that is guaranteed (or even expected) to reduce quantization error for the subsequent uniform or non-uniform quantizer.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the manuscript would be strengthened by explicit distributional analysis, error measurements, and ablations to support the claimed mechanism, and we will revise the paper to address these points.


Referee: [Abstract] Abstract: the claim that WNQ 'adopts weight normalization to avoid the long-tail distribution of network weights and subsequently reduces the quantization error' is stated without any derivation, cumulative-distribution analysis, or expected-error calculation showing how the normalization transform alters the weight statistics or lowers quantization error.

Authors: We acknowledge that the abstract states the mechanism without supporting derivation or analysis in the current manuscript. In revision we will move a concise derivation and reference to cumulative-distribution effects into the method section and update the abstract to point to it. revision: yes
Referee: [Experiments] Experiments (CIFAR-100 and ImageNet results): accuracy improvements are reported, yet no before/after weight histograms, no measured reduction in quantization error (e.g., L2 or per-layer), and no ablation that isolates the distribution-normalization effect from other quantization choices are supplied, leaving the stated mechanism unsupported.

Authors: We agree these supporting measurements and ablations are absent. The revised version will add before/after weight histograms, per-layer L2 quantization-error reductions, and an ablation isolating the normalization step. revision: yes
Referee: [Method] Method section: no equation or analysis demonstrates that the weight-normalization step changes the tail behavior in a manner that is guaranteed (or even expected) to reduce quantization error for the subsequent uniform or non-uniform quantizer.

Authors: The current method section describes the procedure but does not contain the requested analysis. We will add an analytical subsection with equations showing how the normalization transform reduces tail mass and its expected effect on uniform quantization error. revision: yes

Circularity Check

0 steps flagged

No circularity: technique proposed by design choice with no self-referential reduction in any derivation chain.


The paper introduces WNQ as a method that adopts weight normalization to avoid long-tail weight distributions. This is stated as an assertion in the abstract and central claim without any equations, fitted parameters renamed as predictions, or self-citations that would make the outcome equivalent to the input by construction. No load-bearing step reduces to a tautology (e.g., no Eq. X defined in terms of the claimed effect of X). Experiments report accuracy but do not involve the circular patterns of fitted-input-called-prediction or ansatz-smuggled-in-via-citation. The derivation chain, such as it exists, is self-contained as a proposal rather than a closed loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described or can be extracted.

pith-pipeline@v0.9.0 · 5647 in / 1065 out tokens · 70138 ms · 2026-05-25T11:42:06.509156+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: WNQ adopts weight normalization to avoid the long-tail distribution of network weights and subsequently reduces the quantization error.

IndisputableMonolith/Foundation/RealityFromDistinction · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: The gradient of the element with maximum absolute value ... will make the element ... move towards zero ... avoid (eliminate) the long-tail

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
