New pointwise convolution in Deep Neural Networks through Extremely Fast and Non Parametric Transforms

Joonhyun Jeong; Sung-Ho Bae

arxiv: 1906.12172 · v1 · pith:FRAX6U6Gnew · submitted 2019-06-25 · 💻 cs.CV · cs.LG

New pointwise convolution in Deep Neural Networks through Extremely Fast and Non Parametric Transforms

Joonhyun Jeong , Sung-Ho Bae This is my paper

Pith reviewed 2026-05-25 16:52 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords pointwise convolutionDiscrete Walsh-Hadamard TransformDWHTneural network efficiencyparameter reductionCIFAR-100MobileNet-V1non-parametric transforms

0 comments

The pith

Fixed transforms like DWHT replace pointwise convolutions in neural networks, cutting parameters by 79% with an accuracy gain on CIFAR-100.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard transforms such as the Discrete Walsh-Hadamard Transform and Discrete Cosine Transform can serve as non-learnable replacements for pointwise convolution layers in deep neural networks. These transforms capture cross-channel correlations using fixed operations, primarily additions and subtractions for DWHT, which removes the need for trainable weights and floating-point multiplications. As a result, the networks require far fewer parameters and floating-point operations yet achieve comparable or better accuracy on image classification. The fast implementation of DWHT further reduces addition complexity from quadratic to logarithmic scaling. This produces highly efficient models, as shown by gains over the MobileNet-V1 baseline on CIFAR-100.

Core claim

The authors propose using the Discrete Walsh-Hadamard Transform as a parameter-free pointwise convolution in DNNs. This leverages the transform's ability to capture cross-channel correlations without learnable parameters, requiring only additions and subtractions with a fast algorithm that reduces complexity to O(n log n). When applied within MobileNet-V1 on CIFAR-100, the resulting model achieves 1.49% higher accuracy with 79.1% fewer parameters and 48.4% fewer FLOPs.

What carries the argument

Discrete Walsh-Hadamard Transform (DWHT) applied as a fixed, parameter-free pointwise convolution operator that mixes channels through additions and subtractions only.

If this is right

Pointwise convolution layers can be built with zero learnable parameters.
Floating-point multiplications disappear from those layers, leaving only additions and subtractions.
Fast algorithms lower the cost of those additions from O(n squared) to O(n log n).
The same substitution works with DCT and yields similar efficiency gains.
Accuracy on CIFAR-100 rises by 1.49 percent relative to the unmodified baseline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fixed-transform replacement could be inserted into other lightweight architectures besides MobileNet-V1.
Fewer free parameters may reduce overfitting risk on smaller training sets.
Combining the method with quantization or pruning would likely produce further compute savings.
Testing on ImageNet would reveal whether the correlation-capture property scales to higher-resolution data.

Load-bearing premise

Fixed transforms such as DWHT capture the cross-channel correlations needed for the task as effectively as learned pointwise convolutions, without any accuracy loss.

What would settle it

Training the DWHT-replaced model on CIFAR-100 and observing accuracy below the MobileNet-V1 baseline would disprove the claim that the replacement maintains or improves performance.

Figures

Figures reproduced from arXiv: 1906.12172 by Joonhyun Jeong, Sung-Ho Bae.

**Figure 2.** Figure 2: Our blocks using conventional transform pointwise convolution (CTPC), random constant [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Performance curve of hierarchically applying our optimal block on CIFAR100, Top: in the [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Histograms of hierarchy level (low-level, middle-level, high-level) activations after the [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Histogram of 3 × 3 depthwise convolution weights in the third block, out of last 3 blocks. DCT-3-H and DWHT-3-H models are based on ShuffleNet V2 1.1x model with (d) block. Baseline model is ShuffleNet V2 1.1x model [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: Ablation study of weight decay values (5e-4, 2e-3, 1e-2, 1e-1). We applied these weight [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: Performance curve of hierarchically applying our optimal block (See Table 2 for detail [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: Histograms of 3 × 3 depthwise convolution weights, Top: histogram of first block out of last 3 blocks, Bottom: histogram of second block out of last 3 blocks. DWHT-3-H and DCT-3-H models are based on ShuffleNet-V2 1.1x model with (d)-DWHT and (d)-DCT block, respectively. Baseline model is ShuffleNet-V2 1.1x model. Further, 3-M-Rear models gave slightly superior efficiency while 7-M, 3-M-Front, and low-leve… view at source ↗

read the original abstract

Some conventional transforms such as Discrete Walsh-Hadamard Transform (DWHT) and Discrete Cosine Transform (DCT) have been widely used as feature extractors in image processing but rarely applied in neural networks. However, we found that these conventional transforms have the ability to capture the cross-channel correlations without any learnable parameters in DNNs. This paper firstly proposes to apply conventional transforms to pointwise convolution, showing that such transforms significantly reduce the computational complexity of neural networks without accuracy performance degradation. Especially for DWHT, it requires no floating point multiplications but only additions and subtractions, which can considerably reduce computation overheads. In addition, its fast algorithm further reduces complexity of floating point addition from $\mathcal{O}(n^2)$ to $\mathcal{O}(n\log n)$. These nice properties construct extremely efficient networks in the number parameters and operations, enjoying accuracy gain. Our proposed DWHT-based model gained 1.49\% accuracy increase with 79.1\% reduced parameters and 48.4\% reduced FLOPs compared with its baseline model (MoblieNet-V1) on the CIFAR 100 dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper swaps pointwise convolutions for fixed DWHT in a MobileNet and reports a modest accuracy gain plus large efficiency cuts on CIFAR-100, but the single comparison leaves the result hard to trust without more checks.

read the letter

The core claim is straightforward: replace the learnable 1x1 pointwise layers in MobileNet-V1 with a fixed Discrete Walsh-Hadamard Transform. The authors say this keeps or improves accuracy while slashing parameters by 79% and FLOPs by 48% on CIFAR-100, and they note the transform needs only additions and subtractions with an O(n log n) fast version. They also float DCT as another option. This is new as a direct substitution into the channel-mixing step, even though the transforms are classical tools from signal processing. The experiment tests the assumption that these fixed matrices can capture cross-channel correlations without any trained weights, and the reported numbers say it works with a small accuracy upside. That is the useful part if it holds: a concrete, zero-parameter primitive that could matter for edge hardware. The evidence is thin. The abstract gives one head-to-head number with no error bars, no ablation on placement or training protocol, and no mention of whether the baseline received equal tuning effort. A single run on one dataset does not yet show the gain is robust. The paper is aimed at people building lightweight CNNs for constrained devices. A reader who needs a drop-in way to cut 1x1 conv cost might extract value once the implementation details are verified. I would send it to peer review. The idea is simple enough that referees can test the substitution themselves and see whether the efficiency numbers survive.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes replacing pointwise convolutions in DNNs (specifically MobileNet-V1) with fixed, parameter-free transforms such as the Discrete Walsh-Hadamard Transform (DWHT) and Discrete Cosine Transform (DCT). These transforms are claimed to capture cross-channel correlations, yielding networks with substantially lower parameter counts and FLOPs. On CIFAR-100 the DWHT variant is reported to improve accuracy by 1.49% while reducing parameters by 79.1% and FLOPs by 48.4% relative to the baseline.

Significance. If the empirical result is robust, the work offers a concrete, multiplication-free alternative to learned pointwise convolutions that also exploits the fast O(n log n) DWHT algorithm. This could be useful for resource-constrained settings. The manuscript does not supply machine-checked proofs or reproducible code, so the primary strength is the reported head-to-head measurement rather than a parameter-free derivation.

major comments (2)

[Experimental Results] Experimental section: the 1.49% accuracy gain, 79.1% parameter reduction, and 48.4% FLOP reduction on CIFAR-100 are presented from a single run with no error bars, no multiple random seeds, and no ablation on insertion point or baseline re-tuning. This single comparison is load-bearing for the central claim yet lacks the controls needed to establish robustness.
[Methods] Methods / architecture description: the paper does not specify how the fixed DWHT matrix is applied to feature maps of arbitrary channel count, whether any normalization or reshaping is required, or how the transform interacts with the existing depthwise layers. These details are necessary to reproduce the stated parameter and FLOP counts.

minor comments (2)

[Abstract] Abstract: 'MoblieNet-V1' is a typographical error.
[Abstract] The abstract states the transforms operate 'without accuracy performance degradation,' yet the reported result is a 1.49% improvement; the wording should be aligned with the actual finding.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and will make the indicated revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Experimental Results] Experimental section: the 1.49% accuracy gain, 79.1% parameter reduction, and 48.4% FLOP reduction on CIFAR-100 are presented from a single run with no error bars, no multiple random seeds, and no ablation on insertion point or baseline re-tuning. This single comparison is load-bearing for the central claim yet lacks the controls needed to establish robustness.

Authors: We agree that reporting results from multiple runs with error bars and ablations would better establish robustness. In the revised manuscript we will add experiments averaged over several random seeds (with standard deviations), plus ablations on transform insertion points and any baseline re-tuning required. revision: yes
Referee: [Methods] Methods / architecture description: the paper does not specify how the fixed DWHT matrix is applied to feature maps of arbitrary channel count, whether any normalization or reshaping is required, or how the transform interacts with the existing depthwise layers. These details are necessary to reproduce the stated parameter and FLOP counts.

Authors: We will expand the methods section with a precise description of how the fixed DWHT (and DCT) matrices are applied to feature maps of any channel count. This will include the exact reshaping/padding procedure, any normalization steps, and the integration point relative to the depthwise layers, allowing exact reproduction of the reported parameter and FLOP counts. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper advances an empirical proposal to replace pointwise convolutions with fixed, parameter-free transforms such as DWHT, then directly measures the resulting accuracy, parameter count, and FLOP reductions against MobileNet-V1 on CIFAR-100. No derivation chain, equation, or uniqueness claim is shown to reduce by construction to a fitted input, self-citation, or renamed ansatz; the central performance numbers are external experimental outcomes rather than algebraic identities.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim depends on the domain assumption that DWHT and DCT already encode the necessary cross-channel statistics; no new free parameters or invented entities are introduced because the transforms are non-parametric.

axioms (1)

domain assumption Conventional transforms such as DWHT and DCT can capture cross-channel correlations in feature maps without learnable parameters
Invoked in the abstract as the justification for replacing pointwise convolution.

pith-pipeline@v0.9.0 · 5731 in / 1181 out tokens · 28317 ms · 2026-05-25T16:52:16.075642+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 15 internal anchors

[2]

An Analysis of Deep Neural Network Models for Practical Applications

URL http://arxiv.org/ 9 Figure 5: Histogram of 3 × 3 depthwise convolution weights in the third block, out of last 3 blocks. DCT-3-H and DWHT-3-H models are based on ShufﬂeNet V2 1.1x model with (d) block. Baseline model is ShufﬂeNet V2 1.1x model. Figure 6: Ablation study of weight decay values (5e-4, 2e-3, 1e-2, 1e-1). We applied these weight decay valu...

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

URL http://arxiv. org/abs/1602.02830. Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Binaryconnect: Training deep neural networks with binary weights during propagations. CoRR, abs/1511.00363,

work page internal anchor Pith review Pith/arXiv arXiv
[4]

BinaryConnect: Training Deep Neural Networks with binary weights during propagations

URL http: //arxiv.org/abs/1511.00363. Saeed Dabbaghchian, Masoumeh P Ghaemmaghami, and Ali Aghagolzadeh. Feature extraction using discrete cosine transform and discrimination power analysis with a face recognition tech- nology. Pattern Recognition, 43(4):1431–1440,

work page internal anchor Pith review Pith/arXiv arXiv
[5]

Imagenet: A large-scale hi- erarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hi- erarchical image database. In 2009 IEEE conference on computer vision and pattern recognition , pp. 248–255. Ieee,

work page 2009
[6]

10 Arthita Ghosh and Rama Chellappa

doi: 10.1109/TIP.2014.2362652. 10 Arthita Ghosh and Rama Chellappa. Deep feature extraction in the dct domain. In 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 3536–3541. IEEE,

work page doi:10.1109/tip.2014.2362652 2014
[7]

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015a. Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efﬁcient neural network. In Advances in neural information processing system...

work page internal anchor Pith review Pith/arXiv arXiv
[8]

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pp. 770–778, June

work page 2016
[9]

doi: 10.1109/CVPR.2016.90. M. Horowitz. 1.1 computing’s energy problem (and what we can do about it). In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC) , pp. 10–14, Feb

work page doi:10.1109/cvpr.2016.90 2016
[10]

Andrew G

doi: 10.1109/ISSCC.2014.6757323. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efﬁcient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861,

work page doi:10.1109/isscc.2014.6757323 2014
[11]

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

URL http://arxiv.org/abs/ 1704.04861. Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. CoRR, abs/1609.07061,

work page internal anchor Pith review Pith/arXiv arXiv
[12]

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

URL http://arxiv.org/abs/1609.07061. Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167,

work page internal anchor Pith review Pith/arXiv arXiv
[13]

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

URL http://arxiv.org/ abs/1502.03167. Yunho Jeon and Junmo Kim. Active convolution: Learning the shape of convolution for image clas- siﬁcation. CoRR, abs/1703.09076,

work page internal anchor Pith review Pith/arXiv arXiv
[14]

Active Convolution: Learning the Shape of Convolution for Image Classification

URL http://arxiv.org/abs/1703.09076. Yunho Jeon and Junmo Kim. Constructing fast network through deconstruction of convolution. CoRR, abs/1806.07370,

work page internal anchor Pith review Pith/arXiv arXiv
[15]

Constructing Fast Network through Deconstruction of Convolution

URL http://arxiv.org/abs/1806.07370. Felix Juefei-Xu, Vishnu Naresh Boddeti, and Marios Savvides. Local binary convolutional neural networks. CoRR, abs/1608.06049,

work page internal anchor Pith review Pith/arXiv arXiv
[16]

Local Binary Convolutional Neural Networks

URL http://arxiv.org/abs/1608.06049. Chi-Wah Kok. Fast algorithm for computing discrete cosine transform.IEEE Transactions on Signal Processing, 45(3):757–760,

work page internal anchor Pith review Pith/arXiv arXiv
[18]

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

URL http://arxiv. org/abs/1807.11164. Vinod Nair and Geoffrey E Hinton. Rectiﬁed linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) , pp. 807–814,

work page internal anchor Pith review Pith/arXiv arXiv
[19]

doi: 10.1109/PROC.1969.6869

ISSN 0018-9219. doi: 10.1109/PROC.1969.6869. K Ramamohan Rao and Ping Yip. Discrete cosine transform: algorithms, advantages, applications . Academic press,

work page doi:10.1109/proc.1969.6869 1969
[20]

Regularized Evolution for Image Classifier Architecture Search

Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le. Regularized evolution for image classiﬁer architecture search. arXiv preprint arXiv:1802.01548,

work page internal anchor Pith review Pith/arXiv arXiv
[22]

MobileNetV2: Inverted Residuals and Linear Bottlenecks

URL http://arxiv.org/abs/1801.04381. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556,

work page internal anchor Pith review Pith/arXiv arXiv
[23]

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke. Inception-v4, inception-resnet and the impact of residual connections on learning. CoRR, abs/1602.07261, 2016a. URL http:// arxiv.org/abs/1602.07261. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethink- ing the inception architecture for computer vision. In Pro...

work page internal anchor Pith review Pith/arXiv arXiv
[25]

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

URL http: //arxiv.org/abs/1707.01083. Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8697–8710,

work page internal anchor Pith review Pith/arXiv arXiv

[1] [2]

An Analysis of Deep Neural Network Models for Practical Applications

URL http://arxiv.org/ 9 Figure 5: Histogram of 3 × 3 depthwise convolution weights in the third block, out of last 3 blocks. DCT-3-H and DWHT-3-H models are based on ShufﬂeNet V2 1.1x model with (d) block. Baseline model is ShufﬂeNet V2 1.1x model. Figure 6: Ablation study of weight decay values (5e-4, 2e-3, 1e-2, 1e-1). We applied these weight decay valu...

work page internal anchor Pith review Pith/arXiv arXiv

[2] [3]

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

URL http://arxiv. org/abs/1602.02830. Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Binaryconnect: Training deep neural networks with binary weights during propagations. CoRR, abs/1511.00363,

work page internal anchor Pith review Pith/arXiv arXiv

[3] [4]

BinaryConnect: Training Deep Neural Networks with binary weights during propagations

URL http: //arxiv.org/abs/1511.00363. Saeed Dabbaghchian, Masoumeh P Ghaemmaghami, and Ali Aghagolzadeh. Feature extraction using discrete cosine transform and discrimination power analysis with a face recognition tech- nology. Pattern Recognition, 43(4):1431–1440,

work page internal anchor Pith review Pith/arXiv arXiv

[4] [5]

Imagenet: A large-scale hi- erarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hi- erarchical image database. In 2009 IEEE conference on computer vision and pattern recognition , pp. 248–255. Ieee,

work page 2009

[5] [6]

10 Arthita Ghosh and Rama Chellappa

doi: 10.1109/TIP.2014.2362652. 10 Arthita Ghosh and Rama Chellappa. Deep feature extraction in the dct domain. In 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 3536–3541. IEEE,

work page doi:10.1109/tip.2014.2362652 2014

[6] [7]

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015a. Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efﬁcient neural network. In Advances in neural information processing system...

work page internal anchor Pith review Pith/arXiv arXiv

[7] [8]

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pp. 770–778, June

work page 2016

[8] [9]

doi: 10.1109/CVPR.2016.90. M. Horowitz. 1.1 computing’s energy problem (and what we can do about it). In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC) , pp. 10–14, Feb

work page doi:10.1109/cvpr.2016.90 2016

[9] [10]

Andrew G

doi: 10.1109/ISSCC.2014.6757323. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efﬁcient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861,

work page doi:10.1109/isscc.2014.6757323 2014

[10] [11]

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

URL http://arxiv.org/abs/ 1704.04861. Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. CoRR, abs/1609.07061,

work page internal anchor Pith review Pith/arXiv arXiv

[11] [12]

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

URL http://arxiv.org/abs/1609.07061. Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167,

work page internal anchor Pith review Pith/arXiv arXiv

[12] [13]

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

URL http://arxiv.org/ abs/1502.03167. Yunho Jeon and Junmo Kim. Active convolution: Learning the shape of convolution for image clas- siﬁcation. CoRR, abs/1703.09076,

work page internal anchor Pith review Pith/arXiv arXiv

[13] [14]

Active Convolution: Learning the Shape of Convolution for Image Classification

URL http://arxiv.org/abs/1703.09076. Yunho Jeon and Junmo Kim. Constructing fast network through deconstruction of convolution. CoRR, abs/1806.07370,

work page internal anchor Pith review Pith/arXiv arXiv

[14] [15]

Constructing Fast Network through Deconstruction of Convolution

URL http://arxiv.org/abs/1806.07370. Felix Juefei-Xu, Vishnu Naresh Boddeti, and Marios Savvides. Local binary convolutional neural networks. CoRR, abs/1608.06049,

work page internal anchor Pith review Pith/arXiv arXiv

[15] [16]

Local Binary Convolutional Neural Networks

URL http://arxiv.org/abs/1608.06049. Chi-Wah Kok. Fast algorithm for computing discrete cosine transform.IEEE Transactions on Signal Processing, 45(3):757–760,

work page internal anchor Pith review Pith/arXiv arXiv

[16] [18]

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

URL http://arxiv. org/abs/1807.11164. Vinod Nair and Geoffrey E Hinton. Rectiﬁed linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) , pp. 807–814,

work page internal anchor Pith review Pith/arXiv arXiv

[17] [19]

doi: 10.1109/PROC.1969.6869

ISSN 0018-9219. doi: 10.1109/PROC.1969.6869. K Ramamohan Rao and Ping Yip. Discrete cosine transform: algorithms, advantages, applications . Academic press,

work page doi:10.1109/proc.1969.6869 1969

[18] [20]

Regularized Evolution for Image Classifier Architecture Search

Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le. Regularized evolution for image classiﬁer architecture search. arXiv preprint arXiv:1802.01548,

work page internal anchor Pith review Pith/arXiv arXiv

[19] [22]

MobileNetV2: Inverted Residuals and Linear Bottlenecks

URL http://arxiv.org/abs/1801.04381. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556,

work page internal anchor Pith review Pith/arXiv arXiv

[20] [23]

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke. Inception-v4, inception-resnet and the impact of residual connections on learning. CoRR, abs/1602.07261, 2016a. URL http:// arxiv.org/abs/1602.07261. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethink- ing the inception architecture for computer vision. In Pro...

work page internal anchor Pith review Pith/arXiv arXiv

[21] [25]

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

URL http: //arxiv.org/abs/1707.01083. Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8697–8710,

work page internal anchor Pith review Pith/arXiv arXiv