New pointwise convolution in Deep Neural Networks through Extremely Fast and Non Parametric Transforms
Pith reviewed 2026-05-25 16:52 UTC · model grok-4.3
The pith
Fixed transforms like DWHT replace pointwise convolutions in neural networks, cutting parameters by 79.1% and FLOPs by 48.4% while gaining 1.49% accuracy with MobileNet-V1 on CIFAR-100.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors propose using the Discrete Walsh-Hadamard Transform (DWHT) as a parameter-free pointwise convolution in DNNs. The transform captures cross-channel correlations without learnable parameters and requires only additions and subtractions; its fast algorithm reduces the cost of those additions from O(n^2) to O(n log n). Applied within MobileNet-V1 on CIFAR-100, the resulting model is reported to achieve 1.49% higher accuracy with 79.1% fewer parameters and 48.4% fewer FLOPs than the baseline.
What carries the argument
Discrete Walsh-Hadamard Transform (DWHT) applied as a fixed, parameter-free pointwise convolution operator that mixes channels through additions and subtractions only.
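As a minimal sketch of the operator described above, assuming a Sylvester-ordered (natural-order) Hadamard matrix and an unnormalized transform, the butterfly below mixes a vector of channel values using only additions and subtractions. The names `hadamard` and `fwht` are illustrative and not taken from the paper, whose exact ordering and scaling are not stated on this page.

```python
def hadamard(n):
    """Build the n x n Sylvester Hadamard matrix (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester step: H_2m = [[H_m, H_m], [H_m, -H_m]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def fwht(x):
    """Fast Walsh-Hadamard transform using only additions and subtractions."""
    y = list(x)
    n = len(y)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        # One butterfly stage: combine entries that are `half` apart.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = y[i], y[i + half]
                y[i], y[i + half] = a + b, a - b
        half *= 2
    return y


# Values of 4 channels at a single pixel (illustrative input).
channels = [1.0, 2.0, 3.0, 4.0]
# Direct matrix-vector product with the fixed +1/-1 matrix, for comparison.
dense = [sum(h * v for h, v in zip(row, channels)) for row in hadamard(4)]
fast = fwht(channels)
print(dense, fast)
```

Both paths give the same mixed channels; the butterfly simply reaches them in log2(n) stages instead of a full matrix product. No weights are stored or learned in either path.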
If this is right
- Pointwise convolution layers can be built with zero learnable parameters.
- Floating-point multiplications disappear from those layers, leaving only additions and subtractions.
- Fast algorithms lower the cost of those additions from O(n^2) to O(n log n).
- The same substitution works with DCT and yields similar efficiency gains.
- Accuracy on CIFAR-100 rises by 1.49 percent relative to the unmodified baseline.
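To make the cost bullets concrete, the sketch below counts operations for mixing n channels at one spatial position. It assumes n is a power of two, counts a direct product with an n x n matrix of +1/-1 entries as n(n-1) additions, and counts the butterfly as n log2(n) additions. The helper names are illustrative, and these are textbook counts rather than the paper's own FLOP accounting.

```python
def learned_pointwise_cost(c_in, c_out):
    """Weights and per-pixel multiply-adds of a learned 1x1 convolution (no bias)."""
    params = c_in * c_out
    multiply_adds = c_in * c_out
    return params, multiply_adds


def dense_hadamard_additions(n):
    """Additions/subtractions for a direct n x n (+1/-1) matrix-vector product."""
    return n * (n - 1)


def fast_hadamard_additions(n):
    """Additions/subtractions for the butterfly: log2(n) stages of n operations."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = n.bit_length() - 1  # log2(n) when n is a power of two
    return n * stages


for n in (64, 256, 1024):
    params, macs = learned_pointwise_cost(n, n)
    print(n, params, macs, dense_hadamard_additions(n), fast_hadamard_additions(n))
```

For 256 channels, the learned layer holds 65,536 weights, while the butterfly needs 2,048 additions and no weights. These per-layer counts illustrate the scaling only; they are not meant to reproduce the network-wide 79.1% and 48.4% figures, which depend on the full architecture.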
Where Pith is reading between the lines
- The same fixed-transform replacement could be inserted into other lightweight architectures besides MobileNet-V1.
- Fewer free parameters may reduce overfitting risk on smaller training sets.
- Combining the method with quantization or pruning would likely produce further compute savings.
- Testing on ImageNet would reveal whether the correlation-capture property scales to higher-resolution data.
Load-bearing premise
Fixed transforms such as DWHT capture the cross-channel correlations needed for the task as effectively as learned pointwise convolutions, without any accuracy loss.
What would settle it
Training the DWHT-replaced model on CIFAR-100 and observing accuracy below the MobileNet-V1 baseline would disprove the claim that the replacement maintains or improves performance.
Original abstract
Some conventional transforms such as Discrete Walsh-Hadamard Transform (DWHT) and Discrete Cosine Transform (DCT) have been widely used as feature extractors in image processing but rarely applied in neural networks. However, we found that these conventional transforms have the ability to capture the cross-channel correlations without any learnable parameters in DNNs. This paper firstly proposes to apply conventional transforms to pointwise convolution, showing that such transforms significantly reduce the computational complexity of neural networks without accuracy performance degradation. Especially for DWHT, it requires no floating point multiplications but only additions and subtractions, which can considerably reduce computation overheads. In addition, its fast algorithm further reduces complexity of floating point addition from $\mathcal{O}(n^2)$ to $\mathcal{O}(n\log n)$. These nice properties construct extremely efficient networks in the number parameters and operations, enjoying accuracy gain. Our proposed DWHT-based model gained 1.49\% accuracy increase with 79.1\% reduced parameters and 48.4\% reduced FLOPs compared with its baseline model (MoblieNet-V1) on the CIFAR 100 dataset.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes replacing pointwise convolutions in DNNs (specifically MobileNet-V1) with fixed, parameter-free transforms such as the Discrete Walsh-Hadamard Transform (DWHT) and Discrete Cosine Transform (DCT). These transforms are claimed to capture cross-channel correlations, yielding networks with substantially lower parameter counts and FLOPs. On CIFAR-100 the DWHT variant is reported to improve accuracy by 1.49% while reducing parameters by 79.1% and FLOPs by 48.4% relative to the baseline.
Significance. If the empirical result is robust, the work offers a concrete, multiplication-free alternative to learned pointwise convolutions that also exploits the fast O(n log n) DWHT algorithm. This could be useful for resource-constrained settings. The manuscript does not supply machine-checked proofs or reproducible code, so the primary strength is the reported head-to-head measurement rather than a parameter-free derivation.
major comments (2)
- [Experimental Results] Experimental section: the 1.49% accuracy gain, 79.1% parameter reduction, and 48.4% FLOP reduction on CIFAR-100 are presented from a single run with no error bars, no multiple random seeds, and no ablation on insertion point or baseline re-tuning. This single comparison is load-bearing for the central claim yet lacks the controls needed to establish robustness.
- [Methods] Methods / architecture description: the paper does not specify how the fixed DWHT matrix is applied to feature maps of arbitrary channel count, whether any normalization or reshaping is required, or how the transform interacts with the existing depthwise layers. These details are necessary to reproduce the stated parameter and FLOP counts.
minor comments (2)
- [Abstract] Abstract: 'MoblieNet-V1' is a typographical error.
- [Abstract] The abstract states the transforms operate 'without accuracy performance degradation,' yet the reported result is a 1.49% improvement; the wording should be aligned with the actual finding.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below and will make the indicated revisions to strengthen the manuscript.
- Referee: [Experimental Results] Experimental section: the 1.49% accuracy gain, 79.1% parameter reduction, and 48.4% FLOP reduction on CIFAR-100 are presented from a single run with no error bars, no multiple random seeds, and no ablation on insertion point or baseline re-tuning. This single comparison is load-bearing for the central claim yet lacks the controls needed to establish robustness.
  Authors: We agree that reporting results from multiple runs with error bars and ablations would better establish robustness. In the revised manuscript we will add experiments averaged over several random seeds (with standard deviations), plus ablations on transform insertion points and any baseline re-tuning required.
  Revision: yes
- Referee: [Methods] Methods / architecture description: the paper does not specify how the fixed DWHT matrix is applied to feature maps of arbitrary channel count, whether any normalization or reshaping is required, or how the transform interacts with the existing depthwise layers. These details are necessary to reproduce the stated parameter and FLOP counts.
  Authors: We will expand the methods section with a precise description of how the fixed DWHT (and DCT) matrices are applied to feature maps of any channel count. This will include the exact reshaping/padding procedure, any normalization steps, and the integration point relative to the depthwise layers, allowing exact reproduction of the reported parameter and FLOP counts.
  Revision: yes
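The reshaping/padding procedure at issue in this exchange is not given on this page. Purely as an illustration of one possibility, and not the paper's method, the sketch below zero-pads the channel vector at each pixel to the next power of two, applies an unnormalized fast Walsh-Hadamard transform, and keeps the first `c_out` outputs. The names `next_pow2` and `dwht_pointwise`, the zero-padding, and the truncation rule are all assumptions.

```python
def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of two)."""
    y = list(x)
    n = len(y)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = y[i], y[i + half]
                y[i], y[i + half] = a + b, a - b
        half *= 2
    return y


def dwht_pointwise(feature_map, c_out):
    """Mix channels at every pixel with a fixed transform (hypothetical scheme).

    feature_map is a list of rows; each row is a list of pixels; each pixel
    is a list of C_in channel values. Assumed here, not taken from the paper:
    zero-pad to a power of two, then keep the first c_out outputs.
    """
    out = []
    for row in feature_map:
        out_row = []
        for pixel in row:
            n = next_pow2(max(len(pixel), c_out))
            padded = list(pixel) + [0.0] * (n - len(pixel))
            out_row.append(fwht(padded)[:c_out])
        out.append(out_row)
    return out


# A 1 x 1 feature map with 3 channels, expanded to 4 output channels.
print(dwht_pointwise([[[1.0, 2.0, 3.0]]], 4))
```

Other choices, such as applying the transform blockwise over channel groups or adding a normalization step, would change the FLOP count, which is why the referee asks for the exact procedure.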
Circularity Check
No significant circularity
The paper advances an empirical proposal to replace pointwise convolutions with fixed, parameter-free transforms such as DWHT, then directly measures the resulting accuracy, parameter count, and FLOP reductions against MobileNet-V1 on CIFAR-100. No derivation chain, equation, or uniqueness claim is shown to reduce by construction to a fitted input, self-citation, or renamed ansatz; the central performance numbers are external experimental outcomes rather than algebraic identities.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Conventional transforms such as DWHT and DCT can capture cross-channel correlations in feature maps without learnable parameters.