Expressive yet Efficient Feature Expansion with Adaptive Cross-Hadamard Products
Pith reviewed 2026-05-19 13:27 UTC · model grok-4.3
The pith
The Adaptive Cross-Hadamard module expands image features expressively without adding convolutional parameters by using differentiable sampling and softsign normalization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents the Adaptive Cross-Hadamard (ACH) module as a novel operator that embeds learnability through differentiable discrete sampling and dynamic softsign normalization. This facilitates highly efficient feature reuse without incurring additional convolutional parameters, while ensuring stable gradient flow. Integrated into Hadaptive-Net via neural architecture search, the approach achieves state-of-the-art accuracy/speed trade-offs on image classification tasks, establishing Hadamard operations as specific building blocks for efficient vision models.
What carries the argument
The Adaptive Cross-Hadamard (ACH) module, which embeds learnability into Hadamard products via differentiable discrete sampling of cross terms and dynamic softsign normalization to enable parameter-free feature expansion.
If this is right
- Hadamard operations become viable as specific building blocks for efficient vision models.
- Feature expansion can occur with high expressivity while avoiding extra convolutional parameters.
- Neural architecture search can automatically place such adaptive modules for optimal efficiency.
- Stable gradient flow supports reliable training of networks that incorporate these operators.
Where Pith is reading between the lines
- The approach could extend to other vision tasks such as object detection or segmentation where parameter efficiency matters.
- Similar differentiable sampling and normalization tricks might improve other product-based operators in neural networks.
- Automated discovery of integration points via NAS could lower the manual effort needed to design efficient architectures.
- Gains might increase further when the module is combined with orthogonal efficiency methods like pruning or quantization.
Load-bearing premise
That the ACH module's differentiable discrete sampling and dynamic softsign normalization can be integrated into standard CNN architectures via NAS without adding convolutional parameters, while delivering measurable efficiency gains and stable training that exceed conventional feature expansion methods.
What would settle it
An experiment that places a conventional feature expansion method into the identical NAS-searched architecture and shows equal or superior accuracy-speed trade-offs on image classification benchmarks would falsify the superiority claim for ACH.
Figures
read the original abstract
Recent theoretical advances reveal that the Hadamard product induces nonlinear representations and implicit high-dimensional mappings for the field of deep learning, yet their practical deployment in resource-constrained vision models remains largely unexplored. To address this gap, we introduce the Adaptive Cross-Hadamard (ACH) module, a novel operator that embeds learnability through differentiable discrete sampling and dynamic softsign normalization. This facilitates highly efficient feature reuse without incurring additional convolutional parameters, while ensuring stable gradient flow. Integrated into Hadaptive-Net (Hadamard Adaptive Network) via neural architecture search, our approach achieves unprecedented efficiency. Comprehensive experiments demonstrate state-of-the-art accuracy/speed trade-offs on image classification tasks, establishing Hadamard operations as specific building blocks for efficient vision models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Adaptive Cross-Hadamard (ACH) module, which embeds learnability via differentiable discrete sampling and dynamic softsign normalization to enable efficient feature reuse and expansion in CNNs without additional convolutional parameters while maintaining stable gradient flow. The module is incorporated into Hadaptive-Net through neural architecture search, with claims of state-of-the-art accuracy/speed trade-offs on image classification tasks that position Hadamard operations as effective building blocks for efficient vision models.
Significance. If the zero-additional-convolutional-parameter claim and the resulting efficiency gains are rigorously verified with quantitative baselines, this could offer a practical operator for resource-constrained vision models, extending theoretical insights on Hadamard products into deployable architectures. The NAS integration and emphasis on stable training represent potential strengths if supported by reproducible experiments.
major comments (2)
- [Abstract] Abstract: The central efficiency claim that the ACH module enables 'highly efficient feature reuse without incurring additional convolutional parameters' is load-bearing for the SOTA accuracy/speed results but is not accompanied by any parameter-count breakdown, comparison to standard expansion baselines (e.g., 1x1 convolutions or depthwise separable layers), or explicit accounting for the learnable sampling parameters and softsign normalization weights. This omission prevents verification that the overhead is truly negligible or zero.
- [Abstract] Abstract: The assertion of 'state-of-the-art accuracy/speed trade-offs' and 'unprecedented efficiency' lacks any reported quantitative metrics, error bars, dataset names, or ablation studies on the contribution of the differentiable discrete sampling versus conventional feature expansion methods. Without these, the empirical support for the Hadaptive-Net results cannot be assessed.
minor comments (2)
- [Abstract] The abstract would benefit from naming the specific image classification datasets and baseline models used in the comprehensive experiments.
- Clarify the exact implementation of 'differentiable discrete sampling' to ensure it does not implicitly rely on additional convolutional layers for the sampling logits.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed comments. We address each major comment below and will revise the manuscript to provide greater clarity and empirical support for the claims.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central efficiency claim that the ACH module enables 'highly efficient feature reuse without incurring additional convolutional parameters' is load-bearing for the SOTA accuracy/speed results but is not accompanied by any parameter-count breakdown, comparison to standard expansion baselines (e.g., 1x1 convolutions or depthwise separable layers), or explicit accounting for the learnable sampling parameters and softsign normalization weights. This omission prevents verification that the overhead is truly negligible or zero.
Authors: We agree that an explicit parameter breakdown would strengthen the presentation. The ACH module achieves feature expansion exclusively through Hadamard products and does not introduce any new convolutional layers or convolutional kernel weights, which is the precise meaning of the 'additional convolutional parameters' claim. The differentiable discrete sampling operates on a small set of per-channel selection parameters, and the dynamic softsign normalization uses lightweight per-channel scaling factors; neither component adds convolutional parameters. In the revised manuscript we will insert a dedicated table (in the methods or experiments section) that reports total parameter counts and FLOPs for Hadaptive-Net versus standard expansion baselines such as 1x1 convolutions and depthwise-separable blocks, together with an itemized accounting of the non-convolutional learnable parameters introduced by ACH. revision: yes
-
Referee: [Abstract] Abstract: The assertion of 'state-of-the-art accuracy/speed trade-offs' and 'unprecedented efficiency' lacks any reported quantitative metrics, error bars, dataset names, or ablation studies on the contribution of the differentiable discrete sampling versus conventional feature expansion methods. Without these, the empirical support for the Hadaptive-Net results cannot be assessed.
Authors: The abstract is intentionally concise, but the full manuscript already contains the requested information: quantitative accuracy and latency results on CIFAR-10/100 and ImageNet, multiple-run error bars in the main tables, and ablations isolating the differentiable sampling and softsign normalization. To address the referee's concern directly, we will augment the abstract with the key headline numbers (e.g., top-1 accuracy and throughput on ImageNet) and will add a short sentence referencing the ablation study that quantifies the contribution of the differentiable discrete sampling relative to conventional expansion operators. revision: yes
Circularity Check
No significant circularity in ACH module or Hadaptive-Net derivation
full rationale
The paper presents the Adaptive Cross-Hadamard (ACH) module as an independent operator whose learnability is introduced via differentiable discrete sampling and dynamic softsign normalization, enabling feature reuse without additional convolutional parameters. This construction is then integrated into Hadaptive-Net through neural architecture search, with performance claims resting on empirical evaluation across image classification benchmarks rather than any self-referential equations or fitted quantities. No load-bearing step reduces by construction to the paper's own inputs, no self-citation chain justifies a uniqueness theorem, and no ansatz is smuggled through prior work. The derivation chain is self-contained against external benchmarks and experimental results.
Axiom & Free-Parameter Ledger
free parameters (1)
- learnable sampling parameters
axioms (1)
- domain assumption Hadamard product induces nonlinear representations and implicit high-dimensional mappings
Reference graph
Works this paper leans on
-
[2]
URLhttp://arxiv.org/abs/1607.06450. Yoshua Bengio. Estimating or propagating gradients through stochastic neurons.arXiv preprint arXiv:1305.2982,
work page internal anchor Pith review Pith/arXiv arXiv
-
[3]
10 Published as a conference paper at ICLR 2026 Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, and S.-H. Gary Chan. Run, don’t walk: Chasing higher flops for faster neural networks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12021–12031, June
work page 2026
-
[4]
URL https://doi.org/10.1109/TPAMI.2025.3560423
doi: 10.1109/TPAMI.2025.3560423. URL https://doi.org/10.1109/TPAMI.2025.3560423. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hi- erarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Ieee,
-
[5]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova
Version: 1.20.1. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. In Jill Burstein, Christy Doran, and Thamar Solorio (eds.),Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Tec...
work page 2019
-
[6]
BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding
doi: 10.18653/V1/N19-1423. URL https://doi.org/10.18653/v1/n19-1423. Xiaohan Ding, X. Zhang, Ningning Ma, Jungong Han, Guiguang Ding, and Jian Sun. Repvgg: Mak- ing vgg-style convnets great again.2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13728–13737,
-
[7]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929,
work page internal anchor Pith review Pith/arXiv arXiv 2010
-
[8]
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces.CoRR, abs/2312.00752,
work page internal anchor Pith review Pith/arXiv arXiv
-
[9]
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
doi: 10.48550/ARXIV .2312.00752. URLhttps://doi.org/10. 48550/arXiv.2312.00752. Emil Julius Gumbel. Statistical theory of extreme valuse and some practical applications.Nat. Bur. Standards Appl. Math. Ser. 33,
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv
-
[10]
11 Published as a conference paper at ICLR 2026 Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for mobilenetv3. InPro- ceedings of the IEEE/CVF international conference on computer vision, pp. 1314–1324,
work page 2026
-
[11]
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications.arXiv preprint arXiv:1704.04861,
work page internal anchor Pith review Pith/arXiv arXiv
-
[12]
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and ¡1mb model size.ArXiv, abs/1602.07360,
work page internal anchor Pith review Pith/arXiv arXiv
-
[13]
Hadamard product for low-rank bilinear pooling
Jin-Hwa Kim, Kyoung Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, and Byoung- Tak Zhang. Hadamard product for low-rank bilinear pooling. In5th International Confer- ence on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net,
work page 2017
-
[14]
Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, Zhiyuan Chen, Jiangbin Zheng, and Stan Z. Li. Moganet: Multi-order gated aggregation network. InThe Twelfth Inter- national Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11,
work page 2024
-
[16]
Microsoft COCO: Common Objects in Context
URLhttp://arxiv.org/abs/ 1405.0312. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single Shot MultiBox Detector.arXiv e-prints, art. arXiv:1512.02325, December
work page internal anchor Pith review Pith/arXiv arXiv
-
[17]
SSD: Single Shot MultiBox Detector
doi: 10.48550/arXiv.1512.02325. Zhenhua Liu, Zhiwei Hao, Kai Han, Yehui Tang, and Yunhe Wang. Ghostnetv3: Exploring the training strategies for compact models.ArXiv, abs/2404.11202,
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1512.02325
-
[18]
Mobilevit: Light-weight, general-purpose, and mobile- friendly vision transformer
12 Published as a conference paper at ICLR 2026 Sachin Mehta and Mohammad Rastegari. Mobilevit: Light-weight, general-purpose, and mobile- friendly vision transformer. InThe Tenth International Conference on Learning Representa- tions, ICLR 2022, Virtual Event, April 25-29,
work page 2026
-
[19]
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y . Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. InProceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washingto...
work page 2013
-
[20]
URLhttps://doi.org/10.18653/v1/d13-1170
doi: 10.18653/V1/D13-1170. URLhttps://doi.org/10.18653/v1/d13-1170. Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural net- works. InInternational conference on machine learning, pp. 6105–6114. PMLR,
-
[21]
Ghostnetv2: Enhance cheap operation with long-range attention.ArXiv, abs/2211.12905,
Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Chaoting Xu, and Yunhe Wang. Ghostnetv2: Enhance cheap operation with long-range attention.ArXiv, abs/2211.12905,
-
[22]
Pavan Kumar Anasosalu Vasu, James Gregory Gabriel, Jeff J
URLhttps: //api.semanticscholar.org/CorpusID:253801665. Pavan Kumar Anasosalu Vasu, James Gregory Gabriel, Jeff J. Zhu, Oncel Tuzel, and Anurag Ranjan. Mobileone: An improved one millisecond mobile backbone.2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7907–7917,
work page 2023
-
[23]
Adina Williams, Nikita Nangia, and Samuel R. Bowman. A broad-coverage challenge corpus for sentence understanding through inference. In Marilyn A. Walker, Heng Ji, and Amanda Stent (eds.),Proceedings of the 2018 Conference of the North American Chapter of the As- sociation for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orl...
work page 2018
-
[24]
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
doi: 10.18653/V1/N18-1101. URLhttps: //doi.org/10.18653/v1/n18-1101. 13 Published as a conference paper at ICLR 2026 Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, and Saining Xie. Convnext v2: Co-designing and scaling convnets with masked autoencoders. In Proceedings of the IEEE/CVF conference on computer vision and pa...
-
[25]
Vision mamba: Efficient visual representation learning with bidirectional state space model
Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, and Xinggang Wang. Vision mamba: Efficient visual representation learning with bidirectional state space model. InForty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27,
work page 2024
-
[26]
URLhttps://openreview.net/forum?id=YbHCqn4qF4. A APPENDIX A.1 TRAININGMECHANISM τAdjustment: We implement distinct temperature control mechanisms for ACH modules versus NAS due to fundamental differences in their training paradigms. For ACH modules distributed across network layers, which process heterogeneous features and semantics, we deliberately desig...
work page 2026
-
[27]
and LayerNorm (Ba et al., 2016), have a priori assumption that the statistical mean and statistical variance of the tensors they receive are knowable and traceable, which constitutes the basis of model convergence. In the process of ACH training and reasoning, we will involve a standardZi ⊙Z j cross Hadamard product calculation. In previous machine learni...
work page 2016
-
[28]
If the self referring Hadamard product is deformed, for exampleϕ 1(Z)⊙ϕ 2(Z), Letϕhere be a linear transformation operator, the corresponding matrix form isX 1, X2 (X∈R m×n), bias vectors areb 1, b2 (b∈R m), then: E[ϕ(Z)] =E[XZ+b] =µ· Pm i Pn j Xi,j m +E[b] For variance, sinceZcan approximate normal distribution, here we assume that its elements are i.i.d...
work page 2026
-
[29]
To implement Ghost and ACH module with adaptability, we design the Adaptive Bottleneck that can decide the expansion layer of the bottleneck manually. The net- 17 Published as a conference paper at ICLR 2026 Table 10:Neural Architecture Search Result (a).Compared with different kernel sizes. Reaching 67.55% top1-acc as result. Channels Ghost Conf. ACH Con...
work page 2026
-
[30]
All tests used ONNX Runtime 1.16.0 with default execution providers. Object Detection - Training Protocol: The base learning rate of 0.02 corresponds to a batch size of 64 distributed across 5 GPUs, scaled linearly according to the batch size. We apply 3-epoch linear warmup and reduce the learning rate to 1e-5 via cosine scheduling. Data augmentation incl...
work page 2026
-
[31]
To systematically evaluate these methods under varying tensor configurations (batch/channel di- mensions versus spatial sizes), we conducted comparative experiments using square matrices (same sized height & width). See fig. 7 for the experiment details and results. Both algorithms demonstrate relatively stable performance across varying batch sizes, indi...
work page 2026
-
[32]
and 10% MNLI datasets (Williams et al., 2018). The models were evaluated on two standard natural language understanding benchmarks: the Stan- ford Sentiment Treebank (SST-2) for binary sentiment classification and the Multi-Genre Natural Language Inference (MNLI) dataset for textual entailment. For SST-2, the model was trained and evaluated on the full da...
work page 2018
-
[33]
The optimization used a learning rate of 2e-5 with a linear warmup over the first 10% of the training steps and weight decay of 0.01. The models, which followed a BERT-base architecture (12 layers, 12 attention heads, 768-dimensional hidden states), were initialized with random weights. Input sequences were tokenized using the ‘bert-base-uncased‘ tokenize...
work page 2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.