MixerCA: An Efficient and Accurate Model for High-Performance Hyperspectral Image Classification
Pith reviewed 2026-05-07 16:45 UTC · model grok-4.3
The pith
MixerCA integrates depthwise convolutions, token and channel mixing, and coordinate attention to outperform CNNs and transformers on hyperspectral image classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MixerCA integrates depth-wise convolutions, token and channel mixing, and coordinate attention into a unified structure to decouple spatial and channel interactions, maintain consistent resolution throughout the network, and directly process HSI patches. Extensive experiments on four hyperspectral benchmark datasets reveal MixerCA's clear advantages over several competing algorithms, including 2D-CNN, 3D-CNN, Tri-CNN, HybridSN, ViT, and Swin Transformer.
What carries the argument
The MixerCA architecture that unifies depthwise convolutions with token and channel mixing plus coordinate attention to process hyperspectral patches while decoupling spatial and spectral dimensions.
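To make the coordinate-attention ingredient concrete, here is a minimal NumPy sketch of a generic coordinate-attention gate applied to one patch of shape (channels, height, width). It is not taken from the MixerCA code: the function name, the weight shapes, the ReLU nonlinearity, and the omission of normalization are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_reduce, w_h, w_w):
    """Generic coordinate-attention gate (illustrative, not the paper's code).

    x:        feature map of shape (C, H, W)
    w_reduce: shared channel-reduction weights, shape (C_mid, C)
    w_h, w_w: per-direction expansion weights, shape (C, C_mid)
    """
    c, h, w = x.shape
    pooled_h = x.mean(axis=2)  # (C, H): average over the width axis
    pooled_w = x.mean(axis=1)  # (C, W): average over the height axis

    # Share one channel-reduction transform across both directions.
    stacked = np.concatenate([pooled_h, pooled_w], axis=1)  # (C, H + W)
    mid = np.maximum(w_reduce @ stacked, 0.0)               # ReLU, (C_mid, H + W)
    mid_h, mid_w = mid[:, :h], mid[:, h:]

    gate_h = sigmoid(w_h @ mid_h)  # (C, H), values in (0, 1)
    gate_w = sigmoid(w_w @ mid_w)  # (C, W), values in (0, 1)

    # Each position (i, j) is scaled by its row gate and its column gate.
    return x * gate_h[:, :, None] * gate_w[:, None, :]

# Hypothetical sizes: 8 channels, 5x5 patch, reduction to 2 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
out = coordinate_attention(
    x,
    rng.standard_normal((2, 8)),
    rng.standard_normal((8, 2)),
    rng.standard_normal((8, 2)),
)
print(out.shape)
```

The gate keeps the spatial resolution of the input unchanged, which is consistent with the claim that the network maintains consistent resolution throughout, though the actual layer layout should be checked against the released code.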
Load-bearing premise
The accuracy gains come mainly from the architectural combination of depthwise convolution, mixing, and coordinate attention rather than from training procedures or dataset-specific choices.
What would settle it
Train MixerCA and all baseline models with identical hyperparameters, data augmentation, and optimization settings on the same four datasets and check if the accuracy advantage remains.
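One way to set up such a controlled comparison is to generate every run from a single shared configuration, so that only the model name, dataset, and seed vary. The sketch below is a hypothetical planning helper, not part of the paper or its repository: the hyperparameter values, the dataset placeholders, and the seed list are invented for illustration.

```python
from dataclasses import dataclass, asdict
from itertools import product

@dataclass(frozen=True)
class TrainConfig:
    # All values are placeholders, not the settings used in the paper.
    optimizer: str = "adam"
    learning_rate: float = 1e-3
    batch_size: int = 64
    patch_size: int = 9
    epochs: int = 100
    augmentation: str = "none"

# Model names come from the review; dataset names are placeholders
# because the review does not list the four benchmarks.
MODELS = ["MixerCA", "2D-CNN", "3D-CNN", "Tri-CNN", "HybridSN", "ViT", "Swin Transformer"]
DATASETS = ["dataset_a", "dataset_b", "dataset_c", "dataset_d"]
SEEDS = [0, 1, 2]

def build_runs(config, models=MODELS, datasets=DATASETS, seeds=SEEDS):
    """Return one run description per (model, dataset, seed) with shared settings."""
    shared = asdict(config)
    return [
        {"model": m, "dataset": d, "seed": s, **shared}
        for m, d, s in product(models, datasets, seeds)
    ]

runs = build_runs(TrainConfig())
print(len(runs))
```

Because the configuration object is frozen and copied into every run, any accuracy gap that survives this grid cannot be attributed to differing optimizer or augmentation settings.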
Original abstract
Over the past decade, hyperspectral image (HSI) classification has drawn considerable interest due to HSIs' ability to effectively distinguish terrestrial objects by capturing detailed, continuous spectral information. The strong performance of recent deep learning techniques in tasks like image classification and semantic segmentation has led to their growing use in HSI classification, due to their ability to capture complex spatial and spectral features more effectively than traditional methods. This paper presents MixerCA, a novel lightweight model for HSI classification that leverages depthwise convolution and a self-attention mechanism. MixerCA integrates depth-wise convolutions, token and channel mixing, and coordinate attention into a unified structure to decouple spatial and channel interactions, maintain consistent resolution throughout the network, and directly process HSI patches. Extensive experiments on four hyperspectral benchmark datasets reveal MixerCA's clear advantages over several competing algorithms, including 2D-CNN, 3D-CNN, Tri-CNN, HybridSN, ViT, and Swin Transformer. The source code is publicly available at https://github.com/mqalkhatib/MixerCA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MixerCA, a lightweight model for hyperspectral image (HSI) classification that integrates depthwise convolutions, token and channel mixing, and coordinate attention into a unified structure. The design aims to decouple spatial and channel interactions while maintaining consistent resolution and directly processing HSI patches. Extensive experiments on four benchmark datasets are claimed to show clear advantages in performance and efficiency over baselines including 2D-CNN, 3D-CNN, Tri-CNN, HybridSN, ViT, and Swin Transformer, with source code released publicly.
Significance. If the empirical advantages are shown to arise from the architectural decoupling rather than uncontrolled training differences, MixerCA could provide a useful efficient option for HSI classification in remote sensing. The public code release supports reproducibility and is a clear strength. However, the current presentation leaves the attribution of gains uncertain, limiting immediate impact.
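The "lightweight" claim can be sanity-checked with simple parameter arithmetic. The sketch below compares a standard convolution with a depthwise convolution followed by a 1x1 pointwise convolution. That pairing is an assumption here, since this review does not spell out the exact layer layout, and the kernel and channel sizes are illustrative only. Biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv plus a 1x1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv mixes channels
    return depthwise + pointwise

# Illustrative sizes, not taken from the paper.
k, c_in, c_out = 3, 64, 64
full = standard_conv_params(k, c_in, c_out)          # 3*3*64*64 = 36864
light = depthwise_separable_params(k, c_in, c_out)   # 576 + 4096 = 4672
print(full, light, round(full / light, 2))
```

For these sizes the separable form uses roughly one eighth of the weights, which is the kind of saving an efficiency claim would rest on; the real figure depends on the layer sizes in the released model.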
major comments (2)
- [Experiments] Experiments section: The central claim attributes performance gains to the unified MixerCA structure (depthwise convolution + token/channel mixing + coordinate attention), yet no details are given on whether baselines were reproduced under identical conditions (optimizer, learning-rate schedule, patch size, data augmentation, early stopping, or random seeds). This is load-bearing for the attribution because margins could arise from hyperparameter tuning or implementation differences rather than the proposed decoupling of interactions.
- [§4.2] §4.2 or Ablation subsection: No ablation studies isolate the individual contributions of depthwise convolutions, mixing modules, and coordinate attention. Without these, it is impossible to verify that the full unified construction is required for the claimed accuracy-efficiency trade-off on the four benchmarks.
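The ablation requested above can be planned as a small grid over the three components. The following sketch only enumerates variants; the component labels are shorthand chosen here, and how each switch maps onto the actual network is left to the authors' code.

```python
from itertools import product

# Shorthand labels for the three components named in the report.
COMPONENTS = ("depthwise_conv", "mixing", "coordinate_attention")

def ablation_variants(components=COMPONENTS):
    """All on/off combinations of the components (full factorial grid)."""
    return [
        dict(zip(components, flags))
        for flags in product([True, False], repeat=len(components))
    ]

def leave_one_out(components=COMPONENTS):
    """Variants in which exactly one component is disabled."""
    return [{c: (c != removed) for c in components} for removed in components]

for variant in leave_one_out():
    disabled = [name for name, enabled in variant.items() if not enabled]
    print("disable:", disabled[0])
```

The leave-one-out list gives the minimum set of three extra runs per dataset; the full grid of eight variants additionally exposes interactions between components.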
minor comments (2)
- [Abstract] Abstract: The claim of 'clear advantages' is stated without any quantitative metrics (e.g., overall accuracy, kappa, or FLOPs) or dataset names; adding one or two key numbers would improve transparency.
- [Notation] Notation and figures: Ensure consistent use of symbols for spectral bands and spatial dimensions across equations and diagrams; check that all baseline references include full citations with years and venues.
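For reference, the two accuracy metrics mentioned in the first minor comment can be computed from a confusion matrix as sketched below. The matrix is a made-up two-class example, not a result from the paper; rows are taken to be true classes and columns predicted classes.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = overall_accuracy(cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (p_observed - p_expected) / (1 - p_expected)

# Illustrative confusion matrix (rows: true class, columns: predicted class).
cm = [[45, 5],
      [10, 40]]
print(overall_accuracy(cm))        # 85 of 100 correct
print(round(cohen_kappa(cm), 3))   # (0.85 - 0.5) / (1 - 0.5)
```

Reporting both numbers per dataset, alongside a cost measure such as parameter count or FLOPs, would address the transparency concern raised in that comment.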
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help improve the clarity and rigor of our work. We address each major comment below and will revise the manuscript accordingly to strengthen the experimental details and analyses.
- Referee: [Experiments] Experiments section: The central claim attributes performance gains to the unified MixerCA structure (depthwise convolution + token/channel mixing + coordinate attention), yet no details are given on whether baselines were reproduced under identical conditions (optimizer, learning-rate schedule, patch size, data augmentation, early stopping, or random seeds). This is load-bearing for the attribution because margins could arise from hyperparameter tuning or implementation differences rather than the proposed decoupling of interactions.
  Authors: We agree that explicit details on training protocols are necessary to support attribution of gains to the architecture. All baselines were re-implemented and trained using the same patch sizes, data splits, optimizer (Adam with identical learning-rate schedule), batch sizes, and early-stopping criteria as described in their original papers, with the same random seeds for reproducibility. The public code release already encodes these settings. To eliminate any ambiguity, we will add a new subsection (e.g., §4.1.1) that tabulates all hyperparameters, augmentation strategies, and seed values used for MixerCA and every baseline across the four datasets.
  revision: yes
- Referee: [§4.2] §4.2 or Ablation subsection: No ablation studies isolate the individual contributions of depthwise convolutions, mixing modules, and coordinate attention. Without these, it is impossible to verify that the full unified construction is required for the claimed accuracy-efficiency trade-off on the four benchmarks.
  Authors: We recognize that component-wise ablations would provide stronger evidence for the necessity of the unified design. While the current manuscript emphasizes end-to-end performance, we will add an ablation subsection (replacing or expanding §4.2) that reports results on all four benchmarks when each module is individually removed or replaced (depthwise convolution with standard convolution, mixing modules with MLP-only, coordinate attention with standard channel attention). These experiments will quantify the accuracy-efficiency impact of each element and their interactions.
  revision: yes
Circularity Check
No circularity; empirical model proposal with benchmark comparisons
The paper proposes MixerCA, a lightweight architecture integrating depthwise convolutions, token/channel mixing, and coordinate attention for hyperspectral image classification. Its claims rest entirely on empirical performance comparisons against published baselines (2D-CNN, 3D-CNN, HybridSN, ViT, Swin Transformer) across four standard datasets. No mathematical derivations, self-referential equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The architecture is presented as a design choice validated by experiments rather than derived from prior results by the same authors or reduced to inputs by construction. This is a standard empirical model paper whose central assertions remain independently testable via reproduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Convolutional and attention layers can extract discriminative spatial-spectral features from small HSI patches when trained with standard supervised losses.
M.Q. Alkhatib, A. Jamali: Preprint submitted to Elsevier.