
arxiv: 2506.18682 · v2 · submitted 2025-06-23 · 💻 cs.CV · cs.AI

Multi-Scale Spectral Attention Module-based Hyperspectral Segmentation in Autonomous Driving Scenarios

Pith reviewed 2026-05-19 08:13 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI
keywords hyperspectral imaging · semantic segmentation · autonomous driving · multi-scale attention · UNet skip connections · urban driving scenarios · spectral feature extraction

The pith

Integrating a multi-scale spectral attention module into UNet skip connections raises hyperspectral segmentation accuracy by an average of 2.32% mIoU over the UNet-SC baseline across several urban driving datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests a Multi-Scale Spectral Attention Module that runs three parallel 1D convolutions with different kernel sizes to pull out spectral features from high-dimensional hyperspectral images. Placing this module in the skip connections of a standard UNet produces better semantic segmentation results across several urban driving datasets. The gains hold while inference times stay competitive with other attention designs. This approach addresses the difficulty of processing rich spectral data that could help vehicles handle poor lighting or weather. Ablation tests show that the best kernel sets depend on the particular dataset.

Core claim

By integrating the Multi-Scale Spectral Attention Module (MSAM) into UNet's skip connections, the method achieves average improvements of 2.32% in mean Intersection over Union (mIoU) and 2.88% in mean F1 score (mF1) over the baseline UNet-SC across multiple hyperspectral imaging datasets for urban driving scenarios, while maintaining competitive GPU performance.
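The two metrics in this claim can be made concrete with a minimal sketch, assuming the usual confusion-matrix definitions of per-class IoU and F1 averaged over classes. The function names, the toy matrix, and the choice to skip classes that never appear are illustrative assumptions, not details taken from the paper.

```python
def per_class_iou_f1(confusion):
    """confusion[t][p] = number of pixels with true class t predicted as class p."""
    n = len(confusion)
    ious, f1s = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true c, predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, true class differs
        union = tp + fp + fn
        if union == 0:
            continue  # assumption: classes absent from labels and predictions are skipped
        ious.append(tp / union)
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return ious, f1s


def mean_iou_f1(confusion):
    """Average the per-class scores into mIoU and mF1."""
    ious, f1s = per_class_iou_f1(confusion)
    return sum(ious) / len(ious), sum(f1s) / len(f1s)


# Toy two-class confusion matrix, invented purely for illustration.
toy = [[50, 10],
       [5, 35]]
miou, mf1 = mean_iou_f1(toy)
```

Because IoU and F1 penalize errors differently, the same change in predictions generally moves mIoU and mF1 by different amounts, which is consistent with the page reporting the two deltas separately.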

What carries the argument

The Multi-Scale Spectral Attention Module (MSAM) applies three parallel 1D convolutions with different kernel sizes and adaptively aggregates their outputs to capture multi-scale spectral information.
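One possible reading of this description is sketched below in plain Python. The page does not specify the aggregation step, so several pieces here are assumptions rather than the authors' design: the convolutions run along the channel axis of a per-channel average descriptor, the three branches are mixed with softmax weights, and a sigmoid gate rescales each channel. The names `msam_sketch`, `conv1d_same`, and `branch_logits` are hypothetical.

```python
import math


def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding, so output length equals input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msam_sketch(features, kernels, branch_logits):
    """features: C channels x N pixel values; kernels: three odd-length weight lists."""
    # Squeeze each channel to one number (assumed global average pooling).
    descriptor = [sum(ch) / len(ch) for ch in features]
    # Three parallel 1D convolutions along the channel (spectral) axis.
    branches = [conv1d_same(descriptor, k) for k in kernels]
    # Assumed "adaptive aggregation": softmax-weighted sum of the branches.
    mix = softmax(branch_logits)
    fused = [sum(w * b[c] for w, b in zip(mix, branches))
             for c in range(len(descriptor))]
    # Sigmoid gate in (0, 1) used to rescale every channel of the input.
    gate = [1.0 / (1.0 + math.exp(-v)) for v in fused]
    scaled = [[g * v for v in ch] for g, ch in zip(gate, features)]
    return scaled, gate


# Toy input: 12 channels with 4 pixels each; averaging kernels of sizes 3, 7, 11.
features = [[float(c + 1)] * 4 for c in range(12)]
kernels = [[1.0 / k] * k for k in (3, 7, 11)]
out, gate = msam_sketch(features, kernels, [0.0, 0.0, 0.0])
```

In a trained network the kernel weights and branch logits would be learned; the fixed averaging kernels here only keep the example deterministic.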

If this is right

  • Kernel combinations such as (1;5;11) and (3;7;11) perform strongly but vary with the dataset.
  • MSAM keeps GPU runtime competitive with other established attention mechanisms.
  • The module improves spectral feature extraction for perception in challenging lighting and weather.
  • The work provides a starting point for adaptive multi-scale spectral processing in automotive systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Kernel selection could be made dynamic during driving to match changing scene types.
  • The same multi-scale spectral idea might transfer to other dense prediction tasks such as depth estimation from HSI.
  • Pairing MSAM with temporal fusion across video frames could reduce frame-to-frame label flicker.
  • Running the model on embedded automotive hardware would test whether the accuracy gains survive real-time constraints.

Load-bearing premise

The measured gains come from the MSAM design itself and generalize beyond the specific datasets and urban driving conditions tested rather than arising from dataset-specific tuning.

What would settle it

Testing the exact MSAM-UNet model on a new hyperspectral dataset recorded in a different city or with a different sensor and measuring whether the mIoU gain stays near 2.3 percent without changing the kernel sizes.

Figures

Figures reproduced from arXiv: 2506.18682 by Brian Deegan, Edward Jones, Enda Ward, Imad Ali Shah, Jiarong Li, Martin Glavin, Tim Brophy.

Figures 1, 2, 3, 4, 6, 7, and 9 (images not reproduced in this text).
Original abstract

Recent advances in autonomous driving (AD) have highlighted the potential of hyperspectral imaging (HSI) for enhanced environmental perception, particularly in challenging weather and lighting conditions. However, efficiently processing high-dimensional spectral data remains a significant challenge. This paper presents an empirical investigation of a Multi-Scale Attention Mechanism (MSAM) for enhanced spectral feature extraction through three parallel 1D convolutions with varying kernel sizes (1-11) and adaptive feature aggregation. By integrating MSAM into UNet's skip connections, we evaluate performance improvements in semantic segmentation across multiple HSI datasets for urban driving scenarios. Comprehensive ablation studies demonstrate that MSAM consistently outperforms baseline UNet-SC, achieving average improvements of 2.32% in mIoU and 2.88% in mF1, while maintaining competitive GPU performance against established attention mechanisms. Our findings reveal that optimal kernel combinations are dataset-specific, with configurations such as (1;5;11) and (3;7;11) demonstrating particularly strong performance. This empirical investigation advances understanding of HSI processing capabilities for AD applications and establishes a foundation for adaptive multi-scale spectral feature extraction in automotive deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents an empirical investigation of a Multi-Scale Spectral Attention Module (MSAM) for hyperspectral semantic segmentation in autonomous driving. MSAM applies three parallel 1D convolutions with varying kernel sizes (e.g., combinations such as (1;5;11) and (3;7;11)) followed by adaptive feature aggregation, and is inserted into the skip connections of a UNet architecture (UNet-SC). Across multiple HSI datasets for urban driving scenarios, the authors report that MSAM yields average gains of 2.32% mIoU and 2.88% mF1 over the baseline UNet-SC while remaining competitive in GPU runtime; ablation studies are provided to support the module design.

Significance. If the performance margins can be shown to arise from a single fixed MSAM configuration rather than dataset-specific kernel retuning, the work would offer a practical, lightweight attention mechanism for high-dimensional spectral data in AD perception pipelines. The empirical focus with ablation results provides a useful baseline for future multi-scale spectral processing research.

major comments (1)
  1. [Abstract and §4 (Experimental Results)] The central claim of consistent average improvements (2.32% mIoU / 2.88% mF1) across datasets is presented alongside the statement that optimal kernel combinations are dataset-specific. If the reported averages reflect selection of the best kernel triple per dataset rather than a single fixed MSAM configuration evaluated on every dataset, the generalization argument for the module itself is not yet supported. The manuscript should either (a) report results for one fixed kernel triple (e.g., (3;7;11)) on all datasets without retuning or (b) explicitly state that the averages are best-per-dataset and qualify the generalization claim accordingly.
minor comments (2)
  1. [§3 (Method)] The precise formulation of the adaptive aggregation step after the parallel convolutions is described only at a high level; adding an equation or short pseudocode would improve reproducibility.
  2. [Tables in §4] Including standard deviations across multiple random seeds or cross-validation folds would strengthen the statistical interpretation of the reported mIoU/mF1 deltas.
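The statistic requested in the second minor comment can be sketched as follows, assuming one baseline score and one MSAM score per random seed on the same dataset. The function name and all numbers are invented placeholders, not results from the paper.

```python
from statistics import mean, stdev


def paired_delta_summary(baseline_scores, candidate_scores):
    """Mean and sample standard deviation of per-seed score differences."""
    if len(baseline_scores) != len(candidate_scores):
        raise ValueError("need one baseline and one candidate score per seed")
    deltas = [c - b for b, c in zip(baseline_scores, candidate_scores)]
    return mean(deltas), stdev(deltas)  # stdev needs at least two seeds


# Invented mIoU values (in percent) for three seeds; illustration only.
baseline = [70.0, 71.0, 69.0]
candidate = [72.0, 74.0, 70.0]
delta_mean, delta_std = paired_delta_summary(baseline, candidate)
```

A mean delta that is small relative to its standard deviation would suggest the gain is within seed-to-seed noise, which is the interpretation question the referee raises.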

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point-by-point below and outline the revisions we will make to clarify our results and strengthen the generalization claims.

  1. Referee: [Abstract and §4 (Experimental Results)] The central claim of consistent average improvements (2.32% mIoU / 2.88% mF1) across datasets is presented alongside the statement that optimal kernel combinations are dataset-specific. If the reported averages reflect selection of the best kernel triple per dataset rather than a single fixed MSAM configuration evaluated on every dataset, the generalization argument for the module itself is not yet supported. The manuscript should either (a) report results for one fixed kernel triple (e.g., (3;7;11)) on all datasets without retuning or (b) explicitly state that the averages are best-per-dataset and qualify the generalization claim accordingly.

    Authors: We agree that the current presentation creates ambiguity. The reported average gains of 2.32% mIoU and 2.88% mF1 are computed from the best kernel triple selected independently for each dataset, as already noted in the abstract and §4 where we state that optimal combinations are dataset-specific. This reflects the module's practical adaptability to varying spectral properties across urban driving HSI datasets. To resolve the concern, we will revise the abstract, §4, and conclusions to explicitly qualify that the primary averages use per-dataset optimal kernels. In addition, we will add new results in §4 showing performance for one fixed kernel triple (e.g., (3;7;11)) evaluated uniformly across all datasets without retuning. These revisions will be incorporated in the next version.

    revision: yes
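The difference between the two ways of averaging discussed in this exchange can be illustrated with a small sketch. The dataset names, scores, and function names below are invented for illustration and are not results from the paper; only the kernel triples (1;5;11) and (3;7;11) come from the text.

```python
def best_per_dataset_gain(scores, baseline):
    """Average gain when the best kernel triple is picked separately per dataset."""
    gains = [max(by_kernel.values()) - baseline[name]
             for name, by_kernel in scores.items()]
    return sum(gains) / len(gains)


def fixed_kernel_gain(scores, baseline, kernel):
    """Average gain when one kernel triple is used on every dataset."""
    gains = [by_kernel[kernel] - baseline[name]
             for name, by_kernel in scores.items()]
    return sum(gains) / len(gains)


# Invented mIoU values (in percent) for two hypothetical datasets.
toy_baseline = {"dataset_a": 60.0, "dataset_b": 50.0}
toy_scores = {
    "dataset_a": {(1, 5, 11): 63.0, (3, 7, 11): 61.0},
    "dataset_b": {(1, 5, 11): 51.0, (3, 7, 11): 54.0},
}
best = best_per_dataset_gain(toy_scores, toy_baseline)
fixed = fixed_kernel_gain(toy_scores, toy_baseline, (3, 7, 11))
```

By construction the best-per-dataset average can never be lower than any fixed-triple average, so the gap between the two numbers indicates how much of a reported gain depends on per-dataset kernel selection.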

Circularity Check

0 steps flagged

No circularity: empirical results from ablation studies


The paper is an empirical study that integrates a proposed Multi-Scale Spectral Attention Module (MSAM) into UNet skip connections and reports measured mIoU/mF1 gains from ablation experiments on multiple HSI datasets. No derivation chain, equations, or first-principles predictions are claimed; performance numbers are obtained directly from training and evaluation rather than by fitting a parameter and relabeling it as a prediction. Kernel-size choices are explicitly noted as dataset-specific, but this is an experimental observation, not a self-definitional loop or fitted-input prediction. The work is self-contained against external benchmarks with no load-bearing self-citations or uniqueness theorems invoked.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on standard UNet architecture assumptions, the representativeness of the HSI datasets for real AD conditions, and the validity of mIoU/mF1 as primary metrics; the module itself is introduced without external validation beyond the reported experiments.

free parameters (1)
  • kernel size combinations
    Dataset-specific choices such as (1;5;11) and (3;7;11) selected via ablation studies to optimize performance.
axioms (1)
  • domain assumption: UNet with skip connections is an appropriate base architecture for hyperspectral semantic segmentation
    Invoked by the choice to insert MSAM into skip connections without further justification in the abstract.
invented entities (1)
  • Multi-Scale Spectral Attention Module (MSAM) · no independent evidence
    purpose: To perform adaptive multi-scale spectral feature extraction via parallel 1D convolutions
    New module proposed in the paper; no independent evidence such as theoretical guarantees or external benchmarks provided beyond the empirical results.

pith-pipeline@v0.9.0 · 5750 in / 1288 out tokens · 33043 ms · 2026-05-19T08:13:25.813210+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CSNR and JMIM Based Spectral Band Selection for Reducing Metamerism in Urban Driving

    cs.CV · 2025-08 · unverdicted · novelty 4.0

    The work identifies bands at 497 nm, 607 nm, and 895 nm that deliver large gains in material dissimilarity and perceptual separability on the H-City dataset compared with RGB.
