Multi-Scale Spectral Attention Module-based Hyperspectral Segmentation in Autonomous Driving Scenarios

Brian Deegan; Edward Jones; Enda Ward; Imad Ali Shah; Jiarong Li; Martin Glavin; Tim Brophy

arxiv: 2506.18682 · v2 · submitted 2025-06-23 · 💻 cs.CV · cs.AI

Imad Ali Shah, Jiarong Li, Tim Brophy, Martin Glavin, Edward Jones, Enda Ward, Brian Deegan

Pith reviewed 2026-05-19 08:13 UTC · model grok-4.3

keywords: hyperspectral imaging · semantic segmentation · autonomous driving · multi-scale attention · UNet skip connections · urban driving scenarios · spectral feature extraction

The pith

Integrating a multi-scale spectral attention module into UNet skip connections raises hyperspectral segmentation accuracy by an average of 2.32% mIoU over the UNet-SC baseline on urban driving datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests a Multi-Scale Spectral Attention Module that runs three parallel 1D convolutions with different kernel sizes to extract spectral features from high-dimensional hyperspectral images. Placing this module in the skip connections of a standard UNet produces better semantic segmentation results across several urban driving datasets, and GPU inference time stays competitive with other attention designs. The approach targets the difficulty of processing rich spectral data that could help vehicles handle poor lighting or weather. Ablation tests show that the best kernel sets depend on the particular dataset.

Core claim

By integrating the Multi-Scale Attention Mechanism (MSAM) into UNet's skip connections, the method achieves average improvements of 2.32% in mean Intersection over Union (mIoU) and 2.88% in mean F1 score over the baseline UNet-SC across multiple hyperspectral imaging datasets for urban driving scenarios, while maintaining competitive GPU performance.
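For readers unfamiliar with the two metrics in this claim, the sketch below shows how mIoU and mean F1 are conventionally computed from a per-class confusion matrix. It is a minimal illustration, not the paper's evaluation code: the function name `miou_and_mf1`, the choice to skip classes that never occur, and the 3-class matrix are all assumptions made here.

```python
def miou_and_mf1(confusion):
    """Return (mIoU, mean F1) for a square confusion matrix.

    confusion[i][j] = number of pixels whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    ious, f1s = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true c, predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp   # another class, predicted as c
        if tp + fp + fn == 0:
            continue  # class absent from labels and predictions; skipped (assumption)
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return sum(ious) / len(ious), sum(f1s) / len(f1s)


# Hypothetical 3-class confusion matrix, not data from the paper.
cm = [[50, 2, 3],
      [4, 40, 6],
      [1, 5, 39]]
miou, mf1 = miou_and_mf1(cm)
print(f"mIoU={miou:.4f}  mF1={mf1:.4f}")
```

For any single class the F1 score is at least as large as the IoU, which is why a reported mF1 gain and mIoU gain need not be equal.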

What carries the argument

The Multi-Scale Spectral Attention Module (MSAM) that applies three parallel 1D convolutions with varying kernel sizes and performs adaptive feature aggregation to capture multi-scale spectral information.
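The mechanism described above can be sketched in plain Python. This is one plausible reading, not the authors' implementation: it assumes the 1D convolutions slide along a per-channel descriptor (for example, a globally averaged feature per spectral channel), that "adaptive feature aggregation" is a softmax-weighted sum of the three branches, and that a sigmoid turns the fused signal into per-channel gates. The names `conv1d_same`, `msam_gates`, and `branch_logits`, and the uniform kernel weights, are hypothetical.

```python
import math


def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def msam_gates(descriptor, kernels, branch_logits):
    """Return one gate in (0, 1) per spectral channel."""
    branches = [conv1d_same(descriptor, k) for k in kernels]
    alphas = softmax(branch_logits)  # aggregation modeled as a softmax-weighted sum (assumption)
    fused = [sum(a * b[i] for a, b in zip(alphas, branches))
             for i in range(len(descriptor))]
    return [1.0 / (1.0 + math.exp(-v)) for v in fused]


# Hypothetical descriptor for 8 spectral channels and kernel sizes (1, 5, 11).
descriptor = [0.2, 0.4, 0.9, 0.7, 0.3, 0.1, 0.5, 0.6]
kernels = [[1.0 / k] * k for k in (1, 5, 11)]  # uniform weights stand in for learned ones
gates = msam_gates(descriptor, kernels, branch_logits=[0.0, 0.0, 0.0])
reweighted = [g * d for g, d in zip(gates, descriptor)]  # gated skip-connection features
```

In a trained network the kernel weights and branch logits would be learned parameters, and the gates would rescale the full skip-connection feature map rather than a single descriptor vector.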

If this is right

Kernel combinations such as (1;5;11) and (3;7;11) perform strongly but vary with the dataset.
MSAM keeps GPU runtime competitive with other established attention mechanisms.
The module improves spectral feature extraction for perception in challenging lighting and weather.
The work provides a starting point for adaptive multi-scale spectral processing in automotive systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Kernel selection could be made dynamic during driving to match changing scene types.
The same multi-scale spectral idea might transfer to other dense prediction tasks such as depth estimation from HSI.
Pairing MSAM with temporal fusion across video frames could reduce frame-to-frame label flicker.
Running the model on embedded automotive hardware would test whether the accuracy gains survive real-time constraints.

Load-bearing premise

The measured gains come from the MSAM design itself and generalize beyond the specific datasets and urban driving conditions tested rather than arising from dataset-specific tuning.

What would settle it

Testing the exact MSAM-UNet model on a new hyperspectral dataset recorded in a different city or with a different sensor and measuring whether the mIoU gain stays near 2.3 percent without changing the kernel sizes.

Figures

Figures reproduced from arXiv: 2506.18682 by Brian Deegan, Edward Jones, Enda Ward, Imad Ali Shah, Jiarong Li, Martin Glavin, Tim Brophy.

**Figure 2.** [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]

**Figure 3.** [PITH_FULL_IMAGE:figures/full_fig_p006_3.png]

**Figure 4.** [PITH_FULL_IMAGE:figures/full_fig_p007_4.png]

**Figure 6.** [PITH_FULL_IMAGE:figures/full_fig_p008_6.png]

**Figure 7.** [PITH_FULL_IMAGE:figures/full_fig_p009_7.png]

**Figure 9.** [PITH_FULL_IMAGE:figures/full_fig_p011_9.png]

Original abstract

Recent advances in autonomous driving (AD) have highlighted the potential of hyperspectral imaging (HSI) for enhanced environmental perception, particularly in challenging weather and lighting conditions. However, efficiently processing high-dimensional spectral data remains a significant challenge. This paper presents an empirical investigation of a Multi-Scale Attention Mechanism (MSAM) for enhanced spectral feature extraction through three parallel 1D convolutions with varying kernel sizes (1-11) and adaptive feature aggregation. By integrating MSAM into UNet's skip connections, we evaluate performance improvements in semantic segmentation across multiple HSI datasets for urban driving scenarios. Comprehensive ablation studies demonstrate that MSAM consistently outperforms baseline UNet-SC, achieving average improvements of 2.32% in mIoU and 2.88% in mF1, while maintaining competitive GPU performance against established attention mechanisms. Our findings reveal that optimal kernel combinations are dataset-specific, with configurations such as (1;5;11) and (3;7;11) demonstrating particularly strong performance. This empirical investigation advances understanding of HSI processing capabilities for AD applications and establishes a foundation for adaptive multi-scale spectral feature extraction in automotive deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MSAM adds a multi-scale 1D conv block to UNet skips for HSI segmentation and shows small average gains, but dataset-specific kernel selection makes the fixed-module generalization claim shaky.

The main point is that this paper plugs a new Multi-Scale Spectral Attention Module into UNet skip connections for hyperspectral segmentation in urban driving scenes and reports average lifts of 2.32% mIoU and 2.88% mF1 over a plain UNet-SC baseline across several datasets, while keeping runtime competitive with other attention methods. The module runs three parallel 1D convolutions with kernel sizes in the 1-11 range plus adaptive aggregation, which is a concrete design choice even if multi-scale and attention ideas are not new. The authors include ablation results on the kernel combinations and note that certain triples, such as (1;5;11) and (3;7;11), work especially well on particular datasets. That empirical focus and the runtime numbers are the parts that could actually help someone trying to add HSI to an autonomous driving stack.

The stress-test concern lands: the abstract explicitly says optimal kernels are dataset-specific, so the quoted average improvements likely reflect per-dataset selection rather than one fixed MSAM configuration delivering the gains everywhere. If the full paper does not show a single kernel triple holding the margins across all datasets without retuning, then the claim that the module itself produces consistent benefits is weaker than presented. No error bars or statistical tests are mentioned in the summary, which leaves the practical significance of a 2% lift unclear. The work stays empirical, with no circular fitting or invented entities beyond the module name, and the citation pattern looks standard for this corner of computer vision.

This paper is aimed at researchers building perception pipelines who already have access to hyperspectral cameras and want a lightweight plug-in to try on segmentation. A reader already working on attention modules or HSI for AD would find the ablations and GPU numbers worth a look, but it is too incremental for a broad audience.

I would send it for peer review so the authors can clarify the kernel selection protocol and add a fixed-configuration experiment; the experiments are relevant enough to justify referee time even if revisions are needed.

Referee Report

1 major / 2 minor

Summary. The manuscript presents an empirical investigation of a Multi-Scale Spectral Attention Module (MSAM) for hyperspectral semantic segmentation in autonomous driving. MSAM applies three parallel 1D convolutions with varying kernel sizes (e.g., combinations such as (1;5;11) and (3;7;11)) followed by adaptive feature aggregation, and is inserted into the skip connections of a UNet architecture (UNet-SC). Across multiple HSI datasets for urban driving scenarios, the authors report that MSAM yields average gains of 2.32% mIoU and 2.88% mF1 over the baseline UNet-SC while remaining competitive in GPU runtime; ablation studies are provided to support the module design.

Significance. If the performance margins can be shown to arise from a single fixed MSAM configuration rather than dataset-specific kernel retuning, the work would offer a practical, lightweight attention mechanism for high-dimensional spectral data in AD perception pipelines. The empirical focus with ablation results provides a useful baseline for future multi-scale spectral processing research.

major comments (1)

[Abstract and §4 (Experimental Results)] The central claim of consistent average improvements (2.32% mIoU / 2.88% mF1) across datasets is presented alongside the statement that optimal kernel combinations are dataset-specific. If the reported averages reflect selection of the best kernel triple per dataset rather than a single fixed MSAM configuration evaluated on every dataset, the generalization argument for the module itself is not yet supported. The manuscript should either (a) report results for one fixed kernel triple (e.g., (3;7;11)) on all datasets without retuning or (b) explicitly state that the averages are best-per-dataset and qualify the generalization claim accordingly.
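The distinction this comment draws can be made concrete with a small calculation. The sketch below compares an average gain computed from the best kernel triple per dataset with the average for a single fixed triple. The dataset names and gain values are hypothetical placeholders, not results from the paper, and `average_gain` is a helper name introduced here.

```python
def average_gain(gains_by_dataset, fixed_config=None):
    """Average gain over baseline across datasets.

    gains_by_dataset maps dataset -> {kernel_triple: gain}. With
    fixed_config=None the best triple is picked separately per dataset;
    otherwise the same triple is used for every dataset.
    """
    per_dataset = []
    for gains in gains_by_dataset.values():
        if fixed_config is None:
            per_dataset.append(max(gains.values()))
        else:
            per_dataset.append(gains[fixed_config])
    return sum(per_dataset) / len(per_dataset)


# Hypothetical mIoU gains in percentage points; NOT taken from the paper.
gains = {
    "dataset_A": {(1, 5, 11): 3.0, (3, 7, 11): 1.0},
    "dataset_B": {(1, 5, 11): 0.5, (3, 7, 11): 2.5},
    "dataset_C": {(1, 5, 11): 1.5, (3, 7, 11): 2.0},
}
best_per_dataset = average_gain(gains)
fixed_1_5_11 = average_gain(gains, (1, 5, 11))
fixed_3_7_11 = average_gain(gains, (3, 7, 11))
print(best_per_dataset, fixed_1_5_11, fixed_3_7_11)
```

The best-per-dataset average can never fall below any fixed-configuration average, so it is an upper bound on what a single deployed configuration would deliver.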

minor comments (2)

[§3 (Method)] The precise formulation of the adaptive aggregation step after the parallel convolutions is described only at a high level; adding an equation or short pseudocode would improve reproducibility.
[Tables in §4] Inclusion of standard deviations across multiple random seeds or cross-validation folds would strengthen the statistical interpretation of the reported mIoU/mF1 deltas.
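The statistical reporting requested in the second minor comment could look like the following sketch, which summarizes paired per-seed differences between a baseline and a modified model. The scores are hypothetical, and `summarize_delta` is a helper name introduced here; it uses the sample standard deviation from Python's `statistics` module.

```python
import statistics


def summarize_delta(baseline_scores, model_scores):
    """Mean and sample standard deviation of paired per-seed score differences."""
    deltas = [m - b for b, m in zip(baseline_scores, model_scores)]
    return statistics.mean(deltas), statistics.stdev(deltas)  # stdev needs >= 2 seeds


# Hypothetical mIoU (%) for five random seeds; NOT taken from the paper.
baseline = [60.0, 61.0, 59.5, 60.5, 60.0]
with_msam = [62.5, 62.0, 62.5, 63.0, 62.0]
mean_delta, std_delta = summarize_delta(baseline, with_msam)
print(f"delta mIoU = {mean_delta:.2f} +/- {std_delta:.2f}")
```

A gain of about two points is easier to interpret when the spread across seeds is reported next to it.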

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point-by-point below and outline the revisions we will make to clarify our results and strengthen the generalization claims.

Referee: [Abstract and §4 (Experimental Results)] The central claim of consistent average improvements (2.32% mIoU / 2.88% mF1) across datasets is presented alongside the statement that optimal kernel combinations are dataset-specific. If the reported averages reflect selection of the best kernel triple per dataset rather than a single fixed MSAM configuration evaluated on every dataset, the generalization argument for the module itself is not yet supported. The manuscript should either (a) report results for one fixed kernel triple (e.g., (3;7;11)) on all datasets without retuning or (b) explicitly state that the averages are best-per-dataset and qualify the generalization claim accordingly.

Authors: We agree that the current presentation creates ambiguity. The reported average gains of 2.32% mIoU and 2.88% mF1 are computed from the best kernel triple selected independently for each dataset, as already noted in the abstract and §4 where we state that optimal combinations are dataset-specific. This reflects the module's practical adaptability to varying spectral properties across urban driving HSI datasets. To resolve the concern, we will revise the abstract, §4, and conclusions to explicitly qualify that the primary averages use per-dataset optimal kernels. In addition, we will add new results in §4 showing performance for one fixed kernel triple (e.g., (3;7;11)) evaluated uniformly across all datasets without retuning. These revisions will be incorporated in the next version.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results from ablation studies

The paper is an empirical study that integrates a proposed Multi-Scale Spectral Attention Module (MSAM) into UNet skip connections and reports measured mIoU/mF1 gains from ablation experiments on multiple HSI datasets. No derivation chain, equations, or first-principles predictions are claimed; performance numbers are obtained directly from training and evaluation rather than by fitting a parameter and relabeling it as a prediction. Kernel-size choices are explicitly noted as dataset-specific, but this is an experimental observation, not a self-definitional loop or fitted-input prediction. The work is self-contained against external benchmarks with no load-bearing self-citations or uniqueness theorems invoked.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on standard UNet architecture assumptions, the representativeness of the HSI datasets for real AD conditions, and the validity of mIoU/mF1 as primary metrics; the module itself is introduced without external validation beyond the reported experiments.

free parameters (1)

kernel size combinations
Dataset-specific choices such as (1;5;11) and (3;7;11) selected via ablation studies to optimize performance.

axioms (1)

domain assumption: UNet with skip connections is an appropriate base architecture for hyperspectral semantic segmentation
Invoked by the choice to insert MSAM into skip connections without further justification in the abstract.

invented entities (1)

Multi-Scale Spectral Attention Module (MSAM) · no independent evidence
purpose: To perform adaptive multi-scale spectral feature extraction via parallel 1D convolutions
New module proposed in the paper; no independent evidence such as theoretical guarantees or external benchmarks provided beyond the empirical results.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: three parallel 1D convolutions with varying kernel sizes (1-11) and adaptive feature aggregation... integrated into UNet's skip connections

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: optimal kernel combinations are dataset-specific, with configurations such as (1;5;11) and (3;7;11)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CSNR and JMIM Based Spectral Band Selection for Reducing Metamerism in Urban Driving
cs.CV 2025-08 unverdicted novelty 4.0

The work identifies bands at 497 nm, 607 nm, and 895 nm that deliver large gains in material dissimilarity and perceptual separability on the H-City dataset compared with RGB.

[1] [1]

Hsi-drive: A dataset for the research of hyperspectral image processing applied to autonomous driving systems,

K. Basterretxea, V . Martínez, J. Echanobe, J. Gutiérrez-Zaballa, and I. Del Campo, “Hsi-drive: A dataset for the research of hyperspectral image processing applied to autonomous driving systems,” in 2021 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2021, pp. 866–873

work page 2021

[2] [2]

Hsi-drive v2. 0: More data for new chal- lenges in scene understanding for autonomous driving,

J. Gutiérrez-Zaballa, K. Basterretxea, J. Echanobe, M. V . Martínez, and U. Martinez-Corral, “Hsi-drive v2. 0: More data for new chal- lenges in scene understanding for autonomous driving,” in 2023 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2023, pp. 207–214

work page 2023

[3] [3]

Urban scene understanding via hyperspectral images: Dataset and benchmark,

Q. Shen, Y . Huang, T. Ren, Y . Fu, and S. You, “Urban scene understanding via hyperspectral images: Dataset and benchmark,” Available at SSRN 4560035

work page

[4] B. Martinez, R. Leon, H. Fabelo, S. Ortega, J. F. Piñeiro, A. Szolna, M. Hernandez, C. Espino, A. J. O’Shanahan, D. Carrera et al., “Most relevant spectral bands identification for brain cancer detection using hyperspectral imaging,” Sensors, vol. 19, no. 24, p. 5481, 2019.

[5] S.-E. Qian, “Hyperspectral satellites, evolution, and development history,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 7032–7056, 2021.

[6] M. Govender, K. Chetty, and H. Bulcock, “A review of hyperspectral remote sensing and its application in vegetation and water resource studies,” Water SA, vol. 33, no. 2, pp. 145–151, 2007.

[7] S. S. M. Noor, K. Michael, S. Marshall, J. Ren, J. Tschannerl, and F.-J. Kao, “The properties of the cornea based on hyperspectral imaging: Optical biomedical engineering perspective,” in 2016 International Conference on Systems, Signals and Image Processing (IWSSIP). IEEE, 2016, pp. 1–4.

[8] Y. Huang, Q. Shen, Y. Fu, and S. You, “Weakly-supervised semantic segmentation in cityscape via hyperspectral image,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 1117–1126.

[9] D. Valme, J. Galindos, and D. C. Liyanage, “Road condition estimation using deep learning with hyperspectral images: detection of water and snow,” Proceedings of the Estonian Academy of Sciences, vol. 73, no. 1, 2024.

[10] J. Gutiérrez-Zaballa, K. Basterretxea, J. Echanobe, M. V. Martínez, and I. del Campo, “Exploring fully convolutional networks for the segmentation of hyperspectral imaging applied to advanced driver assistance systems,” in International Workshop on Design and Architecture for Signal and Image Processing. Springer, 2022, pp. 136–148.

[11] N. Theisen, R. Bartsch, D. Paulus, and P. Neubert, “Hs3-bench: A benchmark and strong baseline for hyperspectral semantic segmentation in driving scenarios,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 5895–5901.

[12] I. A. Shah, J. Li, M. Glavin, E. Jones, E. Ward, and B. Deegan, “Hyperspectral imaging-based perception in autonomous driving scenarios: Benchmarking baseline semantic segmentation models,” arXiv preprint arXiv:2410.22101, 2024.

[13] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “Imagenet large scale visual recognition challenge,” International journal of computer vision, vol. 115, pp. 211–252, 2015.

[14] M. Q. Alkhatib, M. Al-Saad, N. Aburaed, S. Al Mansoori, and H. Al Ahmad, “Dimensionality reduction techniques with hydranet framework for hsi classification,” in 2022 IEEE International Conference on Image Processing (ICIP). IEEE, 2022, pp. 3151–3155.

[15] V. K. Munipalle, U. R. Nelakuditi, and R. R. Nidamanuri, “Impact of dimensionality reduction techniques on classification of hyperspectral images,” in 2023 3rd International Conference on Intelligent Technologies (CONIT). IEEE, 2023, pp. 1–6.

[16] Q. Sun, G. Zhao, X. Xia, Y. Xie, C. Fang, L. Sun, Z. Wu, and C. Pan, “Hyperspectral image classification based on multi-scale convolutional features and multi-attention mechanisms,” Remote Sensing, vol. 16, no. 12, p. 2185, 2024.

[17] X. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” Advances in neural information processing systems, vol. 29, 2016.

[18] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer, 2015, pp. 234–241.

[19] M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset,” in CVPR Workshop on the Future of Datasets in Vision, vol. 2, 2015, p. 1.

[20] S. Hwang, J. Park, N. Kim, Y. Choi, and I. So Kweon, “Multispectral pedestrian detection: Benchmark dataset and baseline,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1037–1045.

[21] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in 2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012, pp. 3354–3361.

[22] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11 621–11 631.

[23] C. Winkens, F. Sattler, V. Adams, and D. Paulus, “Hyko: A spectral dataset for scene understanding,” in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 254–261.

[24] J. Lu, H. Liu, Y. Yao, S. Tao, Z. Tang, and J. Lu, “Hsi road: a hyper spectral image dataset for road segmentation,” in 2020 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2020, pp. 1–6.

[25] N. Hanson, B. Pyatski, S. Hibbard, C. DiMarzio, and T. Padır, “Hyper-drive: Visible-short wave infrared hyperspectral imaging datasets for robots in unstructured environments,” in 2023 13th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS). IEEE, 2023, pp. 1–5.

[26] K. Jakubczyk, B. Siemiątkowska, R. Więckowski, and J. Rapcewicz, “Hyperspectral imaging for mobile robot navigation,” Sensors, vol. 23, no. 1, p. 383, 2022.

[27] X. Ding, S. Gu, and J. Yang, “Dual fusion network for hyperspectral semantic segmentation,” in International Conference on Image and Graphics. Springer, 2023, pp. 149–161.

[28] A. B. Hamida, A. Benoit, P. Lambert, and C. B. Amar, “3-d deep learning approach for remote sensing image classification,” IEEE Transactions on geoscience and remote sensing, vol. 56, no. 8, pp. 4420–4434, 2018.

[29] N. Audebert, B. Le Saux, and S. Lefèvre, “Deep learning for classification of hyperspectral data: A comparative review,” IEEE geoscience and remote sensing magazine, vol. 7, no. 2, pp. 159–173, 2019.

[30] J. Arnold, S. Rossi, C. Petrosino, E. Mitchell, and S. J. Koppal, “Spectralzoom: Efficient segmentation with an adaptive hyperspectral camera,” arXiv preprint arXiv:2406.04287, 2024.

[31] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848, 2017.

[32] K. Sun, Y. Zhao, B. Jiang, T. Cheng, B. Xiao, D. Liu, Y. Mu, X. Wang, W. Liu, and J. Wang, “High-resolution representations for labeling pixels and regions,” arXiv preprint arXiv:1904.04514, 2019.

[33] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2881–2890.

[34] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “Cbam: Convolutional block attention module,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 3–19.

[35] Q. A. Dang and D. D. Nguyen, “Coordinate attention unet,” in ROBOVIS, 2021, pp. 122–127.

[36] R. Grewal, S. S. Kasana, and G. Kasana, “Hyperspectral image segmentation: a comprehensive survey,” Multimedia Tools and Applications, vol. 82, no. 14, pp. 20 819–20 872, 2023.

[37] D. Maturana and S. A. Scherer, “Voxnet: A 3d convolutional neural network for real-time object recognition,” 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 922–928, 2015. [Online]. Available: https://api.semanticscholar.org/CorpusID:14620252

[38] H. Wei, Y. Wang, Y. Sun, J. Zheng, and X. Yu, “A joint network of 3d-2d cnn feature hierarchy and pyramidal residual model for hyperspectral image classification,” IEEE Access, 2025.

[39] A. Vaswani, “Attention is all you need,” Advances in Neural Information Processing Systems, 2017.

[40] Z. Zhang, L. Jiang, B.-H. Tang, J. Liu, Q. Wang, Y. Hu, L. Huang, and Z. Fu, “Attention residual hybrid network for unmanned aerial vehicles hyperspectral image classification,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025.
