pith · machine review for the scientific record

arxiv: 2604.26478 · v1 · submitted 2026-04-29 · 💻 cs.CV

Recognition: unknown

Cross-Domain Transfer of Hyperspectral Foundation Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 11:13 UTC · model grok-4.3

classification 💻 cs.CV
keywords hyperspectral imaging · semantic segmentation · foundation models · cross-domain transfer · remote sensing · proximal sensing · HSI

The pith

Reusing remote-sensing hyperspectral models for proximal sensing boosts segmentation accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates how to improve semantic segmentation of hyperspectral images when training data is limited in proximal sensing applications. Rather than training models solely on scarce local data or adapting RGB-based foundation models, it explores transferring hyperspectral foundation models pretrained on remote sensing data. This cross-domain approach avoids the need to bridge different imaging modalities, keeping the full spectral information and a straightforward model structure. On the HS3-Bench benchmark, the transferred models show large gains over standard in-domain training, close much of the gap to more complex cross-modality methods, and hold up well when data is scarce.

Core claim

Cross-domain transfer of hyperspectral foundation models, pretrained in remote sensing, to proximal sensing applications yields large performance improvements in semantic segmentation over in-domain in-modality training, reduces the performance gap to cross-modality approaches, and sustains strong results even in limited-data regimes.

What carries the argument

Cross-domain transfer, which applies HSI foundation models trained on remote sensing data directly to proximal sensing tasks without modality adaptation.
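
Concretely, "without modality adaptation" means the pretrained spectral encoder is reused as-is and only a light task head is trained on the scarce target-domain labels. A minimal numpy sketch of that recipe; the random-projection backbone and synthetic data are stand-ins for illustration, not HyperSL's actual weights or API:

```python
import numpy as np

rng = np.random.default_rng(0)
C, D, K = 128, 32, 4          # spectral bands, feature dim, classes

# Stand-in for a pretrained spectral backbone: in the real pipeline these
# weights would come from remote-sensing pretraining (e.g. HyperSL); here
# they are a fixed random projection, frozen during transfer.
W_backbone = rng.normal(size=(C, D)) / np.sqrt(C)

def encode(spectra):
    """Per-pixel spectra (N, C) -> features (N, D). The full spectral
    vector goes in unchanged; no RGB bridging step is needed."""
    return np.tanh(spectra @ W_backbone)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Scarce labeled target-domain data (synthetic placeholder; labels are made
# linearly recoverable from the frozen features so the head can learn).
X = rng.normal(size=(200, C))
F = encode(X)
y = (F @ rng.normal(size=(D, K))).argmax(axis=1)
Y = np.eye(K)[y]

# Transfer step: train only a light softmax head on top of the frozen
# backbone, by full-batch gradient descent on the cross-entropy loss.
W_head = np.zeros((D, K))
for _ in range(300):
    P = softmax(F @ W_head)
    W_head -= 0.5 * F.T @ (P - Y) / len(X)

train_acc = float((softmax(F @ W_head).argmax(axis=1) == y).mean())
```

The point of the sketch is the interface: the head consumes the full C-band spectrum through the pretrained encoder, so nothing in the pipeline has to approximate RGB.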

Load-bearing premise

Performance differences on the HS3-Bench benchmark result from the choice of transfer strategy and not from differences in training details, data handling, or model sizes, and the benchmark reflects performance in actual proximal sensing uses.

What would settle it

Running all compared methods with exactly the same training hyperparameters, data splits, and model architectures on HS3-Bench and finding no advantage for cross-domain transfer, or testing the models in a real proximal sensing deployment where accuracy drops significantly.

Figures

Figures reproduced from arXiv: 2604.26478 by Nick Theisen, Peer Neubert.

Figure 1
Figure 1. The typical approach for HSI semantic segmentation uses in-domain, in-modality training (a), i.e. training models on data from the target domain (red) and target modality (blue), but many applications face limited training data availability. An established method to address this problem is cross-modality knowledge transfer (b), bridging RGB and HSI to exploit vision foundation models. These methods either …
Figure 2
Figure 2. HyperSL-RU-Net consists of a HyperSL [9] backbone and an RU-Net [18] encoder-decoder module for semantic segmentation.
Figure 3
Figure 3. HyperSL-FC consists of a HyperSL [9] backbone and a fully connected layer followed by softmax activation for spectral classification.
Figure 4
Figure 4. Improvements in mIoU score for HyperSL-RU-Net over RU-Net. Green bars indicate a positive effect, the red bar a negative effect.
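
Figure 4 and the referee report both lean on mIoU. For reference, a minimal numpy implementation of the standard metric (the mean over classes of intersection-over-union); this is the textbook definition, not a claim about HS3-Bench's exact evaluation script:

```python
import numpy as np

def miou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in gt or pred."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:            # class absent everywhere: skip it
            continue
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

gt   = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 2]])
# class 0: 1/2, class 1: 2/3, class 2: 2/2 -> mean ≈ 0.722
score = miou(pred, gt, 3)
```

Benchmarks differ in how they handle absent classes and ignore labels, so the skip-empty-union choice above is one convention among several.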
original abstract

Hyperspectral imaging (HSI) semantic segmentation typically relies on in-domain training, but limited data availability often restricts model performance in real-world applications. Current approaches to leverage foundation models in proximal sensing use cross-modality techniques, bridging RGB and HSI to exploit vision foundation models. However, these methods either discard spectral information or introduce architectural complexity. We propose cross-domain transfer as an alternative, reusing HSI foundation models - originally trained in remote sensing - for proximal sensing applications. By eliminating the need to bridge modality gaps, our approach preserves spectral information while maintaining a simple architecture. Using the HS3-Bench benchmark, we systematically evaluate and compare conventional in-domain, in-modality training, cross-modality transfer and cross-domain transfer strategies. Our results demonstrate that cross-domain transfer achieves large performance improvements over in-domain, in-modality training, reduces the performance gap to cross-modality approaches and maintains strong performance in limited data settings. Thus, this work advances more effective HSI semantic segmentation in diverse applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes cross-domain transfer of hyperspectral foundation models pretrained on remote-sensing data for use in proximal-sensing semantic segmentation. It systematically compares this strategy to conventional in-domain in-modality training and to cross-modality transfer on the HS3-Bench benchmark, claiming large gains over in-domain baselines, a narrowed gap to cross-modality methods, and robust performance under limited-data conditions.

Significance. If the performance differences can be rigorously attributed to the transfer strategy itself, the work offers a simpler architectural path for leveraging existing HSI foundation models while preserving full spectral information, avoiding the information loss or added complexity of RGB-HSI bridging. This could meaningfully expand the practical utility of foundation models in data-scarce proximal-sensing applications.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Results): the central claim of 'large performance improvements' and 'reduces the performance gap' is stated without any numerical values, error bars, statistical tests, or baseline specifications. The manuscript must supply concrete metrics (e.g., mIoU deltas, standard deviations) and explicit descriptions of the in-domain and cross-modality baselines to allow readers to evaluate the magnitude and reliability of the reported gains.
  2. [§3 and §4] §3 (Experimental Setup) and §4: the attribution of observed gains to cross-domain transfer (rather than unmatched training protocols) is load-bearing for the main conclusion. The paper must demonstrate that in-domain baselines were trained with identical epoch counts, optimizers, learning-rate schedules, data augmentations, and effective model capacity as the transferred models; otherwise the HS3-Bench comparisons cannot isolate the effect of the proposed transfer method.
minor comments (2)
  1. Clarify the precise definition of 'limited data settings' (e.g., number of labeled samples per class) and ensure all tables report both mean and variance across runs.
  2. Add a short related-work paragraph contrasting the proposed cross-domain approach with existing domain-adaptation techniques for hyperspectral data.
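
The protocol-matching demanded in major comment 2 can be made mechanical: pin one shared training configuration and let the compared methods differ only in how their weights are initialized. A stdlib-only sketch with illustrative field names (none are taken from the paper):

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class TrainConfig:
    """Training protocol shared by every compared method on the benchmark."""
    epochs: int = 100
    optimizer: str = "adamw"
    lr: float = 1e-4
    lr_schedule: str = "cosine"
    augmentations: tuple = ("flip", "crop")
    # The only field allowed to differ between methods:
    init_weights: str = "random"

def protocol_diff(a: TrainConfig, b: TrainConfig) -> set:
    """Return the names of the fields on which two configs disagree."""
    da, db = asdict(a), asdict(b)
    return {k for k in da if da[k] != db[k]}

in_domain    = TrainConfig(init_weights="random")
cross_domain = TrainConfig(init_weights="pretrained_remote_sensing")

# A fair comparison isolates the transfer strategy: configs may differ
# only in initialization, never in the optimization protocol.
assert protocol_diff(in_domain, cross_domain) <= {"init_weights"}
```

Publishing such a ledger alongside the results would let readers verify the fairness claim directly rather than take it on trust.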

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight opportunities to strengthen the clarity and rigor of our claims. We address each major comment below and will revise the manuscript accordingly.

point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Results): the central claim of 'large performance improvements' and 'reduces the performance gap' is stated without any numerical values, error bars, statistical tests, or baseline specifications. The manuscript must supply concrete metrics (e.g., mIoU deltas, standard deviations) and explicit descriptions of the in-domain and cross-modality baselines to allow readers to evaluate the magnitude and reliability of the reported gains.

    Authors: We agree that the abstract and results would be strengthened by explicit numerical support. In the revised manuscript we will insert concrete mIoU values, deltas relative to baselines, standard deviations from repeated runs, and statistical significance tests (e.g., paired t-tests) into both the abstract and §4. We will also add explicit descriptions of the in-domain and cross-modality baselines, including their architectures and key hyperparameters, so readers can directly assess the magnitude and reliability of the gains. revision: yes

  2. Referee: [§3 and §4] §3 (Experimental Setup) and §4: the attribution of observed gains to cross-domain transfer (rather than unmatched training protocols) is load-bearing for the main conclusion. The paper must demonstrate that in-domain baselines were trained with identical epoch counts, optimizers, learning-rate schedules, data augmentations, and effective model capacity as the transferred models; otherwise the HS3-Bench comparisons cannot isolate the effect of the proposed transfer method.

    Authors: We confirm that the in-domain baselines were trained under identical protocols to the transferred models (same epoch count, optimizer, learning-rate schedule, data augmentations, and model capacity) precisely to isolate the contribution of cross-domain transfer. To address the concern, we will expand §3 with a dedicated paragraph and comparison table that explicitly lists the training configuration for every method. This will make the fairness of the HS3-Bench comparisons transparent and allow readers to attribute performance differences to the transfer strategy itself. revision: yes
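
The statistics the authors promise (means, standard deviations, paired tests over repeated runs) are easy to pin down precisely. A stdlib-only sketch; the per-seed mIoU values below are invented placeholders for illustration, not results from the paper:

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical per-seed mIoU scores for the same five seeds (illustrative only).
baseline = [0.61, 0.63, 0.60, 0.62, 0.61]   # in-domain training
transfer = [0.70, 0.69, 0.71, 0.70, 0.68]   # cross-domain transfer

def summarize(runs):
    """Report mean ± sample standard deviation."""
    return f"{mean(runs):.3f} ± {stdev(runs):.3f}"

def paired_t(a, b):
    """t statistic for paired samples: mean per-seed difference
    divided by its standard error."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

t = paired_t(transfer, baseline)   # compare against t-distribution, df = n - 1
```

Pairing by seed matters: it removes between-seed variance that an unpaired comparison would wrongly count against the effect.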

Circularity Check

0 steps flagged

No circularity; empirical benchmark comparisons contain no derivations or self-referential reductions.

full rationale

The paper is an empirical ML study that evaluates cross-domain transfer of hyperspectral foundation models against in-domain and cross-modality baselines on the HS3-Bench benchmark. No equations, derivations, predictions, or first-principles results are present in the abstract or described full text. Claims rest on observed performance differences rather than any fitted parameter renamed as a prediction, self-definitional construction, or load-bearing self-citation chain. The absence of a derivation chain means the central results cannot reduce to their own inputs by construction; the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that remote-sensing-trained HSI models contain features transferable to proximal sensing without modality conversion. No free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption HSI foundation models trained in remote sensing contain features that transfer effectively to proximal sensing semantic segmentation tasks
    This premise underpins the entire cross-domain transfer proposal and the reported performance gains.

pith-pipeline@v0.9.0 · 5466 in / 1260 out tokens · 64626 ms · 2026-05-07T11:13:10.567785+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

22 extracted references · 16 canonical work pages

  1. [1]

    HSI-Drive: A Dataset for the Research of Hyperspectral Image Processing Applied to Autonomous Driving Systems

    K. Basterretxea et al. “HSI-Drive: A Dataset for the Research of Hyperspectral Image Processing Applied to Autonomous Driving Systems”. In: 2021 IEEE Intelligent Vehicles Symposium (IV). 2021, pp. 866–873. doi: 10.1109/IV48863.2021.9575298

  2. [2]

    SpectralEarth: Training Hyperspectral Foundation Models at Scale

    Nassim Ait Ali Braham et al. SpectralEarth: Training Hyperspectral Foundation Models at Scale. Aug. 2024. doi: 10.48550/arXiv.2408.08447. eprint: 2408.08447 (cs)

  3. [3]

    SatMAE: Pre-Training Transformers for Temporal and Multi-Spectral Satellite Imagery

    Yezhen Cong et al. “SatMAE: Pre-Training Transformers for Temporal and Multi-Spectral Satellite Imagery”. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS ’22. Red Hook, NY, USA: Curran Associates Inc., Nov. 2022, pp. 197–211

  4. [4]

    MiniRocket: A Very Fast (Almost) Deterministic Transform for Time Series Classification

    Angus Dempster, Daniel F. Schmidt, and Geoffrey I. Webb. “MiniRocket: A Very Fast (Almost) Deterministic Transform for Time Series Classification”. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. KDD ’21. New York, NY, USA: Association for Computing Machinery, Aug. 2021, pp. 248–257. doi: 10.1145/3447548.3467231

  5. [5]

    ImageNet: A large-scale hierarchical image database

    Jia Deng et al. “ImageNet: A large-scale hierarchical image database”. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, pp. 248–255. doi: 10.1109/CVPR.2009.5206848

  6. [6]

    SpectralGPT: Spectral Remote Sensing Foundation Model

    Danfeng Hong et al. “SpectralGPT: Spectral Remote Sensing Foundation Model”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 46.8 (Aug. 2024), pp. 5227–5244. doi: 10.1109/TPAMI.2024.3362475. eprint: 2311.07113 (cs)

  8. [8]

    Hyperspectral Adapter for Semantic Segmentation With Vision Foundation Models

    Juana Valeria Hurtado, Rohit Mohan, and Abhinav Valada. Hyperspectral Adapter for Semantic Segmentation With Vision Foundation Models. 2026. doi: 10.1109/LRA.2026.3656795

  9. [9]

    Semantic Segmentation in Satellite Hyperspectral Imagery by Deep Learning

    Jon Alvarez Justo et al. “Semantic Segmentation in Satellite Hyperspectral Imagery by Deep Learning”. In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 18 (2025), pp. 273–293. issn: 2151-1535. doi: 10.1109/JSTARS.2024.3487360

  10. [10]

    HyperSL: A Spectral Foundation Model for Hyperspectral Image Interpretation

    Weili Kong et al. “HyperSL: A Spectral Foundation Model for Hyperspectral Image Interpretation”. In: IEEE Transactions on Geoscience and Remote Sensing 63 (2025), pp. 1–19. issn: 1558-0644. doi: 10.1109/TGRS.2025.3566205

  11. [11]

    A General Purpose Spectral Foundational Model for Both Proximal and Remote Sensing Spectral Imaging

    William Michael Laprade et al. A General Purpose Spectral Foundational Model for Both Proximal and Remote Sensing Spectral Imaging. Mar. 2025. doi: 10.48550/arXiv.2503.01628. eprint: 2503.01628 (cs)

  12. [12]

    HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

    Jingtao Li et al. “HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2025, pp. 23048–23058

  13. [13]

    HyperspectralCityV2.0

    Yu Li et al. HyperspectralCityV2.0. 2021. url: https://pbdl-ws.github.io/pbdl2021/challenge/download.html (visited on 07/13/2022)

  14. [14]

    Parameter-Efficient Fine-Tuning of Multispectral Foundation Models for Hyperspectral Image Classification

    Bernardin Ligan et al. Parameter-Efficient Fine-Tuning of Multispectral Foundation Models for Hyperspectral Image Classification. May 2025. doi: 10.48550/arXiv.2505.15334. eprint: 2505.15334 (cs)

  15. [15]

    DINOv2: Learning Robust Visual Features without Supervision

    Maxime Oquab et al. “DINOv2: Learning Robust Visual Features without Supervision”. In: Trans. Mach. Learn. Res. 2024 (2024)

  16. [16]

    U-Net: Convolutional Networks for Biomedical Image Segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. “U-Net: Convolutional Networks for Biomedical Image Segmentation”. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Ed. by Nassir Navab et al. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015, pp. 234–241. isbn: 978-3-319-24574-4. doi: 10.1...

  17. [17]

    HDC-MiniROCKET: Explicit Time Encoding in Time Series Classification with Hyperdimensional Computing

    Kenny Schlegel, Peer Neubert, and Peter Protzel. “HDC-MiniROCKET: Explicit Time Encoding in Time Series Classification with Hyperdimensional Computing”. In: 2022 International Joint Conference on Neural Networks (IJCNN). July 2022, pp. 1–8. doi: 10.1109/IJCNN55064.2022.9892158

  18. [18]

    Data-Efficient Spectral Classification of Hyperspectral Data Using MiniROCKET and HDC-MiniROCKET

    Nick Theisen et al. “Data-Efficient Spectral Classification of Hyperspectral Data Using MiniROCKET and HDC-MiniROCKET”. In: 2025 IEEE 21st International Conference on Automation Science and Engineering (CASE). Aug. 2025, pp. 1865–1871. doi: 10.1109/CASE58245.2025.11163869

  19. [19]

    HS3-Bench: A Benchmark and Strong Baseline for Hyperspectral Semantic Segmentation in Driving Scenarios

    Nick Theisen et al. “HS3-Bench: A Benchmark and Strong Baseline for Hyperspectral Semantic Segmentation in Driving Scenarios”. In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Oct. 2024, pp. 5895–5901. doi: 10.1109/IROS58592.2024.10801768

  20. [20]

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Di Wang et al. “HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 47.8 (Aug. 2025), pp. 6427–6444. issn: 1939-3539. doi: 10.1109/TPAMI.2025.3557581

  21. [21]

    Deep Dimension Reduction for Spatial-Spectral Road Scene Classification

    Christian Winkens et al. “Deep Dimension Reduction for Spatial-Spectral Road Scene Classification”. In: Electronic Imaging 31 (Jan. 2019), pp. 1–9. issn: 2470-1173. doi: 10.2352/ISSN.2470-1173.2019.15.AVM-049

  22. [22]

    HyKo: A Spectral Dataset for Scene Understanding

    Christian Winkens et al. “HyKo: A Spectral Dataset for Scene Understanding”. In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). Institute of Electrical and Electronics Engineers, 2017, pp. 254–261