pith · machine review for the scientific record

arxiv: 2604.26478 · v1 · submitted 2026-04-29 · 💻 cs.CV

Recognition: unknown

Cross-Domain Transfer of Hyperspectral Foundation Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 11:13 UTC · model grok-4.3

classification 💻 cs.CV
keywords hyperspectral imaging · semantic segmentation · foundation models · cross-domain transfer · remote sensing · proximal sensing · HSI

The pith

Reusing remote-sensing hyperspectral models for proximal sensing boosts segmentation accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates how to improve semantic segmentation of hyperspectral images when training data is limited in proximal sensing applications. Rather than training models solely on scarce local data or adapting RGB-based foundation models, it explores transferring hyperspectral foundation models pretrained on remote sensing data. This cross-domain approach avoids the need to bridge different imaging modalities, keeping the full spectral information and a straightforward model structure. On the HS3-Bench benchmark, the transferred models show large gains over standard in-domain training, close much of the gap to more complex cross-modality methods, and hold up well when data is scarce.

Core claim

Cross-domain transfer of hyperspectral foundation models, pretrained in remote sensing, to proximal sensing applications yields large performance improvements in semantic segmentation over in-domain in-modality training, reduces the performance gap to cross-modality approaches, and sustains strong results even in limited-data regimes.

What carries the argument

Cross-domain transfer, which applies HSI foundation models trained on remote sensing data directly to proximal sensing tasks without modality adaptation.
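
Concretely, "without modality adaptation" means the pretrained spectral encoder is reused as-is and only a light task head is trained on the scarce target-domain labels. A minimal numpy sketch of that recipe; the random-projection backbone and synthetic data are stand-ins for illustration, not HyperSL's actual weights or API:

```python
import numpy as np

rng = np.random.default_rng(0)
C, D, K = 128, 32, 4          # spectral bands, feature dim, classes

# Stand-in for a pretrained spectral backbone: in the real pipeline these
# weights would come from remote-sensing pretraining (e.g. HyperSL); here
# they are a fixed random projection, frozen during transfer.
W_backbone = rng.normal(size=(C, D)) / np.sqrt(C)

def encode(spectra):
    """Per-pixel spectra (N, C) -> features (N, D). The full spectral
    vector goes in unchanged; no RGB bridging step is needed."""
    return np.tanh(spectra @ W_backbone)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Scarce labeled target-domain data (synthetic placeholder; labels are made
# linearly recoverable from the frozen features so the head can learn).
X = rng.normal(size=(200, C))
F = encode(X)
y = (F @ rng.normal(size=(D, K))).argmax(axis=1)
Y = np.eye(K)[y]

# Transfer step: train only a light softmax head on top of the frozen
# backbone, by full-batch gradient descent on the cross-entropy loss.
W_head = np.zeros((D, K))
for _ in range(300):
    P = softmax(F @ W_head)
    W_head -= 0.5 * F.T @ (P - Y) / len(X)

train_acc = float((softmax(F @ W_head).argmax(axis=1) == y).mean())
```

The point of the sketch is the interface: the head consumes the full C-band spectrum through the pretrained encoder, so nothing in the pipeline has to approximate RGB.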

Load-bearing premise

Performance differences on the HS3-Bench benchmark result from the choice of transfer strategy and not from differences in training details, data handling, or model sizes, and the benchmark reflects performance in actual proximal sensing uses.

What would settle it

Running all compared methods with exactly the same training hyperparameters, data splits, and model architectures on HS3-Bench and finding no advantage for cross-domain transfer, or testing the models in a real proximal sensing deployment where accuracy drops significantly.

Figures

Figures reproduced from arXiv: 2604.26478 by Nick Theisen, Peer Neubert.

Figure 1
Figure 1. The typical approach for HSI semantic segmentation uses in-domain, in-modality training (a), i.e. training models on data from the target domain (red) and target modality (blue), but many applications face limited training data availability. An established method to address this problem is cross-modality knowledge transfer (b), bridging RGB and HSI to exploit vision foundation models. These methods either …
Figure 2
Figure 2. HyperSL-RU-Net consists of a HyperSL [9] backbone and an RU-Net [18] encoder-decoder module for semantic segmentation.
Figure 3
Figure 3. HyperSL-FC consists of a HyperSL [9] backbone and a fully connected layer followed by softmax activation for spectral classification.
Figure 4
Figure 4. Improvements in mIoU score for HyperSL-RU-Net over RU-Net. Green bars indicate a positive effect, the red bar a negative effect.
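
Figure 4 and the referee report both lean on mIoU. For reference, a minimal numpy implementation of the standard metric (the mean over classes of intersection-over-union); this is the textbook definition, not a claim about HS3-Bench's exact evaluation script:

```python
import numpy as np

def miou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in gt or pred."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:            # class absent everywhere: skip it
            continue
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

gt   = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 2]])
# class 0: 1/2, class 1: 2/3, class 2: 2/2 -> mean ≈ 0.722
score = miou(pred, gt, 3)
```

Benchmarks differ in how they handle absent classes and ignore labels, so the skip-empty-union choice above is one convention among several.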
original abstract

Hyperspectral imaging (HSI) semantic segmentation typically relies on in-domain training, but limited data availability often restricts model performance in real-world applications. Current approaches to leverage foundation models in proximal sensing use cross-modality techniques, bridging RGB and HSI to exploit vision foundation models. However, these methods either discard spectral information or introduce architectural complexity. We propose cross-domain transfer as an alternative, reusing HSI foundation models - originally trained in remote sensing - for proximal sensing applications. By eliminating the need to bridge modality gaps, our approach preserves spectral information while maintaining a simple architecture. Using the HS3-Bench benchmark, we systematically evaluate and compare conventional in-domain, in-modality training, cross-modality transfer and cross-domain transfer strategies. Our results demonstrate that cross-domain transfer achieves large performance improvements over in-domain, in-modality training, reduces the performance gap to cross-modality approaches and maintains strong performance in limited data settings. Thus, this work advances more effective HSI semantic segmentation in diverse applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes cross-domain transfer of hyperspectral foundation models pretrained on remote-sensing data for use in proximal-sensing semantic segmentation. It systematically compares this strategy to conventional in-domain in-modality training and to cross-modality transfer on the HS3-Bench benchmark, claiming large gains over in-domain baselines, a narrowed gap to cross-modality methods, and robust performance under limited-data conditions.

Significance. If the performance differences can be rigorously attributed to the transfer strategy itself, the work offers a simpler architectural path for leveraging existing HSI foundation models while preserving full spectral information, avoiding the information loss or added complexity of RGB-HSI bridging. This could meaningfully expand the practical utility of foundation models in data-scarce proximal-sensing applications.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Results): the central claim of 'large performance improvements' and 'reduces the performance gap' is stated without any numerical values, error bars, statistical tests, or baseline specifications. The manuscript must supply concrete metrics (e.g., mIoU deltas, standard deviations) and explicit descriptions of the in-domain and cross-modality baselines to allow readers to evaluate the magnitude and reliability of the reported gains.
  2. [§3 and §4] §3 (Experimental Setup) and §4: the attribution of observed gains to cross-domain transfer (rather than unmatched training protocols) is load-bearing for the main conclusion. The paper must demonstrate that in-domain baselines were trained with identical epoch counts, optimizers, learning-rate schedules, data augmentations, and effective model capacity as the transferred models; otherwise the HS3-Bench comparisons cannot isolate the effect of the proposed transfer method.
minor comments (2)
  1. Clarify the precise definition of 'limited data settings' (e.g., number of labeled samples per class) and ensure all tables report both mean and variance across runs.
  2. Add a short related-work paragraph contrasting the proposed cross-domain approach with existing domain-adaptation techniques for hyperspectral data.
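
The protocol-matching demanded in major comment 2 can be made mechanical: pin one shared training configuration and let the compared methods differ only in how their weights are initialized. A stdlib-only sketch with illustrative field names (none are taken from the paper):

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class TrainConfig:
    """Training protocol shared by every compared method on the benchmark."""
    epochs: int = 100
    optimizer: str = "adamw"
    lr: float = 1e-4
    lr_schedule: str = "cosine"
    augmentations: tuple = ("flip", "crop")
    # The only field allowed to differ between methods:
    init_weights: str = "random"

def protocol_diff(a: TrainConfig, b: TrainConfig) -> set:
    """Return the names of the fields on which two configs disagree."""
    da, db = asdict(a), asdict(b)
    return {k for k in da if da[k] != db[k]}

in_domain    = TrainConfig(init_weights="random")
cross_domain = TrainConfig(init_weights="pretrained_remote_sensing")

# A fair comparison isolates the transfer strategy: configs may differ
# only in initialization, never in the optimization protocol.
assert protocol_diff(in_domain, cross_domain) <= {"init_weights"}
```

Publishing such a ledger alongside the results would let readers verify the fairness claim directly rather than take it on trust.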

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight opportunities to strengthen the clarity and rigor of our claims. We address each major comment below and will revise the manuscript accordingly.

point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Results): the central claim of 'large performance improvements' and 'reduces the performance gap' is stated without any numerical values, error bars, statistical tests, or baseline specifications. The manuscript must supply concrete metrics (e.g., mIoU deltas, standard deviations) and explicit descriptions of the in-domain and cross-modality baselines to allow readers to evaluate the magnitude and reliability of the reported gains.

    Authors: We agree that the abstract and results would be strengthened by explicit numerical support. In the revised manuscript we will insert concrete mIoU values, deltas relative to baselines, standard deviations from repeated runs, and statistical significance tests (e.g., paired t-tests) into both the abstract and §4. We will also add explicit descriptions of the in-domain and cross-modality baselines, including their architectures and key hyperparameters, so readers can directly assess the magnitude and reliability of the gains. revision: yes

  2. Referee: [§3 and §4] §3 (Experimental Setup) and §4: the attribution of observed gains to cross-domain transfer (rather than unmatched training protocols) is load-bearing for the main conclusion. The paper must demonstrate that in-domain baselines were trained with identical epoch counts, optimizers, learning-rate schedules, data augmentations, and effective model capacity as the transferred models; otherwise the HS3-Bench comparisons cannot isolate the effect of the proposed transfer method.

    Authors: We confirm that the in-domain baselines were trained under identical protocols to the transferred models (same epoch count, optimizer, learning-rate schedule, data augmentations, and model capacity) precisely to isolate the contribution of cross-domain transfer. To address the concern, we will expand §3 with a dedicated paragraph and comparison table that explicitly lists the training configuration for every method. This will make the fairness of the HS3-Bench comparisons transparent and allow readers to attribute performance differences to the transfer strategy itself. revision: yes
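
The statistics the authors promise (means, standard deviations, paired tests over repeated runs) are easy to pin down precisely. A stdlib-only sketch; the per-seed mIoU values below are invented placeholders for illustration, not results from the paper:

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical per-seed mIoU scores for the same five seeds (illustrative only).
baseline = [0.61, 0.63, 0.60, 0.62, 0.61]   # in-domain training
transfer = [0.70, 0.69, 0.71, 0.70, 0.68]   # cross-domain transfer

def summarize(runs):
    """Report mean ± sample standard deviation."""
    return f"{mean(runs):.3f} ± {stdev(runs):.3f}"

def paired_t(a, b):
    """t statistic for paired samples: mean per-seed difference
    divided by its standard error."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

t = paired_t(transfer, baseline)   # compare against t-distribution, df = n - 1
```

Pairing by seed matters: it removes between-seed variance that an unpaired comparison would wrongly count against the effect.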

Circularity Check

0 steps flagged

No circularity; empirical benchmark comparisons contain no derivations or self-referential reductions.

full rationale

The paper is an empirical ML study that evaluates cross-domain transfer of hyperspectral foundation models against in-domain and cross-modality baselines on the HS3-Bench benchmark. No equations, derivations, predictions, or first-principles results are present in the abstract or described full text. Claims rest on observed performance differences rather than any fitted parameter renamed as a prediction, self-definitional construction, or load-bearing self-citation chain. The absence of a derivation chain means the central results cannot reduce to their own inputs by construction; the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that remote-sensing-trained HSI models contain features transferable to proximal sensing without modality conversion. No free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption HSI foundation models trained in remote sensing contain features that transfer effectively to proximal sensing semantic segmentation tasks
    This premise underpins the entire cross-domain transfer proposal and the reported performance gains.

pith-pipeline@v0.9.0 · 5466 in / 1260 out tokens · 64626 ms · 2026-05-07T11:13:10.567785+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

22 extracted references · 16 canonical work pages

  1. [1]

    HSI-Drive: A Dataset for the Research of Hyperspectral Image Processing Applied to Autonomous Driving Systems

    K. Basterretxea et al. “HSI-Drive: A Dataset for the Research of Hyperspectral Image Processing Applied to Autonomous Driving Systems”. In: 2021 IEEE Intelligent Vehicles Symposium (IV). 2021, pp. 866–873. doi: 10.1109/IV48863.2021.9575298

  2. [2]

    SpectralEarth: Training Hyperspectral Foundation Models at Scale

    Nassim Ait Ali Braham et al. SpectralEarth: Training Hyperspectral Foundation Models at Scale. Aug. 2024. doi: 10.48550/arXiv.2408.08447. eprint: 2408.08447 (cs)

  3. [3]

    SatMAE: Pre-Training Transformers for Temporal and Multi-Spectral Satellite Imagery

    Yezhen Cong et al. “SatMAE: Pre-Training Transformers for Temporal and Multi-Spectral Satellite Imagery”. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS ’22. Red Hook, NY, USA: Curran Associates Inc., Nov. 2022, pp. 197–211

  4. [4]

    MiniRocket: A Very Fast (Almost) Deterministic Transform for Time Series Classification

    Angus Dempster, Daniel F. Schmidt, and Geoffrey I. Webb. “MiniRocket: A Very Fast (Almost) Deterministic Transform for Time Series Classification”. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. KDD ’21. New York, NY, USA: Association for Computing Machinery, Aug. 2021, pp. 248–257. doi: 10.1145/3447548.3467231

  5. [5]

    ImageNet: A large-scale hierarchical image database

    Jia Deng et al. “ImageNet: A large-scale hierarchical image database”. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, pp. 248–255. doi: 10.1109/CVPR.2009.5206848

  6. [6]

    SpectralGPT: Spectral Remote Sensing Foundation Model

    Danfeng Hong et al. “SpectralGPT: Spectral Remote Sensing Foundation Model”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 46.8 (Aug. 2024), pp. 5227–5244. doi: 10.1109/TPAMI.2024.3362475. eprint: 2311.07113 (cs)

  8. [8]

    Hyperspectral Adapter for Semantic Segmentation With Vision Foundation Models

    Juana Valeria Hurtado, Rohit Mohan, and Abhinav Valada. Hyperspectral Adapter for Semantic Segmentation With Vision Foundation Models. 2026. doi: 10.1109/LRA.2026.3656795

  9. [9]

    Semantic Segmentation in Satellite Hyperspectral Imagery by Deep Learning

    Jon Alvarez Justo et al. “Semantic Segmentation in Satellite Hyperspectral Imagery by Deep Learning”. In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 18 (2025), pp. 273–293. issn: 2151-1535. doi: 10.1109/JSTARS.2024.3487360

  10. [10]

    HyperSL: A Spectral Foundation Model for Hyperspectral Image Interpretation

    Weili Kong et al. “HyperSL: A Spectral Foundation Model for Hyperspectral Image Interpretation”. In: IEEE Transactions on Geoscience and Remote Sensing 63 (2025), pp. 1–19. issn: 1558-0644. doi: 10.1109/TGRS.2025.3566205

  11. [11]

    A General Purpose Spectral Foundational Model for Both Proximal and Remote Sensing Spectral Imaging

    William Michael Laprade et al. A General Purpose Spectral Foundational Model for Both Proximal and Remote Sensing Spectral Imaging. Mar. 2025. doi: 10.48550/arXiv.2503.01628. eprint: 2503.01628 (cs)

  12. [12]

    HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

    Jingtao Li et al. “HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2025, pp. 23048–23058

  13. [13]

    HyperspectralCityV2.0

    Yu Li et al. HyperspectralCityV2.0. 2021. url: https://pbdl-ws.github.io/pbdl2021/challenge/download.html (visited on 07/13/2022)

  14. [14]

    Parameter-Efficient Fine-Tuning of Multispectral Foundation Models for Hyperspectral Image Classification

    Bernardin Ligan et al. Parameter-Efficient Fine-Tuning of Multispectral Foundation Models for Hyperspectral Image Classification. May 2025. doi: 10.48550/arXiv.2505.15334. eprint: 2505.15334 (cs)

  15. [15]

    DINOv2: Learning Robust Visual Features without Supervision

    Maxime Oquab et al. “DINOv2: Learning Robust Visual Features without Supervision”. In: Trans. Mach. Learn. Res. 2024 (2024)

  16. [16]

    U-Net: Convolutional Networks for Biomedical Image Segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. “U-Net: Convolutional Networks for Biomedical Image Segmentation”. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Ed. by Nassir Navab et al. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015, pp. 234–241. isbn: 978-3-319-24574-4. doi: 10.1...

  17. [17]

    HDC-MiniROCKET: Explicit Time Encoding in Time Series Classification with Hyperdimensional Computing

    Kenny Schlegel, Peer Neubert, and Peter Protzel. “HDC-MiniROCKET: Explicit Time Encoding in Time Series Classification with Hyperdimensional Computing”. In: 2022 International Joint Conference on Neural Networks (IJCNN). July 2022, pp. 1–8. doi: 10.1109/IJCNN55064.2022.9892158

  18. [18]

    Data-Efficient Spectral Classification of Hyperspectral Data Using MiniROCKET and HDC-MiniROCKET

    Nick Theisen et al. “Data-Efficient Spectral Classification of Hyperspectral Data Using MiniROCKET and HDC-MiniROCKET”. In: 2025 IEEE 21st International Conference on Automation Science and Engineering (CASE). Aug. 2025, pp. 1865–1871. doi: 10.1109/CASE58245.2025.11163869

  19. [19]

    HS3-Bench: A Benchmark and Strong Baseline for Hyperspectral Semantic Segmentation in Driving Scenarios

    Nick Theisen et al. “HS3-Bench: A Benchmark and Strong Baseline for Hyperspectral Semantic Segmentation in Driving Scenarios”. In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Oct. 2024, pp. 5895–5901. doi: 10.1109/IROS58592.2024.10801768

  20. [20]

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Di Wang et al. “HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 47.8 (Aug. 2025), pp. 6427–6444. issn: 1939-3539. doi: 10.1109/TPAMI.2025.3557581

  21. [21]

    Deep Dimension Reduction for Spatial-Spectral Road Scene Classification

    Christian Winkens et al. “Deep Dimension Reduction for Spatial-Spectral Road Scene Classification”. In: Electronic Imaging 31 (Jan. 2019), pp. 1–9. issn: 2470-1173. doi: 10.2352/ISSN.2470-1173.2019.15.AVM-049

  22. [22]

    HyKo: A Spectral Dataset for Scene Understanding

    Christian Winkens et al. “HyKo: A Spectral Dataset for Scene Understanding”. In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). Institute of Electrical and Electronics Engineers, 2017, pp. 254–261