Cross-Domain Transfer of Hyperspectral Foundation Models
Pith reviewed 2026-05-07 11:13 UTC · model grok-4.3
The pith
Reusing remote-sensing hyperspectral models for proximal sensing boosts segmentation accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Cross-domain transfer of hyperspectral foundation models, pretrained in remote sensing, to proximal sensing applications yields large performance improvements in semantic segmentation over in-domain in-modality training, reduces the performance gap to cross-modality approaches, and sustains strong results even in limited-data regimes.
What carries the argument
Cross-domain transfer: applying HSI foundation models trained on remote-sensing data directly to proximal-sensing tasks, without any modality adaptation.
Load-bearing premise
Performance differences on the HS3-Bench benchmark result from the choice of transfer strategy and not from differences in training details, data handling, or model sizes, and the benchmark reflects performance in actual proximal sensing uses.
What would settle it
Running all compared methods with exactly the same training hyperparameters, data splits, and model architectures on HS3-Bench and finding no advantage for cross-domain transfer, or testing the models in a real proximal sensing deployment where accuracy drops significantly.
Original abstract
Hyperspectral imaging (HSI) semantic segmentation typically relies on in-domain training, but limited data availability often restricts model performance in real-world applications. Current approaches to leverage foundation models in proximal sensing use cross-modality techniques, bridging RGB and HSI to exploit vision foundation models. However, these methods either discard spectral information or introduce architectural complexity. We propose cross-domain transfer as an alternative, reusing HSI foundation models - originally trained in remote sensing - for proximal sensing applications. By eliminating the need to bridge modality gaps, our approach preserves spectral information while maintaining a simple architecture. Using the HS3-Bench benchmark, we systematically evaluate and compare conventional in-domain, in-modality training, cross-modality transfer and cross-domain transfer strategies. Our results demonstrate that cross-domain transfer achieves large performance improvements over in-domain, in-modality training, reduces the performance gap to cross-modality approaches and maintains strong performance in limited data settings. Thus, this work advances more effective HSI semantic segmentation in diverse applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes cross-domain transfer of hyperspectral foundation models pretrained on remote-sensing data for use in proximal-sensing semantic segmentation. It systematically compares this strategy to conventional in-domain in-modality training and to cross-modality transfer on the HS3-Bench benchmark, claiming large gains over in-domain baselines, a narrowed gap to cross-modality methods, and robust performance under limited-data conditions.
Significance. If the performance differences can be rigorously attributed to the transfer strategy itself, the work offers a simpler architectural path for leveraging existing HSI foundation models while preserving full spectral information, avoiding the information loss or added complexity of RGB-HSI bridging. This could meaningfully expand the practical utility of foundation models in data-scarce proximal-sensing applications.
major comments (2)
- [Abstract and §4] Abstract and §4 (Results): the central claim of 'large performance improvements' and 'reduces the performance gap' is stated without any numerical values, error bars, statistical tests, or baseline specifications. The manuscript must supply concrete metrics (e.g., mIoU deltas, standard deviations) and explicit descriptions of the in-domain and cross-modality baselines to allow readers to evaluate the magnitude and reliability of the reported gains.
- [§3 and §4] §3 (Experimental Setup) and §4: the attribution of observed gains to cross-domain transfer (rather than unmatched training protocols) is load-bearing for the main conclusion. The paper must demonstrate that in-domain baselines were trained with identical epoch counts, optimizers, learning-rate schedules, data augmentations, and effective model capacity as the transferred models; otherwise the HS3-Bench comparisons cannot isolate the effect of the proposed transfer method.
minor comments (2)
- Clarify the precise definition of 'limited data settings' (e.g., number of labeled samples per class) and ensure all tables report both mean and variance across runs.
- Add a short related-work paragraph contrasting the proposed cross-domain approach with existing domain-adaptation techniques for hyperspectral data.
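For reference, mean Intersection-over-Union (mIoU), the metric the comments ask to be reported explicitly, is the per-class IoU averaged over classes. A minimal sketch on toy label maps (the 3-class data is illustrative only, not from the paper):

```python
# Minimal sketch of mean Intersection-over-Union (mIoU) computed from flat
# predicted and ground-truth label maps; the toy 3-class data is illustrative.
def miou(pred, truth, num_classes):
    """Mean over classes of |pred ∩ truth| / |pred ∪ truth|, skipping absent classes."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # ignore classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred  = [0, 0, 1, 1, 2, 2, 2, 1]
truth = [0, 0, 1, 1, 2, 2, 1, 1]
print(f"mIoU = {miou(pred, truth, 3):.3f}")
```

Reporting per-class IoU alongside the mean would also make the 'limited data settings' comparison easier to interpret, since rare classes dominate mIoU variance.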
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight opportunities to strengthen the clarity and rigor of our claims. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract and §4] Abstract and §4 (Results): the central claim of 'large performance improvements' and 'reduces the performance gap' is stated without any numerical values, error bars, statistical tests, or baseline specifications. The manuscript must supply concrete metrics (e.g., mIoU deltas, standard deviations) and explicit descriptions of the in-domain and cross-modality baselines to allow readers to evaluate the magnitude and reliability of the reported gains.
Authors: We agree that the abstract and results would be strengthened by explicit numerical support. In the revised manuscript we will insert concrete mIoU values, deltas relative to baselines, standard deviations from repeated runs, and statistical significance tests (e.g., paired t-tests) into both the abstract and §4. We will also add explicit descriptions of the in-domain and cross-modality baselines, including their architectures and key hyperparameters, so readers can directly assess the magnitude and reliability of the gains. revision: yes
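The statistics promised in this response can be sketched concretely. The per-run mIoU values, run count, and the choice of a paired t-test below are illustrative assumptions, not numbers from the manuscript:

```python
# Hypothetical sketch: mean ± std over repeated runs plus a paired t-statistic,
# assuming five runs per method. All mIoU values are illustrative placeholders.
import math
import statistics

def paired_t(a, b):
    """Paired t-statistic for two equal-length lists of per-run scores."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample std dev of the per-run differences
    return mean_d / (sd_d / math.sqrt(len(diffs)))

# Illustrative per-run mIoU values (not taken from the manuscript)
cross_domain = [0.612, 0.620, 0.609, 0.617, 0.615]
in_domain    = [0.555, 0.561, 0.549, 0.558, 0.552]

print(f"cross-domain: {statistics.mean(cross_domain):.3f} ± {statistics.stdev(cross_domain):.3f}")
print(f"in-domain:    {statistics.mean(in_domain):.3f} ± {statistics.stdev(in_domain):.3f}")
print(f"paired t = {paired_t(cross_domain, in_domain):.2f} (df = {len(cross_domain) - 1})")
```

A paired test is appropriate here because the compared methods are evaluated on the same benchmark splits, so per-run scores are naturally matched.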
- Referee: [§3 and §4] §3 (Experimental Setup) and §4: the attribution of observed gains to cross-domain transfer (rather than unmatched training protocols) is load-bearing for the main conclusion. The paper must demonstrate that in-domain baselines were trained with identical epoch counts, optimizers, learning-rate schedules, data augmentations, and effective model capacity as the transferred models; otherwise the HS3-Bench comparisons cannot isolate the effect of the proposed transfer method.
Authors: We confirm that the in-domain baselines were trained under identical protocols to the transferred models (same epoch count, optimizer, learning-rate schedule, data augmentations, and model capacity) precisely to isolate the contribution of cross-domain transfer. To address the concern, we will expand §3 with a dedicated paragraph and comparison table that explicitly lists the training configuration for every method. This will make the fairness of the HS3-Bench comparisons transparent and allow readers to attribute performance differences to the transfer strategy itself. revision: yes
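The fairness check promised here amounts to asserting that every compared strategy shares one training protocol. A minimal sketch, with field names and values as illustrative assumptions rather than the paper's actual configuration:

```python
# Hypothetical sketch of the protocol-fairness check: every compared training
# strategy must agree on every training detail. Fields/values are illustrative.
shared_protocol = {
    "epochs": 100,
    "optimizer": "AdamW",
    "lr_schedule": "cosine",
    "augmentations": ("hflip", "random_crop"),
    "param_count_millions": 90,
}

configs = {
    "in-domain, in-modality": dict(shared_protocol),
    "cross-modality transfer": dict(shared_protocol),
    "cross-domain transfer": dict(shared_protocol),
}

def protocols_match(configs):
    """True iff all methods agree on every training detail."""
    baseline = next(iter(configs.values()))
    return all(cfg == baseline for cfg in configs.values())

print("protocols match:", protocols_match(configs))
```

Publishing such a table (or the config files themselves) would let readers verify that performance differences are attributable to the transfer strategy alone.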
Circularity Check
No circularity; empirical benchmark comparisons contain no derivations or self-referential reductions.
full rationale
The paper is an empirical ML study that evaluates cross-domain transfer of hyperspectral foundation models against in-domain and cross-modality baselines on the HS3-Bench benchmark. No equations, derivations, predictions, or first-principles results are present in the abstract or described full text. Claims rest on observed performance differences rather than any fitted parameter renamed as a prediction, self-definitional construction, or load-bearing self-citation chain. The absence of a derivation chain means the central results cannot reduce to their own inputs by construction; the work is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: HSI foundation models trained in remote sensing contain features that transfer effectively to proximal sensing semantic segmentation tasks.
Reference graph
Works this paper leans on
- [1] K. Basterretxea et al. "HSI-Drive: A Dataset for the Research of Hyperspectral Image Processing Applied to Autonomous Driving Systems". In: 2021 IEEE Intelligent Vehicles Symposium (IV). 2021, pp. 866–873. doi: 10.1109/IV48863.2021.9575298
- [2] Nassim Ait Ali Braham et al. SpectralEarth: Training Hyperspectral Foundation Models at Scale. Aug. 2024. doi: 10.48550/arXiv.2408.08447. eprint: 2408.08447 (cs)
- [3] Yezhen Cong et al. "SatMAE: Pre-Training Transformers for Temporal and Multi-Spectral Satellite Imagery". In: Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS '22. Red Hook, NY, USA: Curran Associates Inc., Nov. 2022, pp. 197–211
- [4] Angus Dempster, Daniel F. Schmidt, and Geoffrey I. Webb. "MiniRocket: A Very Fast (Almost) Deterministic Transform for Time Series Classification". In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. KDD '21. New York, NY, USA: Association for Computing Machinery, Aug. 2021, pp. 248–257. doi: 10.1145/3447548.3467231
- [5] Jia Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, pp. 248–255. doi: 10.1109/CVPR.2009.5206848
- [6] Danfeng Hong et al. "SpectralGPT: Spectral Remote Sensing Foundation Model". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 46.8 (Aug. 2024), pp. 5227–5244. doi: 10.1109/TPAMI.2024.3362475. eprint: 2311.07113 (cs)
- [7] Juana Valeria Hurtado, Rohit Mohan, and Abhinav Valada. Hyperspectral Adapter for Semantic Segmentation With Vision Foundation Models. 2026. doi: 10.1109/LRA.2026.3656795
- [8] Jon Alvarez Justo et al. "Semantic Segmentation in Satellite Hyperspectral Imagery by Deep Learning". In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 18 (2025), pp. 273–293. doi: 10.1109/JSTARS.2024.3487360
- [9] Weili Kong et al. "HyperSL: A Spectral Foundation Model for Hyperspectral Image Interpretation". In: IEEE Transactions on Geoscience and Remote Sensing 63 (2025), pp. 1–19. doi: 10.1109/TGRS.2025.3566205
- [10] William Michael Laprade et al. A General Purpose Spectral Foundational Model for Both Proximal and Remote Sensing Spectral Imaging. Mar. 2025. doi: 10.48550/arXiv.2503.01628. eprint: 2503.01628 (cs)
- [11] Jingtao Li et al. "HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery". In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2025, pp. 23048–23058 (visited on 10/24/2025)
- [12] Yu Li et al. HyperspectralCityV2.0. 2021. url: https://pbdl-ws.github.io/pbdl2021/challenge/download.html (visited on 13.03.2024)
- [13] Bernardin Ligan et al. Parameter-Efficient Fine-Tuning of Multispectral Foundation Models for Hyperspectral Image Classification. May 2025. doi: 10.48550/arXiv.2505.15334. eprint: 2505.15334 (cs)
- [14] Maxime Oquab et al. "DINOv2: Learning Robust Visual Features without Supervision". In: Trans. Mach. Learn. Res. 2024 (2024)
- [15] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation". In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Ed. by Nassir Navab et al. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015, pp. 234–241. doi: 10.1...
- [16] Kenny Schlegel, Peer Neubert, and Peter Protzel. "HDC-MiniROCKET: Explicit Time Encoding in Time Series Classification with Hyperdimensional Computing". In: 2022 International Joint Conference on Neural Networks (IJCNN). July 2022, pp. 1–8. doi: 10.1109/IJCNN55064.2022.9892158
- [17] Nick Theisen et al. "Data-Efficient Spectral Classification of Hyperspectral Data Using MiniROCKET and HDC-MiniROCKET". In: 2025 IEEE 21st International Conference on Automation Science and Engineering (CASE). Aug. 2025, pp. 1865–1871. doi: 10.1109/CASE58245.2025.11163869
- [18] Nick Theisen et al. "HS3-Bench: A Benchmark and Strong Baseline for Hyperspectral Semantic Segmentation in Driving Scenarios". In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Oct. 2024, pp. 5895–5901. doi: 10.1109/IROS58592.2024.10801768
- [19] Di Wang et al. "HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 47.8 (Aug. 2025), pp. 6427–6444. doi: 10.1109/TPAMI.2025.3557581
- [20] Christian Winkens et al. "Deep Dimension Reduction for Spatial-Spectral Road Scene Classification". In: Electronic Imaging 31 (Jan. 2019), pp. 1–9. doi: 10.2352/ISSN.2470-1173.2019.15.AVM-049
- [21] Christian Winkens et al. "HyKo: A Spectral Dataset for Scene Understanding". In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). IEEE, 2017, pp. 254–261
discussion (0)