SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining
Pith reviewed 2026-05-21 05:21 UTC · model grok-4.3
The pith
SpectralEarth-FM uses a hierarchical transformer with spectral tokenization and cross-sensor fusion to jointly pretrain on hyperspectral imagery and other Earth observation sensors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SpectralEarth-FM is a hierarchical transformer for multisensor EO input with heterogeneous spectral dimensionality. The architecture combines spectral tokenization for hyperspectral inputs, sensor-specific encoders, a cross-sensor fusion module, and a shared hierarchical encoder, enabling joint processing of HSI and lower-channel observations. Pretraining on the curated SpectralEarth-MM dataset with a Joint-Embedding Predictive Architecture objective produces representations that achieve state-of-the-art results on hyperspectral downstream tasks and standard EO benchmarks under the PANGAEA protocol.
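The review names spectral tokenization but does not say how it is done. The sketch below assumes one common scheme: contiguous bands are split into fixed-size groups, and each (band group, spatial patch) block is flattened and linearly projected to one token. The group size, patch size, embedding width and the fixed random projection (standing in for a learned layer) are all illustrative assumptions, not the paper's settings.

```python
import numpy as np

def spectral_tokenize(cube, band_group=16, patch=8, dim=64, seed=0):
    """Hypothetical spectral tokenizer for a hyperspectral cube.

    cube: array of shape (bands, height, width).
    Returns an array of shape (n_groups, n_patches, dim): one token per
    (spectral band group, spatial patch) pair.
    """
    bands, h, w = cube.shape
    if h % patch or w % patch:
        raise ValueError("height and width must be multiples of the patch size")
    n_groups = bands // band_group
    if n_groups == 0:
        raise ValueError("fewer bands than one spectral group")
    cube = cube[: n_groups * band_group]  # drop trailing bands that do not fill a group
    # Split bands into groups and the image into patch x patch tiles.
    x = cube.reshape(n_groups, band_group, h // patch, patch, w // patch, patch)
    # Reorder to (group, tile_row, tile_col, band, row_in_tile, col_in_tile).
    x = x.transpose(0, 2, 4, 1, 3, 5)
    feat = band_group * patch * patch
    x = x.reshape(n_groups, (h // patch) * (w // patch), feat)
    # A fixed random projection stands in for the learned token embedding.
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((feat, dim)) / np.sqrt(feat)
    return x @ proj

# Illustrative 224-band, 64 x 64 pixel cube (sizes are assumptions).
tokens = spectral_tokenize(np.random.default_rng(1).random((224, 64, 64)))
print(tokens.shape)  # (14, 64, 64)
```

In this scheme the token width is fixed by `dim`, and only the number of tokens grows with the band count. That is one way to feed hundreds of HSI bands into a fixed-width backbone. Low-channel sensors would go through their own sensor-specific encoders instead.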
What carries the argument
The cross-sensor fusion module, which integrates the outputs of the sensor-specific encoders before they enter the shared hierarchical encoder. The same transformer also applies spectral tokenization to hyperspectral inputs.
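The review does not describe the fusion module's internals. One plausible design, sketched here purely as an assumption, is per-location attention over sensors. A query vector attends over the tokens that each available sensor's encoder produced at the same spatial position, and sensors with no data at that position are masked out. The single head, the query vector and the random key/value weights are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def fuse_sensors(sensor_tokens, available, query, seed=0):
    """Hypothetical cross-sensor fusion by single-head attention over sensors.

    sensor_tokens: (n_sensors, n_locations, dim) outputs of the sensor-specific
                   encoders, already projected to a common width `dim`.
    available:     (n_sensors, n_locations) bool mask, False where a sensor has no data.
    query:         (dim,) fusion query (a learned parameter in a real model).
    Returns (n_locations, dim) fused tokens for the shared encoder.
    """
    _, _, dim = sensor_tokens.shape
    if not available.any(axis=0).all():
        raise ValueError("every location needs at least one available sensor")
    rng = np.random.default_rng(seed)
    w_k = rng.standard_normal((dim, dim)) / np.sqrt(dim)  # stand-in key projection
    w_v = rng.standard_normal((dim, dim)) / np.sqrt(dim)  # stand-in value projection
    keys = sensor_tokens @ w_k                            # (S, L, D)
    values = sensor_tokens @ w_v                          # (S, L, D)
    scores = keys @ query / np.sqrt(dim)                  # (S, L)
    scores = np.where(available, scores, -np.inf)         # mask missing sensors
    scores = scores - scores.max(axis=0, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)         # softmax over sensors
    return np.einsum("sl,sld->ld", weights, values)
```

Because of the mask, the same module works whether a site has every sensor stream or only one. With a single available sensor, the output reduces to that sensor's value projection.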
If this is right
- Hyperspectral imagery can now be included in the same pretraining pipeline as multispectral and SAR data without requiring separate models.
- Representations learned this way improve results on both hyperspectral-specific tasks and conventional EO benchmarks.
- A single model can accept inputs from sensors with widely varying channel counts and process them jointly after the fusion stage.
- The JEPA-style matching of global and single-sensor local views scales to heterogeneous sensor stacks.
Where Pith is reading between the lines
- The same fusion approach could be tested on temporal sequences to see whether it captures change signals across sensor types.
- If the alignment assumption holds, the method might extend to other high-dimensional remote-sensing domains such as atmospheric sounding.
- Downstream applications that combine optical and radar data could gain from the joint hyperspectral embeddings without retraining separate heads.
Load-bearing premise
The co-located patches from EnMAP, EMIT, DESIS, Sentinel-2, Landsat, LST and Sentinel-1 supply sufficiently aligned and representative training signal for the fusion module to learn useful joint representations instead of sensor-specific artifacts.
What would settle it
Performance on downstream hyperspectral tasks drops to the level of single-sensor baselines when the cross-sensor fusion module is removed or when training uses only non-overlapping sensor footprints.
Original abstract
Earth observation (EO) foundation models (FMs) are increasingly trained on multisensor data, spanning multispectral imagery (MSI), synthetic aperture radar (SAR), and derived geospatial layers, but hyperspectral imagery (HSI) remains underrepresented. Conversely, existing hyperspectral FMs are trained on HSI alone, leaving joint pretraining and fusion of HSI with co-located EO sensors unexplored. We introduce SpectralEarth-FM, a hierarchical transformer for multisensor EO input with heterogeneous spectral dimensionality. The architecture combines spectral tokenization for hyperspectral inputs, sensor-specific encoders, a cross-sensor fusion module, and a shared hierarchical encoder, enabling joint processing of HSI and lower-channel observations. To pretrain SpectralEarth-FM, we curate SpectralEarth-MM, a dataset that co-locates HSI from three spaceborne sensors (EnMAP, EMIT, DESIS) with Sentinel-2, Landsat-8/9 optical imagery, Landsat land surface temperature (LST), and Sentinel-1 SAR, over common geographic footprints. It comprises approximately 2M globally distributed locations, 25M georeferenced patches, and over 40TB of data. Pretraining uses a Joint-Embedding Predictive Architecture (JEPA)-style objective that matches representations between global views and single-sensor local views from the same location. We evaluate SpectralEarth-FM on hyperspectral downstream tasks and standard EO benchmarks following the PANGAEA protocol, achieving state-of-the-art results across both evaluation settings.
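The abstract says only that the objective matches representations of global views with those of single-sensor local views from the same location. The sketch below shows that kind of loss under assumptions borrowed from I-JEPA (Assran et al., 2023). A predictor maps context embeddings into the target embedding space, the target branch receives no gradient, and the target encoder follows the context encoder by an exponential moving average. The prediction direction (local to global), the normalized squared-error distance, the linear predictor and the momentum value are assumptions, not details confirmed by the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def jepa_loss(local_emb, global_emb, predictor_w):
    """Hypothetical JEPA-style matching loss.

    local_emb:   (batch, dim) context-encoder embeddings of single-sensor local views.
    global_emb:  (batch, dim) target-encoder embeddings of the multi-sensor global
                 view at the same locations; treated as constants (stop-gradient).
    predictor_w: (dim, dim) linear predictor (an MLP or transformer in practice).
    Returns the mean squared distance between normalized predictions and targets.
    """
    pred = l2_normalize(local_emb @ predictor_w)
    target = l2_normalize(global_emb)  # no gradient would flow into this branch
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))

def ema_update(target_params, context_params, momentum=0.996):
    """Target encoder tracks the context encoder by exponential moving average."""
    return {k: momentum * target_params[k] + (1.0 - momentum) * context_params[k]
            for k in target_params}
```

For unit vectors the squared distance equals 2 - 2·cos, so minimizing this loss maximizes cosine similarity between predicted and target embeddings. The stop-gradient plus EMA target is the usual guard against representational collapse in this family of methods.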
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SpectralEarth-FM, a hierarchical transformer architecture for multisensor Earth observation pretraining that incorporates hyperspectral imagery (HSI) from EnMAP, EMIT, and DESIS alongside Sentinel-2, Landsat, LST, and Sentinel-1 data. It curates the SpectralEarth-MM dataset of approximately 2M globally distributed locations (about 25M co-located patches) and pretrains using a JEPA-style objective that matches global multi-sensor views to single-sensor local views from the same location. The model is evaluated on hyperspectral downstream tasks and PANGAEA benchmarks, with claims of state-of-the-art results in both settings.
Significance. If the performance claims hold after addressing alignment concerns, this would be a meaningful advance in multimodal EO foundation models by integrating previously underrepresented HSI data into joint pretraining. The large-scale dataset curation and the sensor-specific encoder plus cross-sensor fusion design represent concrete contributions that could improve cross-modal representations for remote sensing applications.
major comments (2)
- [§3] §3 (Dataset Curation): The description of SpectralEarth-MM provides no quantitative alignment metrics (e.g., mean temporal offset between HSI and MSI/SAR acquisitions, spatial registration RMSE, or cloud-cover overlap statistics). Because the JEPA objective relies on the assumption that co-located patches supply aligned multi-sensor signals for the fusion module to learn joint rather than artifact-driven representations, the absence of these metrics leaves open the possibility that reported gains reflect dataset scale or sensor-specific biases instead of genuine multimodal fusion.
- [§5] §5 (Experiments): The manuscript claims state-of-the-art results on PANGAEA and hyperspectral tasks but does not report full baseline tables, ablation studies isolating the cross-sensor fusion module, number of random seeds, or error bars. Without these, it is impossible to verify that the gains are robust to baseline choices, data splits, or the specific alignment properties of the curated patches.
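The alignment statistics requested for §3 are straightforward to compute once per-pair acquisition metadata is available. The sketch below assumes hypothetical per-pair records that hold the HSI and MSI/SAR acquisition dates, a measured registration offset in metres, and the cloud fraction of each acquisition. The field names are illustrative. The 5% default echoes the cloud threshold in the paper's pairing step, and defining "cloud-cover overlap" as the fraction of pairs where both scenes are clear is an assumption.

```python
import math
from dataclasses import dataclass
from datetime import date

@dataclass
class PairRecord:
    """Hypothetical metadata for one co-located HSI / MSI-or-SAR pair."""
    hsi_date: date
    other_date: date
    dx_m: float         # east-west registration offset, metres
    dy_m: float         # north-south registration offset, metres
    hsi_cloud: float    # cloud fraction in [0, 1]
    other_cloud: float  # cloud fraction in [0, 1]; use 0.0 for SAR

def alignment_report(pairs, cloud_thresh=0.05):
    """Summarize temporal offset, registration error and joint cloud-free rate."""
    n = len(pairs)
    offsets = [abs((p.other_date - p.hsi_date).days) for p in pairs]
    # Radial RMSE: sqrt(mean(dx^2 + dy^2)) over all pairs.
    rmse = math.sqrt(sum(p.dx_m ** 2 + p.dy_m ** 2 for p in pairs) / n)
    both_clear = sum(p.hsi_cloud <= cloud_thresh and p.other_cloud <= cloud_thresh
                     for p in pairs)
    return {
        "mean_abs_offset_days": sum(offsets) / n,
        "max_abs_offset_days": max(offsets),
        "registration_rmse_m": rmse,
        "both_clear_fraction": both_clear / n,
    }
```

Reporting these numbers per sensor pair (for example, EnMAP/Sentinel-2 separately from EMIT/Sentinel-1) would show whether any one combination is poorly aligned.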
minor comments (2)
- [§2] The notation for the hierarchical encoder and fusion module could be clarified with an explicit diagram showing token flow between sensor-specific encoders and the shared backbone.
- A few figure captions (e.g., Figure 3) omit the exact number of patches or geographic distribution statistics shown in the plots.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address the major concerns point by point below, agreeing where revisions are needed to improve clarity and rigor.
Point-by-point responses
- Referee: [§3] §3 (Dataset Curation): The description of SpectralEarth-MM provides no quantitative alignment metrics (e.g., mean temporal offset between HSI and MSI/SAR acquisitions, spatial registration RMSE, or cloud-cover overlap statistics). Because the JEPA objective relies on the assumption that co-located patches supply aligned multi-sensor signals for the fusion module to learn joint rather than artifact-driven representations, the absence of these metrics leaves open the possibility that reported gains reflect dataset scale or sensor-specific biases instead of genuine multimodal fusion.
  Authors: We agree that quantitative alignment metrics are important to substantiate the quality of the SpectralEarth-MM dataset and the validity of the JEPA pretraining objective. Although the dataset was curated using georeferenced patches from overlapping sensor footprints with efforts to minimize temporal discrepancies, we did not include explicit statistics in the original submission. In the revised manuscript, we will add these metrics to Section 3, including average temporal offsets between acquisitions, spatial registration accuracy from the source metadata, and cloud cover overlap percentages. This will allow readers to better assess the alignment quality.
  Revision: yes
- Referee: [§5] §5 (Experiments): The manuscript claims state-of-the-art results on PANGAEA and hyperspectral tasks but does not report full baseline tables, ablation studies isolating the cross-sensor fusion module, number of random seeds, or error bars. Without these, it is impossible to verify that the gains are robust to baseline choices, data splits, or the specific alignment properties of the curated patches.
  Authors: We acknowledge that additional details on the experimental setup and results would enhance the verifiability of our claims. We will expand Section 5 to include complete baseline comparison tables, ablation studies specifically isolating the contribution of the cross-sensor fusion module, and report performance metrics averaged over multiple random seeds with standard error bars. These additions will demonstrate the robustness of the reported improvements.
  Revision: yes
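Reporting metrics averaged over random seeds with standard error bars, as promised for §5, reduces to the computation below. The per-seed scores are placeholders for illustration, not results from the paper. Using the sample standard deviation (n - 1 denominator) is the usual convention but is an assumption here.

```python
import math
import statistics

def mean_and_sem(scores):
    """Mean and standard error of the mean over per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds for a standard error")
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    return statistics.fmean(scores), sd / math.sqrt(len(scores))

# Placeholder per-seed mIoU values, for illustration only (not reported numbers).
mean, sem = mean_and_sem([50.0, 52.0, 54.0])
print(f"{mean:.1f} ± {sem:.2f}")  # prints "52.0 ± 1.15"
```

With only three seeds the standard error is itself noisy. When comparing against a baseline, running paired seeds (the same seeds and splits for both models) gives a more informative comparison than overlapping error bars.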
Circularity Check
No significant circularity detected; the derivation applies an established JEPA objective to a new multimodal dataset and architecture.
full rationale
The paper's central chain consists of curating SpectralEarth-MM (co-located HSI/MSI/SAR patches), defining a hierarchical transformer with sensor-specific encoders plus cross-sensor fusion, and applying a JEPA-style matching objective between global multi-sensor views and single-sensor local views. This objective is explicitly drawn from prior literature rather than derived within the paper, and the reported SOTA results on hyperspectral and PANGAEA benchmarks are presented as empirical outcomes of training on the new ~2M-location dataset. No equations, parameter fits, or self-citations are shown that reduce the architecture, objective, or performance claims to tautological inputs by construction. The derivation is therefore self-contained: its substance comes independently from the dataset curation and the architectural choices.
Appendix A: Dataset details (excerpt; the appendix text is truncated in this rendering)
SpectralEarth-MM curation pipeline (figure):
- HSI preprocessing (quality control and patch extraction): filter tiles by georeferencing accuracy, cloud cover, and noisy spectral bands; extract 3.84 × 3.84 km HSI patches and discard invalid/NaN patches. This yields an unbalanced HSI dataset: EnMAP 1.8M locations, EMIT 4.1M locations, DESIS 275K locations (2.6M, 12M, and 447K patches, respectively).
- Dataset balancing (spatial sampling): retrieve annual AlphaEarth embeddings for each HSI location; cluster the embeddings to select geographically diverse sites, retaining all timestamps.
- Multimodal pairing (temporal alignment and pairing): match HSI with the nearest MSI/SAR acquisitions (Sentinel-2, Landsat 8/9, Sentinel-1; ≤5% clouds); select up to 4 dates per year with seasonal coverage; deduplicate observations acquired within ≤10 days of each other.
- Final SpectralEarth-MM dataset: ~2M sites, ~25M files; EnMAP 1.4M, EMIT 1.4M, DESIS 275K locations. ...
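The temporal alignment and pairing step can be sketched as follows, assuming each candidate MSI/SAR scene carries an acquisition date and a cloud fraction. The ≤5% cloud filter, the cap of 4 dates per year and the 10-day deduplication window come from the pipeline description. Ranking candidates by closeness to the HSI date, and approximating "seasonal coverage" by keeping at most one scene per calendar quarter, are assumptions of this sketch. The function name is hypothetical.

```python
from datetime import date

def pair_scenes(hsi_date, candidates, max_cloud=0.05, dedup_days=10):
    """Hypothetical pairing of one HSI acquisition with MSI/SAR scenes.

    candidates: list of (acquisition_date, cloud_fraction) for one sensor at one site
                (SAR scenes can be given cloud_fraction 0.0).
    Returns the selected acquisition dates, sorted chronologically.
    """
    # 1. Cloud filter.
    clear = [d for d, cloud in candidates if cloud <= max_cloud]
    # 2. Consider scenes closest in time to the HSI acquisition first.
    clear.sort(key=lambda d: abs((d - hsi_date).days))
    selected, used_quarters = [], set()
    # 3. Greedy pass: one scene per calendar quarter (a stand-in for seasonal
    #    coverage, which also caps selection at 4 dates per year), skipping
    #    near-duplicates acquired within dedup_days of an already chosen scene.
    for d in clear:
        quarter = (d.year, (d.month - 1) // 3)
        if quarter in used_quarters:
            continue
        if any(abs((d - s).days) <= dedup_days for s in selected):
            continue
        selected.append(d)
        used_quarters.add(quarter)
    return sorted(selected)
```

A greedy pass keeps the step linear in the number of candidates after sorting. It does not guarantee the globally best seasonal spread, which the paper's actual selection rule may handle differently.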