SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining

Aaron Banze; Conrad M. Albrecht; Jocelyn Chanussot; Julien Mairal; Nassim Ait Ali Braham; Xiao Xiang Zhu

arXiv: 2605.21075 · v1 · pith:5H4JQQ7C · submitted 2026-05-20 · cs.CV · cs.LG


Pith reviewed 2026-05-21 05:21 UTC · model grok-4.3


keywords: hyperspectral imagery · foundation models · multimodal pretraining · Earth observation · transformer architecture · sensor fusion · JEPA objective


The pith

SpectralEarth-FM uses a hierarchical transformer with spectral tokenization and cross-sensor fusion to jointly pretrain on hyperspectral imagery and other Earth observation sensors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SpectralEarth-FM as a way to bring hyperspectral imagery into the training of Earth observation foundation models, which have so far relied mostly on multispectral, radar, and derived layers. It does this by building a model that handles inputs with very different numbers of spectral channels through dedicated tokenization, sensor-specific encoders, and a fusion step before a shared encoder. A new dataset called SpectralEarth-MM supplies the training data by aligning hyperspectral observations from three satellite sensors (EnMAP, EMIT, DESIS) with co-located Sentinel-2, Landsat, land surface temperature, and Sentinel-1 SAR patches at roughly two million global locations. Pretraining follows a JEPA-style objective that trains the model to match representations of the same location seen through different sensors and at different scales. The authors report state-of-the-art results on both dedicated hyperspectral tasks and standard Earth observation benchmarks.

Core claim

SpectralEarth-FM is a hierarchical transformer for multisensor EO input with heterogeneous spectral dimensionality. The architecture combines spectral tokenization for hyperspectral inputs, sensor-specific encoders, a cross-sensor fusion module, and a shared hierarchical encoder, enabling joint processing of HSI and lower-channel observations. Pretraining on the curated SpectralEarth-MM dataset with a Joint-Embedding Predictive Architecture objective produces representations that achieve state-of-the-art results on hyperspectral downstream tasks and standard EO benchmarks under the PANGAEA protocol.

What carries the argument

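A cross-sensor fusion module integrates the outputs of the sensor-specific encoders before they reach the shared hierarchical encoder. Hyperspectral inputs are first converted to tokens by spectral tokenization, so sensors with very different channel counts arrive at the fusion step in a comparable form.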
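As a rough illustration of how inputs with very different channel counts can be brought to a common shape, the sketch below splits a pixel's spectrum into fixed-size band groups (a stand-in for spectral tokenization), pools them into one embedding per sensor, and averages the embeddings of whichever sensors are present (a stand-in for cross-sensor fusion). The function names, the group size, and the use of mean pooling and averaging are all assumptions made for illustration. The paper's tokenizer and fusion module are learned components whose details are not given on this page.

```python
from statistics import fmean


def spectral_tokens(spectrum, group_size=4):
    """Split one pixel's spectrum into band-group tokens of equal length.

    The last group is zero-padded, so any channel count is accepted.
    """
    padded = list(spectrum)
    remainder = len(padded) % group_size
    if remainder:
        padded += [0.0] * (group_size - remainder)
    return [padded[i:i + group_size] for i in range(0, len(padded), group_size)]


def pool_tokens(tokens):
    """Mean-pool a list of tokens into a single fixed-length embedding."""
    return [fmean(column) for column in zip(*tokens)]


def fuse(embeddings_by_sensor):
    """Average the embeddings of whichever sensors are available (not None)."""
    available = [e for e in embeddings_by_sensor.values() if e is not None]
    if not available:
        raise ValueError("at least one sensor embedding is required")
    return [fmean(column) for column in zip(*available)]


# Hypothetical inputs: a 10-band "hyperspectral" pixel and a 4-band pixel.
hsi_pixel = [0.1 * b for b in range(10)]
msi_pixel = [0.2, 0.4, 0.6, 0.8]

hsi_embedding = pool_tokens(spectral_tokens(hsi_pixel))
msi_embedding = pool_tokens(spectral_tokens(msi_pixel))

# The SAR entry is missing here; fusion simply skips it.
fused = fuse({"hsi": hsi_embedding, "msi": msi_embedding, "sar": None})
```

Both sensors end up with an embedding of length `group_size`, which is what lets a single downstream encoder consume them regardless of how many bands each sensor recorded.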

If this is right

Hyperspectral imagery can now be included in the same pretraining pipeline as multispectral and SAR data without requiring separate models.
Representations learned this way improve results on both hyperspectral-specific tasks and conventional EO benchmarks.
A single model can accept inputs from sensors with widely varying channel counts, because all of them share one backbone after the fusion stage.
The JEPA-style matching of global and single-sensor local views scales to heterogeneous sensor stacks.
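The matching objective in the last point can be pictured as a regression in representation space: a teacher embeds a global multi-sensor view, and the student's embeddings of other views of the same location are pulled toward that target. The sketch below uses a plain mean squared error and treats the teacher output as a constant (nothing here is differentiated, so the stop-gradient is implicit). The function name and the choice of squared error are assumptions; the paper's exact loss and its regularizer are not reproduced here.

```python
def jepa_style_loss(teacher_target, student_predictions):
    """Mean squared error between each student prediction and a fixed teacher target.

    teacher_target: embedding of the global view, treated as a constant.
    student_predictions: one predicted embedding per student view.
    """
    if not student_predictions:
        raise ValueError("need at least one student view")
    total = 0.0
    for prediction in student_predictions:
        if len(prediction) != len(teacher_target):
            raise ValueError("embedding dimensions must match")
        squared_error = sum((p - t) ** 2 for p, t in zip(prediction, teacher_target))
        total += squared_error / len(teacher_target)
    return total / len(student_predictions)


# Hypothetical 3-dimensional embeddings for one location.
target = [1.0, 0.0, -1.0]
views = [
    [1.0, 0.0, -1.0],  # a view whose prediction already matches the target
    [0.0, 0.0, 0.0],   # e.g. a single-sensor local view that does not
]
loss = jepa_style_loss(target, views)
```

The loss is zero only when every view of a location predicts the same target, which is the sense in which the objective asks different sensors to agree.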

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fusion approach could be tested on temporal sequences to see whether it captures change signals across sensor types.
If the alignment assumption holds, the method might extend to other high-dimensional remote-sensing domains such as atmospheric sounding.
Downstream applications that combine optical and radar data could gain from the joint hyperspectral embeddings without retraining separate heads.

Load-bearing premise

The co-located patches from EnMAP, EMIT, DESIS, Sentinel-2, Landsat, LST and Sentinel-1 supply sufficiently aligned and representative training signal for the fusion module to learn useful joint representations instead of sensor-specific artifacts.
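One simple way to probe this premise is to measure how much of each hyperspectral anchor footprint is actually covered by the patch paired with it. The sketch below computes that fraction for axis-aligned bounding boxes. The box format, the function name, and the idea of using bounding boxes at all are assumptions for illustration and do not describe how SpectralEarth-MM was built.

```python
def overlap_fraction(anchor, other):
    """Fraction of the anchor box area that is covered by the other box.

    Boxes are (min_x, min_y, max_x, max_y) in one shared projected coordinate system.
    """
    ax0, ay0, ax1, ay1 = anchor
    bx0, by0, bx1, by1 = other
    anchor_area = (ax1 - ax0) * (ay1 - ay0)
    if anchor_area <= 0:
        raise ValueError("anchor box must have positive area")
    width = min(ax1, bx1) - max(ax0, bx0)
    height = min(ay1, by1) - max(ay0, by0)
    if width <= 0 or height <= 0:
        return 0.0  # the boxes do not intersect
    return (width * height) / anchor_area


# Hypothetical footprints in metres.
hsi_anchor = (0.0, 0.0, 10.0, 10.0)
half_covering_patch = (5.0, 0.0, 15.0, 10.0)
disjoint_patch = (20.0, 20.0, 30.0, 30.0)

half = overlap_fraction(hsi_anchor, half_covering_patch)
none = overlap_fraction(hsi_anchor, disjoint_patch)
```

Pairs with a low fraction would be candidates for exclusion or for a separate ablation, since the model cannot learn a joint signal from pixels that do not show the same ground.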

What would settle it

Performance on downstream hyperspectral tasks drops to the level of single-sensor baselines when the cross-sensor fusion module is removed or when training uses only non-overlapping sensor footprints.

Figures

Figures reproduced from arXiv: 2605.21075 by Aaron Banze, Conrad M. Albrecht, Jocelyn Chanussot, Julien Mairal, Nassim Ait Ali Braham, Xiao Xiang Zhu.

**Figure 2.** Spatial coverage of SpectralEarth-MM. Global distribution of HSI anchor patches in SpectralEarth-MM. Colors indicate the HSI sensor associated with each acquisition.

**Figure 3.** SpectralEarth-FM architecture. Each available input is mapped to a common spatial token grid. HSI inputs use spectral tokenization before spatial encoding, while lower-dimensional sensors use linear projections. Local hierarchical branches process sensor-specific features before cross-modal fusion. The fused tokens are passed to a shared hierarchical backbone.

**Figure 4.** SpectralEarth-FM pretraining. Global teacher views define a stop-gradient latent target. The student processes global, local, and sensor-dropped views from the same geographic location and predicts the teacher target. SIGReg is applied to the stacked student projections. Global views are spatial crops containing four randomly sampled modalities, with consistent spatial transforms across modalities. Local v…

**Figure 5.** Spectral coverage of the optical sensors and Landsat thermal bands (long wavelength to the …

**Figure 6.** Construction pipeline for SpectralEarth-MM. HSI acquisitions are used as anchors, …

**Figure 7.** Examples of co-located observations in SpectralEarth-MM. Each row corresponds to a …

Abstract

Earth observation (EO) foundation models (FMs) are increasingly trained on multisensor data, spanning multispectral imagery (MSI), synthetic aperture radar (SAR), and derived geospatial layers, but hyperspectral imagery (HSI) remains underrepresented. Conversely, existing hyperspectral FMs are trained on HSI alone, leaving joint pretraining and fusion of HSI with co-located EO sensors unexplored. We introduce SpectralEarth-FM, a hierarchical transformer for multisensor EO input with heterogeneous spectral dimensionality. The architecture combines spectral tokenization for hyperspectral inputs, sensor-specific encoders, a cross-sensor fusion module, and a shared hierarchical encoder, enabling joint processing of HSI and lower-channel observations. To pretrain SpectralEarth-FM, we curate SpectralEarth-MM, a dataset that co-locates HSI from three spaceborne sensors (EnMAP, EMIT, DESIS) with Sentinel-2, Landsat-8/9 optical imagery, Landsat land surface temperature (LST), and Sentinel-1 SAR, over common geographic footprints. It comprises approximately 2M globally distributed locations, 25M georeferenced patches, and over 40TB of data. Pretraining uses a Joint-Embedding Predictive Architecture (JEPA)-style objective that matches representations between global views and single-sensor local views from the same location. We evaluate SpectralEarth-FM on hyperspectral downstream tasks and standard EO benchmarks following the PANGAEA protocol, achieving state-of-the-art results across both evaluation settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper adds hyperspectral data to large-scale multimodal EO pretraining through a new dataset and hierarchical fusion architecture, but the gains may partly reflect scale rather than verified cross-sensor alignment.


The main point is that SpectralEarth-FM and its accompanying SpectralEarth-MM dataset finally bring hyperspectral imagery into joint pretraining with multispectral, SAR, and temperature data at a scale that matters. They collected roughly 2 million co-located locations and 25 million patches from EnMAP, EMIT, DESIS, Sentinel-2, Landsat, and Sentinel-1, then built a transformer that uses spectral tokenization for the high-dimensional HSI inputs, sensor-specific encoders, a cross-sensor fusion module, and a shared hierarchical encoder. Pretraining follows a JEPA-style objective that matches global multi-sensor views to single-sensor local views from the same spot. That combination of new data resource and architecture is the concrete advance over prior EO foundation models that either ignored HSI or treated it in isolation.

The reported state-of-the-art numbers on PANGAEA benchmarks and hyperspectral downstream tasks suggest the setup produces usable representations for remote-sensing tasks. The dataset curation itself is a solid piece of work; assembling and georeferencing that volume of multi-sensor patches is not trivial and gives the community something usable to build on.

The soft spot is the alignment question. Different sensors have mismatched native resolutions, overpass times, cloud conditions, and atmospheric correction pipelines, so the fusion module could be learning sensor-specific artifacts instead of genuine joint signals. The paper would be stronger with explicit temporal-matching ablations or quantitative alignment metrics on subsets of the data. Without those, it is hard to separate the contribution of the fusion step from the simple effect of training on more data.

This work is aimed at the remote-sensing and geospatial foundation-model community. Readers who need a large multimodal EO dataset or ideas for handling heterogeneous spectral inputs will find practical value. It deserves a serious referee because the scale and the gap it addresses are real, even if some validation details need tightening. I would send it to peer review.
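A quantitative alignment metric of the kind the letter asks for could be as simple as the mean absolute time gap between each hyperspectral acquisition and the acquisition paired with it. The sketch below computes that gap in days. The function name and the timestamps are hypothetical and are not taken from the dataset.

```python
from datetime import datetime
from statistics import fmean


def mean_temporal_offset_days(pairs):
    """Mean absolute time gap, in days, between paired acquisitions.

    pairs: iterable of (hsi_time, other_sensor_time) datetime tuples.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one acquisition pair")
    gaps = [abs((other - hsi).total_seconds()) / 86400.0 for hsi, other in pairs]
    return fmean(gaps)


# Hypothetical acquisition times for two co-located pairs.
pairs = [
    (datetime(2024, 6, 1, 10, 0), datetime(2024, 6, 3, 10, 0)),   # 2 days apart
    (datetime(2024, 7, 10, 9, 0), datetime(2024, 7, 9, 21, 0)),   # 0.5 days apart
]
offset = mean_temporal_offset_days(pairs)
```

Reporting this number per sensor pair, alongside its distribution, would let a reader judge how often the paired views can be expected to show the same surface state.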

Referee Report

2 major / 2 minor

Summary. The paper introduces SpectralEarth-FM, a hierarchical transformer architecture for multisensor Earth observation pretraining that incorporates hyperspectral imagery (HSI) from EnMAP, EMIT, and DESIS alongside Sentinel-2, Landsat, LST, and Sentinel-1 data. It curates the SpectralEarth-MM dataset of approximately 2M globally distributed co-located patches and pretrains using a JEPA-style objective that matches global multi-sensor views to single-sensor local views from the same location. The model is evaluated on hyperspectral downstream tasks and PANGAEA benchmarks, with claims of state-of-the-art results in both settings.

Significance. If the performance claims hold after addressing alignment concerns, this would be a meaningful advance in multimodal EO foundation models by integrating previously underrepresented HSI data into joint pretraining. The large-scale dataset curation and the sensor-specific encoder plus cross-sensor fusion design represent concrete contributions that could improve cross-modal representations for remote sensing applications.

major comments (2)

[§3] §3 (Dataset Curation): The description of SpectralEarth-MM provides no quantitative alignment metrics (e.g., mean temporal offset between HSI and MSI/SAR acquisitions, spatial registration RMSE, or cloud-cover overlap statistics). Because the JEPA objective relies on the assumption that co-located patches supply aligned multi-sensor signals for the fusion module to learn joint rather than artifact-driven representations, the absence of these metrics leaves open the possibility that reported gains reflect dataset scale or sensor-specific biases instead of genuine multimodal fusion.
[§5] §5 (Experiments): The manuscript claims state-of-the-art results on PANGAEA and hyperspectral tasks but does not report full baseline tables, ablation studies isolating the cross-sensor fusion module, number of random seeds, or error bars. Without these, it is impossible to verify that the gains are robust to baseline choices, data splits, or the specific alignment properties of the curated patches.
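For the seed and error-bar request, the usual summary is the mean score across seeds together with its standard error. The sketch below shows that calculation with the standard library. The scores are made-up placeholders, not results from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_standard_error(scores):
    """Return (mean, standard error of the mean) for scores from repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    # stdev() is the sample standard deviation (divides by n - 1).
    return mean(scores), stdev(scores) / sqrt(len(scores))


# Hypothetical benchmark scores from three random seeds.
seed_scores = [70.0, 72.0, 74.0]
average, standard_error = mean_and_standard_error(seed_scores)
```

A gap between two models is easier to trust when it is large relative to the standard errors of both, which is why the number of seeds matters as much as the headline score.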

minor comments (2)

[§2] The notation for the hierarchical encoder and fusion module could be clarified with an explicit diagram showing token flow between sensor-specific encoders and the shared backbone.
A few figure captions (e.g., Figure 2) omit the exact number of patches or geographic distribution statistics shown in the plots.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address the major concerns point by point below, agreeing where revisions are needed to improve clarity and rigor.


Referee: [§3] §3 (Dataset Curation): The description of SpectralEarth-MM provides no quantitative alignment metrics (e.g., mean temporal offset between HSI and MSI/SAR acquisitions, spatial registration RMSE, or cloud-cover overlap statistics). Because the JEPA objective relies on the assumption that co-located patches supply aligned multi-sensor signals for the fusion module to learn joint rather than artifact-driven representations, the absence of these metrics leaves open the possibility that reported gains reflect dataset scale or sensor-specific biases instead of genuine multimodal fusion.

Authors: We agree that quantitative alignment metrics are important to substantiate the quality of the SpectralEarth-MM dataset and the validity of the JEPA pretraining objective. Although the dataset was curated using georeferenced patches from overlapping sensor footprints with efforts to minimize temporal discrepancies, we did not include explicit statistics in the original submission. In the revised manuscript, we will add these metrics to Section 3, including average temporal offsets between acquisitions, spatial registration accuracy from the source metadata, and cloud cover overlap percentages. This will allow readers to better assess the alignment quality.

revision: yes

Referee: [§5] §5 (Experiments): The manuscript claims state-of-the-art results on PANGAEA and hyperspectral tasks but does not report full baseline tables, ablation studies isolating the cross-sensor fusion module, number of random seeds, or error bars. Without these, it is impossible to verify that the gains are robust to baseline choices, data splits, or the specific alignment properties of the curated patches.

Authors: We acknowledge that additional details on the experimental setup and results would enhance the verifiability of our claims. We will expand Section 5 to include complete baseline comparison tables, ablation studies specifically isolating the contribution of the cross-sensor fusion module, and report performance metrics averaged over multiple random seeds with standard error bars. These additions will demonstrate the robustness of the reported improvements.

revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; derivation applies established JEPA objective to new multimodal dataset and architecture.


The paper's central chain consists of curating SpectralEarth-MM (co-located HSI/MSI/SAR patches), defining a hierarchical transformer with sensor-specific encoders plus cross-sensor fusion, and applying a JEPA-style matching objective between global multi-sensor views and single-sensor local views. This objective is explicitly drawn from prior literature rather than derived within the paper, and the reported SOTA results on hyperspectral and PANGAEA benchmarks are presented as empirical outcomes of training on the new ~2M-location dataset. No equations, parameter fits, or self-citations are shown that reduce the architecture, objective, or performance claims to tautological inputs by construction. The derivation remains self-contained with independent content from the dataset curation and architectural choices.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; concrete free parameters, axioms and invented entities cannot be enumerated without the methods and architecture sections. The central claim rests on the unstated assumption that sensor-specific encoders plus a shared hierarchical encoder can be jointly optimized without destructive interference.

pith-pipeline@v0.9.0 · 5819 in / 1194 out tokens · 23770 ms · 2026-05-21T05:21:30.017421+00:00 · methodology


Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages · 4 internal anchors

[1]

Alonso, M

K. Alonso, M. Bachmann, K. Burch, E. Carmona, D. Cerra, R. De los Reyes, D. Dietrich, U. Heiden, A. Hölderlin, J. Ickes, et al. Data products, quality and validation of the dlr earth sensing imaging spectrometer (desis).Sensors, 19(20):4471, 2019

work page 2019
[2]

Assran et al

M. Assran, Q. Duval, I. Misra, P. Bojanowski, P. Vincent, M. Rabbat, Y . LeCun, and N. Ballas. Self-supervised learning from images with a joint-embedding predictive architecture.arXiv preprint arXiv:2301.08243, 2023

work page arXiv 2023
[3]

Astruc, N

G. Astruc, N. Gonthier, C. Mallet, and L. Landrieu. Omnisat: Self-supervised modality fusion for earth observation. InEuropean Conference on Computer Vision, pages 409–427. Springer, 2024

work page 2024
[4]

Astruc, N

G. Astruc, N. Gonthier, C. Mallet, and L. Landrieu. Anysat: One earth observation model for many resolutions, scales, and modalities. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 19530–19540, 2025

work page 2025
[5]

Walk in the cloud: Learning curves for point clouds shape analysis, pp

K. Ayush, B. Uzkent, C. Meng, K. Tanmay, M. Burke, D. Lobell, and S. Ermon. Geography- aware self-supervised learning. InIEEE/CVF International Conference on Computer Vision (ICCV), pages 10161–10170, 2021. doi: 10.1109/ICCV48922.2021.01002

work page doi:10.1109/iccv48922.2021.01002 2021
[6]

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

R. Balestriero and Y . LeCun. Lejepa: Provable and scalable self-supervised learning without the heuristics.arXiv preprint arXiv:2511.08544, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

F. Bastani, P. Wolters, R. Gupta, J. Ferdinando, and A. Kembhavi. Satlaspretrain: A large-scale dataset for remote sensing image understanding. InIEEE/CVF International Conference on Computer Vision (ICCV), pages 16726–16736, 2023. doi: 10.1109/ICCV51070.2023.01538

work page doi:10.1109/iccv51070.2023.01538 2023
[8]

Baumann, L

A. Baumann, L. Ayala, S. Seidlitz, J. Sellner, A. Studier-Fischer, B. Özdemir, L. Maier-hein, and S. Ilic. CARL: Camera-agnostic representation learning for spectral image analysis. In The F ourteenth International Conference on Learning Representations, 2026. URL https: //openreview.net/forum?id=TpbhS1yfz0

work page 2026
[9]

Blumenstiel, P

B. Blumenstiel, P. Fraccaro, V . Marsocci, J. Jakubik, S. Maurogiovanni, M. Czerkawski, R. Sedona, G. Cavallaro, T. Brunschwiler, J. Bernabe-Moreno, et al. Terramesh: A planetary mosaic of multimodal earth observation data.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025

work page 2025
[10]

N. A. A. Braham, C. M. Albrecht, J. Mairal, J. Chanussot, Y . Wang, and X. X. Zhu. Spectralearth: Training hyperspectral foundation models at scale.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 18:16780–16797, 2025. doi: 10.1109/JSTARS.2025. 3581451

work page doi:10.1109/jstars.2025 2025
[11]

C. F. Brown, M. R. Kazmierski, V . J. Pasquarella, W. J. Rucklidge, M. Samsikova, C. Zhang, E. Shelhamer, E. Lahera, O. Wiles, S. Ilyushchenko, et al. Alphaearth foundations: An embedding field model for accurate and efficient global mapping from sparse label data.arXiv preprint arXiv:2507.22291, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[12]

H. Chen, W. Zhao, T. Xu, G. Shi, S. Zhou, P. Liu, and J. Li. Spectral-wise implicit neural representation for hyperspectral image reconstruction.IEEE Transactions on Circuits and Systems for Video Technology, 34(5):3714–3727, 2024. doi: 10.1109/TCSVT.2023.3318366

work page doi:10.1109/tcsvt.2023.3318366 2024
[13]

Derf: Decomposed radiance fields,

X. Chen and K. He. Exploring simple siamese representation learning. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15745–15753, 2021. doi: 10.1109/ CVPR46437.2021.01549

work page arXiv 2021
[14]

Y . Cong, S. Khanna, C. Meng, P. Liu, E. Rozi, Y . He, M. Burke, D. B. Lobell, and S. Ermon. SatMAE: Pre-training transformers for temporal and multi-spectral satellite imagery. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors,Advances in Neural Information Processing Systems, 2022. URLhttps://openreview.net/forum?id=WBhqzpF6KYH. 11

work page 2022
[15]

M. S. Danish, M. A. Munir, S. R. A. Shah, M. H. Khan, R. M. Anwer, J. Laaksonen, F. S. Khan, and S. Khan. TerraFM: A scalable foundation model for unified multisensor earth observation. InThe F ourteenth International Conference on Learning Representations, 2026

work page 2026
[16]

Copernicus legal notice: Free, full and open access to Sentinel data, 2024

European Union. Copernicus legal notice: Free, full and open access to Sentinel data, 2024. URL https://www.copernicus.eu/en/terms-use/how-access-data . Covers Sentinel- 1 and Sentinel-2 data access and exploitation for any public or private organization

work page 2024
[17]

Forgaard, J

T. Forgaard, J. H. Reksten, A. U. Waldeland, V . Marsocci, N. Longépé, M. Kampffmeyer, and A.-B. Salberg. Thor: A versatile foundation model for earth observation climate and society applications.arXiv preprint arXiv:2601.16011, 2026

work page arXiv 2026
[18]

Francis and M

A. Francis and M. Czerkawski. Major tom: Expandable datasets for earth observation. In2024 IEEE International Geoscience and Remote Sensing Symposium, pages 2935–2940, 2024. doi: 10.1109/IGARSS53475.2024.10640760

work page doi:10.1109/igarss53475.2024.10640760 2024
[19]

M. H. P. Fuchs and B. Demir. Hyspecnet-11k: a large-scale hyperspectral dataset for benchmarking learning-based hyperspectral image compression methods. In2023 IEEE International Geoscience and Remote Sensing Symposium, pages 1779–1782, 2023. doi: 10.1109/IGARSS52108.2023.10283385

work page doi:10.1109/igarss52108.2023.10283385 2023
[20]

Fuller, K

A. Fuller, K. Millard, and J. Green. Croma: Remote sensing representations with contrastive radar-optical masked autoencoders. In A. Oh, T. Naumann, A. Glober- son, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Infor- mation Processing Systems, volume 36, pages 5506–5538. Curran Associates, Inc.,

work page
[21]

URL https://proceedings.neurips.cc/paper_files/paper/2023/file/ 11822e84689e631615199db3b75cd0e4-Paper-Conference.pdf

work page 2023
[22]

V . S. F. Garnot and L. Landrieu. Panoptic segmentation of satellite image time series with convolutional temporal attention networks. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4872–4881, 2021

work page 2021
[23]

V . S. F. Garnot, L. Landrieu, and N. Chehata. Multi-modal temporal attention models for crop mapping from satellite time series.ISPRS Journal of Photogrammetry and Remote Sensing, 187:294–305, 2022

work page 2022
[24]

EnMAP - environmental mapping and analysis program data policy and access

German Aerospace Center (DLR). EnMAP - environmental mapping and analysis program data policy and access. https://www.enmap.org/data/resources/EnMAP_Data_License. pdf, 2023. URL https://www.enmap.org/data_access/. Scientific and commercial use permitted as per the EnMAP Data License Agreement

work page 2023
[25]

License agreement regarding the use of the DESIS data for scientific use, 2024

German Aerospace Center (DLR). License agreement regarding the use of the DESIS data for scientific use, 2024. URL https://geoservice.dlr.de/resources/licenses/desis/ DESIS_License_Agreement_for_Scientific_Use.pdf. Free for non-commercial scien- tific research; commercial use managed by Teledyne Brown Engineering

work page 2024
[26]

EOWEB GeoPortal

German Aerospace Center (DLR). EOWEB GeoPortal. https://eoweb.dlr.de/egp/, 2024. Accessed: 2025

work page 2024
[27]

German Remote Sensing Data Center, 2.7 edition, 2026

German Aerospace Center (DLR).EnMAP Frequently Asked Questions (F AQ). German Remote Sensing Data Center, 2.7 edition, 2026. URL https://www.enmap.org/data/doc/EnMAP_ FAQ.pdf

work page 2026
[28]

R. O. Green, N. Mahowald, C. Ung, D. R. Thompson, L. Bator, M. Bennet, M. Bernas, N. Blackway, C. Bradley, J. Cha, et al. The earth surface mineral dust source investigation: An earth science imaging spectroscopy mission. In2020 IEEE aerospace conference, pages 1–15. IEEE, 2020

work page 2020
[29]

Grill, F

J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. Richemond, E. Buchatskaya, C. Doersch, B. Avila Pires, Z. Guo, M. Gheshlaghi Azar, et al. Bootstrap your own latent-a new ap- proach to self-supervised learning.Advances in neural information processing systems, 33: 21271–21284, 2020. 12

work page 2020
[30]

Guanter, H

L. Guanter, H. Kaufmann, K. Segl, S. Foerster, C. Rogass, S. Chabrillat, T. Kuester, A. Hollstein, G. Rossner, C. Chlebek, C. Straif, S. Fischer, S. Schrader, T. Storch, U. Heiden, A. Mueller, M. Bachmann, H. Mühle, R. Müller, M. Habermeyer, A. Ohndorf, J. Hill, H. Buddenbaum, P. Hostert, S. Van der Linden, P. J. Leitão, A. Rabe, R. Doerffer, H. Krasemann...

work page 2015
[31]

URLhttps://www.mdpi.com/2072-4292/7/7/8830

doi: 10.3390/rs70708830. URLhttps://www.mdpi.com/2072-4292/7/7/8830

work page doi:10.3390/rs70708830 2072
[32]

K. He, X. Chen, S. Xie, Y . Li, P. Dollár, and R. Girshick. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009, 2022

work page 2022
[33]

D. Hong, Z. Han, J. Yao, L. Gao, B. Zhang, A. Plaza, and J. Chanussot. Spectralformer: Rethink- ing hyperspectral image classification with transformers.IEEE Transactions on Geoscience and Remote Sensing, 60:1–15, 2022. doi: 10.1109/TGRS.2021.3130716

work page doi:10.1109/tgrs.2021.3130716 2022
[34]

D. Hong, B. Zhang, H. Li, Y . Li, J. Yao, C. Li, M. Werner, J. Chanussot, A. Zipf, and X. X. Zhu. Cross-city matters: A multimodal remote sensing benchmark dataset for cross-city semantic segmentation using high-resolution domain adaptation networks.Remote Sensing of Environment, 299:113856, 2023

work page 2023
[35]

D. Hong, B. Zhang, X. Li, Y . Li, C. Li, J. Yao, N. Yokoya, H. Li, P. Ghamisi, X. Jia, A. Plaza, P. Gamba, J. A. Benediktsson, and J. Chanussot. Spectralgpt: Spectral remote sensing foundation model.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8):5227–5244,

work page
[36]

doi: 10.1109/TPAMI.2024.3362475

work page doi:10.1109/tpami.2024.3362475 2024
[37]

org/abs/2310.18660

J. Jakubik, S. Roy, C. Phillips, P. Fraccaro, D. Godwin, B. Zadrozny, D. Szwarcman, C. Gomes, G. Nyirjesy, B. Edwards, et al. Foundation models for generalist geospatial artificial intelligence. arXiv preprint arXiv:2310.18660, 2023

work page arXiv 2023
[38]

Jakubik, F

J. Jakubik, F. Yang, B. Blumenstiel, E. Scheurer, R. Sedona, S. Maurogiovanni, J. Bosmans, N. Dionelis, V . Marsocci, N. Kopp, R. Ramachandran, P. Fraccaro, T. Brunschwiler, G. Caval- laro, J. Bernabe-Moreno, and N. Longépé. Terramind: Large-scale generative multimodality for earth observation. InIEEE/CVF International Conference on Computer Vision (ICCV)...

work page doi:10.1109/iccv51701.2025.00693 2025
[39]

R. Ji, X. Wang, C. Niu, W. Zhang, Y . Mei, and K. Tan. Specaware: A spectral-content aware foundation model for unifying multi-sensor learning in hyperspectral remote sensing mapping. ISPRS Journal of Photogrammetry and Remote Sensing, 234:242–260, 2026. ISSN 0924-2716. doi: https://doi.org/10.1016/j.isprsjprs.2026.02.024. URL https://www.sciencedirect. c...

work page doi:10.1016/j.isprsjprs.2026.02.024 2026
[40]

Kikaki, I

K. Kikaki, I. Kakogeorgiou, I. Hoteit, and K. Karantzalos. Detecting marine pollutants and sea surface features with deep learning in sentinel-2 imagery.ISPRS Journal of Photogrammetry and Remote Sensing, 210:39–54, 2024

work page 2024
[41]

W. Kong, B. Liu, X. Bi, C. Yu, X. Li, and Y . Chen. Hypersl: A spectral foundation model for hyperspectral image interpretation.IEEE Transactions on Geoscience and Remote Sensing, 63: 1–19, 2025. doi: 10.1109/TGRS.2025.3566205

work page doi:10.1109/tgrs.2025.3566205 2025
[42]

Krutz, R

D. Krutz, R. Müller, U. Knodt, B. Günther, I. Walter, I. Sebastian, T. Säuberlich, R. Reulke, E. Carmona, A. Eckardt, et al. The instrument design of the dlr earth sensing imaging spectrom- eter (desis).Sensors, 19(7):1622, 2019

work page 2019
[43]

J. A. Leonardi, J. Jakubik, P. Fraccaro, and M. A. Brovelli. Spectral gaps and spatial priors: Studying hyperspectral downstream adaptation using terramind.arXiv preprint arXiv:2603.06690, 2026

work page arXiv 2026
[44]

Liu, D.-X

Y .-N. Liu, D.-X. Sun, X.-N. Hu, X. Ye, Y .-D. Li, S.-F. Liu, K.-Q. Cao, M.-Y . Chai, W.-Y .-N. Zhou, J. Zhang, Y . Zhang, W.-W. Sun, and L.-L. Jiao. The advanced hyperspectral imager: Aboard china’s gaofen-5 satellite.IEEE Geoscience and Remote Sensing Magazine, 7(4):23–32,

work page
[45]

doi: 10.1109/MGRS.2019.2927687. 13

work page doi:10.1109/mgrs.2019.2927687 2019
[46]

M Rustowicz, R

R. M Rustowicz, R. Cheong, L. Wang, S. Ermon, M. Burke, and D. Lobell. Semantic seg- mentation of crop type in africa: A novel dataset and analysis of deep learning methods. In Proceedings of the IEEE/cvf conference on computer vision and pattern recognition workshops, pages 75–82, 2019

work page 2019
[47]

Manas, A

O. Manas, A. Lacoste, X. Giró-i Nieto, D. Vazquez, and P. Rodriguez. Seasonal contrast: Unsupervised pre-training from uncurated remote sensing data. InProceedings of the IEEE/CVF international conference on computer vision, pages 9414–9423, 2021

work page 2021
[48]

Marsocci, Y

V . Marsocci, Y . Jia, G. L. Bellier, D. Kerekes, L. Zeng, S. Hafner, S. Gerard, E. Brune, R. Yadav, A. Shibli, H. Fang, Y . Ban, M. Vergauwen, N. Audebert, and A. Nascetti. Pangaea: Assessing geospatial foundation models capabilities through a global and inclusive benchmark.IEEE Geoscience and Remote Sensing Magazine, 14(1):245–285, 2026. doi: 10.1109/MG...

work page doi:10.1109/mgrs.2025 2026
[49]

Data use and citation guidance for earth science data, 2025

NASA Earth Science Data and Information System. Data use and citation guidance for earth science data, 2025. URL https://doi.org/10.5067/DOC/ESCO/ESDS-RFC-055. NASA Earth Science data are fully open access without use restrictions, following the ESDS-RFC-055 standard

work page doi:10.5067/doc/esco/esds-rfc-055 2025
[50]

EMIT L2A estimated surface reflectance and uncertainty and masks 60 m V001

NASA LP DAAC. EMIT L2A estimated surface reflectance and uncertainty and masks 60 m V001. NASA Earthdata Search, 2025

work page 2025
[51]

Nascetti, R

A. Nascetti, R. Yadav, K. Brodt, Q. Qu, H. Fan, Y . Shendryk, I. Shah, and C. Chung. Biomassters: A benchmark dataset for forest biomass estimation using multi-modal satellite time-series. Advances in Neural Information Processing Systems, 36:20409–20420, 2023

work page 2023
[52]

V. Nedungadi, A. Kariryaa, S. Oehmcke, S. Belongie, C. Igel, and N. Lang. Mmearth: Exploring multi-modal pretext tasks for geospatial representation learning. In European Conference on Computer Vision, pages 164–182. Springer, 2024
[53]

J. Pearlman, P. Barry, C. Segal, J. Shepanski, D. Beiso, and S. Carman. Hyperion, a space-based imaging spectrometer. IEEE Transactions on Geoscience and Remote Sensing, 41(6):1160–1173, 2003. doi: 10.1109/TGRS.2003.815018
[54]

C. Persello, J. Grift, X. Fan, C. Paris, R. Hänsch, M. Koeva, and A. Nelson. Ai4smallfarms: A dataset for crop field delineation in southeast asian smallholder farms. IEEE Geoscience and Remote Sensing Letters, 20:1–5, 2023. doi: 10.1109/LGRS.2023.3323095
[55]

C. Rambour, N. Audebert, E. Koeniguer, B. Le Saux, M. Crucianu, and M. Datcu. Flood detection in time series of optical and sar images. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 43(B2):1343–1346, 2020
[56]

C. Ryali, Y.-T. Hu, D. Bolya, C. Wei, H. Fan, P.-Y. Huang, V. Aggarwal, A. Chowdhury, O. Poursaeed, J. Hoffman, J. Malik, Y. Li, and C. Feichtenhofer. Hiera: a hierarchical vision transformer without the bells-and-whistles. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023
[57]

V. Růžička and A. Markham. Hyperspectralvits: General hyperspectral models for on-board remote sensing. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 18:10241–10253, 2025. doi: 10.1109/JSTARS.2025.3557527
[58]

L. Scheibenreif, M. Mommert, and D. Borth. Masked vision transformers for hyperspectral image classification. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 2166–2176, 2023. doi: 10.1109/CVPRW59228.2023.00210
[59]

M. A. Soppa, M. Brell, S. Chabrillat, L. M. Alvarado, P. Gege, S. Plattner, I. Somlai-Schweiger, T. Schroeder, F. Steinmetz, D. Scheffler, et al. Full mission evaluation of enmap water leaving reflectance products using three atmospheric correction processors. Optics Express, 32(16):28215–28230, 2024
[60]

G. Sumbul, C. Xu, E. Dalsasso, and D. Tuia. Smarties: Spectrum-aware multi-sensor auto-encoder for remote sensing images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5569–5578, 2025
[61]

X. Sun, P. Wang, W. Lu, Z. Zhu, X. Lu, Q. He, J. Li, X. Rong, Z. Yang, H. Chang, Q. He, G. Yang, R. Wang, J. Lu, and K. Fu. Ringmo: A remote sensing foundation model with masked image modeling. IEEE Transactions on Geoscience and Remote Sensing, 61:1–22, 2023. doi: 10.1109/TGRS.2022.3194732
[62]

D. Szwarcman, S. Roy, P. Fraccaro, O. E. Gíslason, B. Blumenstiel, R. Ghosal, P. H. De Oliveira, J. L. de Sousa Almeida, R. Sedona, Y. Kang, et al. Prithvi-eo-2.0: A versatile multitemporal foundation model for earth observation applications. IEEE Transactions on Geoscience and Remote Sensing, 64:1–20, 2025. doi: 10.1109/TGRS.2025.3642610
[63]

A. Toker, L. Kondmann, M. Weber, M. Eisenberger, A. Camero, J. Hu, A. P. Hoderlein, Ç. Şenaras, T. Davis, D. Cremers, et al. Dynamicearthnet: Daily multi-spectral satellite dataset for semantic change segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21158–21167, 2022
[64]

X.-Y. Tong, G.-S. Xia, and X. X. Zhu. Enabling country-scale land cover mapping with meter-resolution satellite imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 196:178–196, 2023
[65]

Z. H. Tushar and S. Purushotham. Hyperfm: An efficient hyperspectral foundation model with spectral grouping. arXiv preprint arXiv:2604.21127, 2026
[66]

U.S. Geological Survey. Are landsat data in the cloud still considered to be within the public domain?, 2020. URL https://www.usgs.gov/faqs/are-landsat-data-cloud-still-considered-be-within-public-domain. Accessed: 2026-05-20
[67]

A. Van Etten, D. Lindenbaum, and T. M. Bacastow. Spacenet: A remote sensing dataset and challenge series. arXiv preprint arXiv:1807.01232, 2018
[68]

H. V. Vo, V. Khalidov, T. Darcet, T. Moutakanni, N. Smetanin, M. Szafraniec, H. Touvron, M. Oquab, A. Joulin, H. Jegou, et al. Automatic data curation for self-supervised learning: A clustering-based approach. Transactions on Machine Learning Research, 2024
[69]

L. Waldmann, A. Shah, Y. Wang, N. Lehmann, A. Stewart, Z. Xiong, X. X. Zhu, S. Bauer, and J. Chuang. Panopticon: Advancing any-sensor foundation models for earth observation. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops, pages 2204–2214, 2025
[70]

D. Wang, M. Hu, Y. Jin, Y. Miao, J. Yang, Y. Xu, X. Qin, J. Ma, L. Sun, C. Li, C. Fu, H. Chen, C. Han, N. Yokoya, J. Zhang, M. Xu, L. Liu, L. Zhang, C. Wu, B. Du, D. Tao, and L. Zhang. Hypersigma: Hyperspectral intelligence comprehension foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(8):6427–6444, 2025. doi: 10.1109...
[71]

Y. Wang, C. M. Albrecht, N. A. A. Braham, L. Mou, and X. X. Zhu. Self-supervised learning in remote sensing: A review. IEEE Geoscience and Remote Sensing Magazine, 10(4):213–247, 2022. doi: 10.1109/MGRS.2022.3198244
[73]

Y. Wang, N. A. A. Braham, Z. Xiong, C. Liu, C. M. Albrecht, and X. X. Zhu. Ssl4eo-s12: A large-scale multimodal, multitemporal dataset for self-supervised learning in earth observation [software and data sets]. IEEE Geoscience and Remote Sensing Magazine, 11(3):98–106, 2023. doi: 10.1109/MGRS.2023.3281651
[74]

Y. Wang, Z. Xiong, C. Liu, A. J. Stewart, T. Dujardin, N. I. Bountos, A. Zavras, F. Gerken, I. Papoutsis, L. Leal-Taixé, and X. X. Zhu. Towards a unified copernicus foundation model for earth vision. In 2025 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9888–9899, 2025. doi: 10.1109/ICCV51701.2025.00922
[75]

E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. In Advances in Neural Information Processing Systems, volume 34, pages 12077–12090, 2021
[76]

Z. Xiong, Y. Wang, F. Zhang, A. J. Stewart, J. Hanna, D. Borth, I. Papoutsis, B. L. Saux, G. Camps-Valls, and X. X. Zhu. Neural plasticity-inspired multimodal foundation model for earth observation. arXiv preprint arXiv:2403.15356, 2024
[77]

F. Yao, W. Lu, H. Yang, L. Xu, C. Liu, L. Hu, H. Yu, N. Liu, C. Deng, D. Tang, C. Chen, J. Yu, X. Sun, and K. Fu. Ringmo-sense: Remote sensing foundation model for spatiotemporal prediction via spatiotemporal evolution disentangling. IEEE Transactions on Geoscience and Remote Sensing, 61:1–21, 2023. doi: 10.1109/TGRS.2023.3316166

A Dataset details

This...

[Figure: dataset construction pipeline] HSI acquisitions are used as anchors, filtered, patchified, grouped by location, rebalanced, and paired with co-located MSI, SAR, and LST observations

Quality Control & Patch Extraction
• Filter tiles by georeferencing accuracy, cloud cover, and noisy spectral bands
• Extract 3.84 × 3.84 km HSI patches; discard invalid/NaN patches
Unbalanced HSI Dataset: EnMAP: 1.8M locs; EMIT: 4.1M locs; DESIS: 275K locs; 2.6M patches; 12M patches; 447K patches
HSI Preprocessing

Spatial Sampling
• Retrieve annual AlphaEarth embeddings for each HSI location
• Cluster embeddings to select geographically diverse sites; retain all timestamps
Dataset Balancing

Temporal Alignment & Pairing
• Match HSI with nearest MSI/SAR (S-2, L8/9, S-1; ≤5% clouds)
• Select up to 4 dates/year with seasonal coverage; deduplicate observations acquired within ≤10 days of each other
Sentinel-2; Landsat 8/9; Sentinel-1
Final SpectralEarth-MM Dataset (∼2M sites, ∼25M files): EnMAP: 1.4M locs; EMIT: 1.4M locs; DESIS: 275K locs
Multimodal Pairing ...
