pith. machine review for the scientific record.

arxiv: 2604.21104 · v1 · submitted 2026-04-22 · 💻 cs.CV · cs.LG

Recognition: unknown

Pretrain Where? Investigating How Pretraining Data Diversity Impacts Geospatial Foundation Model Performance

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 00:03 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords: geospatial foundation models · pretraining data · spectral diversity · geographic composition · remote sensing · downstream performance · data diversity

The pith

The Europe-only pretraining dataset outperformed global and other continent-specific datasets for geospatial foundation models, with only spectral diversity correlating strongly with performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates how the geographic composition of pretraining data affects geospatial foundation model performance. The authors created global and per-continent pretraining datasets and evaluated models pretrained on each against corresponding downstream tasks. The Europe pretraining dataset performed best on both global and local evaluations. Analysis showed that spectral diversity in the data correlated strongly with performance, unlike diversity across continents, biomes, or landcover.

Core claim

Pretraining on data sampled only from Europe produces models that outperform those pretrained on global data or data from any other continent, when evaluated on both global and regional downstream tasks. Among measures of diversity, spectral diversity alone shows strong correlation with these performance gains.

What carries the argument

Comparison of pretraining datasets varying in geographic composition, with correlation analysis against diversity metrics focused on spectral variation.
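The paper's exact spectral-diversity formula is not reproduced here. As a minimal sketch, one plausible measure is the per-band spread of pixel values across sampled patches; the function name, array shapes, and standard-deviation choice below are our assumptions, not the authors' definition.

```python
import numpy as np

def spectral_diversity(patches: np.ndarray) -> float:
    """Toy spectral-diversity score for a stack of image patches.

    patches: (n_patches, bands, height, width), e.g. Sentinel-2
    reflectances. Returns the standard deviation of pixel values
    within each band, averaged over bands -- one plausible proxy,
    not the paper's definition.
    """
    n, bands, h, w = patches.shape
    flat = patches.reshape(n, bands, h * w)   # (n, bands, pixels)
    per_band_std = flat.std(axis=(0, 2))      # spread within each band
    return float(per_band_std.mean())

# A spectrally varied stack should score above a near-constant one.
rng = np.random.default_rng(0)
varied = rng.uniform(0.0, 1.0, size=(64, 13, 32, 32))
flat_stack = np.full((64, 13, 32, 32), 0.5)
assert spectral_diversity(varied) > spectral_diversity(flat_stack)
```

Histogram entropy or spread over band-mean vectors would be equally defensible variants; the correlation claim only requires the score to rank datasets consistently.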

Load-bearing premise

Differences in geographic composition and diversity metrics drive performance differences, assuming dataset size, image quality, and training procedures are equivalent across all pretraining sets.

What would settle it

A controlled experiment in which two datasets match in all respects except spectral diversity, with the higher-diversity dataset showing no performance gain, would falsify the correlation claim.
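One hedged sketch of how such a matched pair could be assembled: draw two equal-size subsets from a single source pool, steering only a spectral-spread score. The greedy centroid-distance selection below is our illustration, not the paper's protocol.

```python
import numpy as np

def select_by_spread(pool: np.ndarray, k: int, maximize: bool) -> np.ndarray:
    """Pick k patches whose band-mean vectors sit farthest from (or
    closest to) the pool centroid -- a crude knob for spectral
    diversity. pool: (n, bands, h, w). Returns chosen indices."""
    feats = pool.mean(axis=(2, 3))                 # (n, bands) band means
    center = feats.mean(axis=0)
    dist = np.linalg.norm(feats - center, axis=1)  # distance from centroid
    order = np.argsort(dist)
    return order[-k:] if maximize else order[:k]

rng = np.random.default_rng(1)
pool = rng.uniform(0.0, 1.0, size=(1000, 13, 8, 8))
hi_div = pool[select_by_spread(pool, 200, maximize=True)]
lo_div = pool[select_by_spread(pool, 200, maximize=False)]
assert hi_div.shape == lo_div.shape  # same size, source, preprocessing
# Pretrain one model on each subset; if the high-spread model shows no
# downstream gain, the spectral-diversity correlation fails the test.
```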

Figures

Figures reproduced from arXiv: 2604.21104 by Amandeep Kaur, Esther Rolf, Gedeon Muhawenayo, Hannah Kerner, Mirali Purohit.

Figure 1: Downstream performance differences caused by varying …
Figure 2: Proposed pipeline to evaluate the performance of different pretraining datasets on a diverse set of downstream tasks. …
Figure 3: Performance comparison on FMoW subsets across pre…
Figure 4: Performance comparison of our pretraining datasets …
Figure 5: Correlation plots between mean model performance and diversity measures: continent, biomes, landcover and spectral diversity.
Figure 6: Rows show data samples from the One-hot-Africa, …
Figure 7: Performance comparison on ForTy global subsets across …
Figure 8: Performance comparison on MOSAIKS population den…
Figure 9: Correlation plots between mean model performance and …
original abstract

New geospatial foundation models introduce a new model architecture and pretraining dataset, often sampled using different notions of data diversity. Performance differences are largely attributed to the model architecture or input modalities, while the role of the pretraining dataset is rarely studied. To address this research gap, we conducted a systematic study on how the geographic composition of pretraining data affects a model's downstream performance. We created global and per-continent pretraining datasets and evaluated them on global and per-continent downstream datasets. We found that the pretraining dataset from Europe outperformed global and continent-specific pretraining datasets on both global and local downstream evaluations. To investigate the factors influencing a pretraining dataset's downstream performance, we analysed 10 pretraining datasets using diversity across continents, biomes, landcover and spectral values. We found that only spectral diversity was strongly correlated with performance, while others were weakly correlated. This finding establishes a new dimension of diversity to be accounted for when creating a high-performing pretraining dataset. We open-sourced 7 new pretraining datasets, pretrained models, and our experimental framework at https://github.com/kerner-lab/pretrain-where.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper conducts a systematic empirical study on how the geographic composition of pretraining data affects geospatial foundation model performance. The authors construct global and per-continent pretraining datasets, pretrain models on them, and evaluate on both global and continent-specific downstream tasks. They report that the Europe-only pretraining dataset outperforms the global dataset and other continent-specific datasets on both global and local evaluations. Analysis of 10 pretraining datasets for diversity metrics across continents, biomes, landcover, and spectral values finds that only spectral diversity is strongly correlated with downstream performance, while the others are weakly correlated. The work open-sources 7 new pretraining datasets, pretrained models, and the experimental framework.

Significance. If the central findings hold after addressing potential confounders, the result highlights spectral diversity as an under-appreciated dimension for curating effective pretraining data in geospatial foundation models, beyond simple geographic coverage. The open-sourcing of datasets, models, and code is a clear strength that supports reproducibility and enables follow-on work in data-centric model development.

major comments (2)
  1. [Section 3] Section 3 (Pretraining Datasets construction): The manuscript does not report patch counts, total pixels, or sampling volumes for the global, Europe, Africa, Asia, etc. pretraining datasets, nor confirm that training steps and data volumes are equalized across conditions. This is load-bearing for the central claim, as the Europe-outperformance result and the spectral-diversity correlation could be explained by unequal dataset sizes or image quality rather than geographic composition or the reported diversity metrics.
  2. [Results] Results section (performance tables/figures): No error bars, standard deviations, or statistical significance tests are mentioned for the downstream evaluation differences (e.g., Europe vs. global). Without these, it is difficult to assess whether the reported outperformance is robust or could arise from training variance.
minor comments (2)
  1. [Section 4] A table listing all 10 analyzed pretraining datasets with their exact geographic coverage, size, and diversity metric values would improve clarity and allow readers to verify the correlation analysis.
  2. [Abstract and Section 4] The abstract states that 'only spectral diversity was strongly correlated' but does not specify the correlation coefficient or p-value; adding these quantitative details in the main text would strengthen the claim.
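For reference, what that quantitative reporting could look like: a minimal sketch using scipy.stats.pearsonr over (diversity score, mean performance) pairs. Every number below is a placeholder, not a value from the paper.

```python
from scipy.stats import pearsonr

# Hypothetical per-dataset values: one spectral-diversity score and one
# mean downstream metric for each of the 10 analysed pretraining sets.
spectral_div = [0.12, 0.18, 0.22, 0.25, 0.31, 0.33, 0.40, 0.44, 0.51, 0.55]
mean_perf    = [61.2, 62.0, 63.1, 62.8, 64.5, 65.0, 66.2, 66.0, 67.9, 68.4]

r, p = pearsonr(spectral_div, mean_perf)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")  # report alongside each correlation plot
```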

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important aspects of experimental rigor and statistical reporting. We address each major comment below and will incorporate revisions to strengthen the manuscript.

point-by-point responses
  1. Referee: [Section 3] Section 3 (Pretraining Datasets construction): The manuscript does not report patch counts, total pixels, or sampling volumes for the global, Europe, Africa, Asia, etc. pretraining datasets, nor confirm that training steps and data volumes are equalized across conditions. This is load-bearing for the central claim, as the Europe-outperformance result and the spectral-diversity correlation could be explained by unequal dataset sizes or image quality rather than geographic composition or the reported diversity metrics.

    Authors: We agree that these details are necessary to support the central claims. In the revised manuscript, we will add a dedicated table in Section 3 reporting the exact patch counts, total pixels, and sampling volumes for the global and each continent-specific pretraining dataset. We will also explicitly confirm that all pretraining runs were performed with identical training steps, batch sizes, and optimizer configurations to equalize data exposure and compute. All datasets were derived from the same underlying sources (e.g., Sentinel-2) with consistent preprocessing pipelines, which minimizes systematic differences in image quality. We will add a brief discussion of any remaining potential confounders. revision: yes

  2. Referee: [Results] Results section (performance tables/figures): No error bars, standard deviations, or statistical significance tests are mentioned for the downstream evaluation differences (e.g., Europe vs. global). Without these, it is difficult to assess whether the reported outperformance is robust or could arise from training variance.

    Authors: We acknowledge the value of statistical reporting for assessing robustness. In the revised results section, we will include error bars (standard deviations) on all performance tables and figures, computed across multiple independent runs with different random seeds. We will also report the results of statistical significance tests (e.g., paired t-tests with p-values) for the key comparisons, including Europe versus global pretraining. These additions will allow readers to better evaluate the reliability of the observed differences. revision: yes
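A minimal sketch of the proposed check, assuming per-seed downstream scores for two pretraining conditions; the scores are invented placeholders, not the paper's results.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical downstream accuracy per random seed; each row pairs the
# same seed and task split across the two pretraining conditions.
europe_scores = np.array([66.1, 65.8, 66.5, 66.0, 65.9])
global_scores = np.array([64.9, 65.2, 64.7, 65.1, 64.8])

t, p = ttest_rel(europe_scores, global_scores)  # paired t-test across seeds
print(f"Europe: {europe_scores.mean():.2f} ± {europe_scores.std(ddof=1):.2f}")
print(f"Global: {global_scores.mean():.2f} ± {global_scores.std(ddof=1):.2f}")
print(f"paired t = {t:.2f}, p = {p:.4f}")
```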

Circularity Check

0 steps flagged

No circularity: purely empirical comparisons with no derivations or self-referential reductions

full rationale

The paper conducts an empirical study by constructing global and per-continent pretraining datasets, training models under each condition, evaluating on global and local downstream tasks, and computing correlations between performance and measured diversity metrics (continents, biomes, landcover, spectral). No equations, ansatzes, uniqueness theorems, or fitted parameters are invoked; claims rest on direct experimental outcomes and observed correlations rather than any chain that reduces a result to its own inputs by construction. The work remains checkable against external benchmarks via the open-sourced datasets and models.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on experimental comparisons rather than mathematical derivations; the main unstated premises concern the validity of the chosen diversity metrics and the assumption that downstream accuracy faithfully reflects pretraining quality.

axioms (1)
  • domain assumption: Downstream task performance reliably indicates the quality of pretraining data composition.
    The study equates higher downstream accuracy with better pretraining datasets without additional validation of this proxy.

pith-pipeline@v0.9.0 · 5517 in / 1244 out tokens · 64658 ms · 2026-05-10T00:03:04.866101+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. No One Knows the State of the Art in Geospatial Foundation Models

cs.CV · 2026-05 · accept · novelty 6.0

    An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.
