How to Embed Matters: Evaluation of EO Embedding Design Choices
Pith reviewed 2026-05-15 13:38 UTC · model grok-4.3
The pith
Embedding design choices in geospatial foundation models shape performance on earth observation tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Experiments across multiple models establish that transformer backbones paired with mean pooling deliver strong default embeddings, intermediate ResNet layers can surpass final-layer features, self-supervised pretraining objectives display task-dependent strengths, and fusing embeddings from different objectives improves robustness on earth observation benchmarks.
What carries the argument
Comparative evaluation of embedding strategies covering backbone type, representation depth, spatial aggregation method, and objective combination.
Load-bearing premise
The performance patterns seen on the benchmark dataset generalize to other earth observation data, sensors, and tasks.
What would settle it
A new dataset or task on which mean pooling over transformer features consistently underperforms alternatives, or on which combining embeddings fails to increase robustness, would show that the reported trends do not generalize.
Original abstract
Earth observation (EO) missions produce petabytes of multispectral imagery, increasingly analyzed using large Geospatial Foundation Models (GeoFMs). Alongside end-to-end adaptation, workflows make growing use of intermediate representations as task-agnostic embeddings, enabling models to compute representations once and reuse them across downstream tasks. Consequently, when GeoFMs act as feature extractors, decisions about how representations are obtained, aggregated, and combined affect downstream performance and pipeline scalability. Understanding these trade-offs is essential for scalable embedding-based EO workflows, where compact embeddings can replace raw data while remaining broadly useful. We present a systematic analysis of embedding design in GeoFM-based EO workflows. Leveraging NeuCo-Bench, we study how backbone architecture, pretraining strategy, representation depth, spatial aggregation, and representation combination influence EO task performance. We demonstrate the usability of GeoFM embeddings by aggregating them into fixed-size representations more than 500x smaller than the raw input data. Across models, we find consistent trends: transformer backbones with mean pooling provide strong default embeddings, intermediate ResNet layers can outperform final layers, self-supervised objectives exhibit task-specific strengths, and combining embeddings from different objectives often improves robustness.
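To make the compression figure in the abstract concrete, the sketch below computes the ratio between the storage size of a raw input patch and that of a fixed-size embedding. The patch shape, data types, and embedding length are illustrative assumptions, not values taken from the paper or from NeuCo-Bench.

```python
from math import prod

def compression_ratio(raw_shape, raw_bytes_per_value, embed_dim, embed_bytes_per_value):
    """Ratio of raw patch size to embedding size, both measured in bytes."""
    raw_bytes = prod(raw_shape) * raw_bytes_per_value
    embed_bytes = embed_dim * embed_bytes_per_value
    return raw_bytes / embed_bytes

# Hypothetical patch: 4 timestamps x 12 bands x 264 x 264 pixels stored as uint16.
# Hypothetical embedding: 1024 float32 values.
ratio = compression_ratio((4, 12, 264, 264), 2, 1024, 4)
print(f"{ratio:.1f}x smaller")
```

Under these assumed shapes the ratio comfortably exceeds 500; the actual factor depends on the real patch dimensions and embedding length.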
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates embedding design choices for Geospatial Foundation Models (GeoFMs) in Earth Observation (EO) workflows using the NeuCo-Bench benchmark. It systematically analyzes the effects of backbone architecture, pretraining strategy, representation depth, spatial aggregation, and embedding combination on downstream task performance. The central findings are that transformer backbones with mean pooling provide strong default embeddings, intermediate ResNet layers can outperform final layers, self-supervised objectives have task-specific strengths, and combining embeddings from different objectives improves robustness, all while achieving over 500x compression compared to raw data.
Significance. If the reported trends hold beyond the evaluated benchmark, this work offers actionable insights for scalable EO pipelines that leverage GeoFMs as fixed feature extractors. The emphasis on compact, reusable embeddings addresses key challenges in handling petabyte-scale multispectral imagery, potentially guiding practitioners toward more efficient and robust workflows.
Major comments (2)
- Results section: The claims of 'consistent trends' (transformer+mean pooling as default, intermediate ResNet layers outperforming final layers, task-specific self-supervised strengths, and robustness from combinations) are presented without error bars, statistical significance tests, or details on dataset splits and controls for confounding factors, leaving the empirical support for these load-bearing findings only moderately substantiated.
- Discussion section: The positioning of NeuCo-Bench trends as relevant to general GeoFM workflows (including 500x compression benefits) is not supported by any cross-dataset or cross-sensor validation; if NeuCo-Bench shares unaccounted biases in sensor characteristics or label distributions, the reported design preferences may not generalize, which would undermine the claimed utility.
Minor comments (2)
- Abstract: 'EO' and 'GeoFM' are expanded on first use, but NeuCo-Bench is not introduced; provide a brief parenthetical definition for readers unfamiliar with the benchmark.
- Methods: Include explicit pseudocode or equations for the spatial aggregation (e.g., mean pooling) and embedding combination procedures to aid reproducibility.
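One possible form of the pseudocode requested in the Methods comment is sketched below. It assumes that patch-token features arrive as a list of equal-length vectors and that combination means L2-normalizing each embedding and concatenating the results. The function names and the normalization step are assumptions made for illustration, since the paper's exact procedure is not reproduced here.

```python
from math import sqrt

def mean_pool(tokens):
    """Average a list of equal-length token vectors into one vector."""
    if not tokens:
        raise ValueError("need at least one token")
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]

def l2_normalize(vector):
    """Scale a vector to unit L2 norm; zero vectors are returned unchanged."""
    norm = sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0 else [x / norm for x in vector]

def combine(*embeddings):
    """Concatenate L2-normalized embeddings (an assumed combination rule)."""
    combined = []
    for embedding in embeddings:
        combined.extend(l2_normalize(embedding))
    return combined

# Toy example: a 2x2 grid of patch tokens with 3 channels each.
tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
pooled = mean_pool(tokens)      # one 3-dimensional embedding
other = [3.0, 4.0]              # stand-in embedding from a second model
fused = combine(pooled, other)  # length 3 + 2 = 5
```

Normalizing before concatenation keeps one embedding from dominating a downstream linear probe purely because of its scale; other combination rules, such as plain concatenation or averaging, are equally plausible readings.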
Simulated Authors' Rebuttal
We thank the referee for their thoughtful review and constructive comments. We address each major comment below, indicating the revisions we plan to make to strengthen the manuscript.
- Referee: Results section: The claims of 'consistent trends' (transformer+mean pooling as default, intermediate ResNet layers outperforming final layers, task-specific self-supervised strengths, and robustness from combinations) are presented without error bars, statistical significance tests, or details on dataset splits and controls for confounding factors, leaving the empirical support for these load-bearing findings only moderately substantiated.
  Authors: We agree that the empirical support can be strengthened by including error bars, statistical tests, and additional methodological details. In the revised version, we will report error bars based on multiple random seeds or cross-validation folds for the key performance metrics. We will also include statistical significance tests (e.g., paired t-tests) for the main comparisons supporting our 'consistent trends' claims. Furthermore, we will expand the experimental setup section to provide complete details on dataset splits, preprocessing, and any controls for confounding factors. These additions will make the results more robustly substantiated.
  Revision: yes
- Referee: Discussion section: The positioning of NeuCo-Bench trends as relevant to general GeoFM workflows (including 500x compression benefits) is not supported by any cross-dataset or cross-sensor validation; if NeuCo-Bench shares unaccounted biases in sensor characteristics or label distributions, the reported design preferences may not generalize, which would undermine the claimed utility.
  Authors: We acknowledge this as a valid limitation of the current study. While NeuCo-Bench includes a variety of EO tasks and sensor modalities to promote diversity, we agree that broader cross-dataset and cross-sensor validation would strengthen the generalizability claims. In the revision, we will temper the language in the discussion to specify that the observed trends hold within the NeuCo-Bench benchmark and discuss potential biases related to sensor characteristics and label distributions. We will retain the 500x compression claim as it is a direct comparison of embedding size to raw data size, independent of the specific benchmark, but clarify its applicability to embedding-based workflows in general. We will also add a section on limitations and future work to address generalization.
  Revision: partial
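The seed-based error bars and paired t-tests proposed in the first response could be computed along the following lines. This is a minimal sketch with made-up scores for two hypothetical embedding configurations evaluated on the same seeds. It reports the mean, the sample standard deviation, and the paired t statistic, and leaves the p-value lookup to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def summarize(scores):
    """Mean and sample standard deviation across seeds (the error bar)."""
    return mean(scores), stdev(scores)

def paired_t_statistic(a, b):
    """t statistic for paired samples a[i], b[i]; degrees of freedom = n - 1."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs)))

# Made-up accuracies for five seeds; both configurations share the same seeds.
config_a = [0.81, 0.83, 0.80, 0.84, 0.82]
config_b = [0.78, 0.81, 0.79, 0.80, 0.80]

mean_a, std_a = summarize(config_a)
t_stat = paired_t_statistic(config_a, config_b)
print(f"A: {mean_a:.3f} +/- {std_a:.3f}, paired t = {t_stat:.2f} (df = {len(config_a) - 1})")
```

A paired test is appropriate only when both configurations are evaluated on identical splits and seeds, so that each difference isolates the design choice rather than run-to-run variation.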
Circularity Check
No significant circularity: pure empirical benchmarking study
The paper conducts a systematic empirical evaluation of EO embedding design choices (backbone architecture, pretraining strategy, representation depth, spatial aggregation, and combination) by measuring downstream task performance on the external NeuCo-Bench benchmark. No mathematical derivations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations are present. All reported trends (e.g., transformer+mean pooling as strong default, intermediate ResNet layers outperforming final layers) are direct observations from benchmark metrics and do not reduce to the paper's own inputs by construction. The analysis is self-contained against the stated benchmark without circular reduction.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: NeuCo-Bench is representative of real-world EO tasks and datasets.