pith. sign in

arxiv: 2606.02374 · v1 · pith:LEF57U6Rnew · submitted 2026-06-01 · 💻 cs.AI

Spatial Representation Learning Beyond Pixels: Unifying Raster Data and Vector Semantics for Human-Centric Geospatial Foundation Models

Pith reviewed 2026-06-28 14:21 UTC · model grok-4.3

classification 💻 cs.AI
keywords spatial representation learningearth observation foundation modelsraster datavector datamultimodal geospatial learningunified embedding spacegeospatial AI
0
0 comments X

The pith

Earth observation foundation models must integrate raster imagery with vector data in one embedding space to capture both physical patterns and human semantics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current Earth Observation Foundation Models learn only from raster data such as satellite images and therefore miss the explicit geometry, topology, and semantics carried by vector sources like OpenStreetMap. Raster data supplies continuous physical and spectral information while vector data supplies discrete objects and relational structure that often reflect human rather than purely physical systems. Existing methods handle the two modalities separately and rely on lossy conversions between them. The authors therefore advocate joint Spatial Representation Learning inside a single embedding space that aligns raster perception with vector reasoning. This shift is presented as necessary for building geospatial AI that is more accurate, interpretable, and grounded in the full structure of geographic space.

Core claim

Raster and vector data represent complementary views of geographic space: raster data captures continuous physical and spectral patterns, while vector data encodes discrete objects and their relational structure and often represents more of the human rather than the physical systems. Existing geospatial representation learning paradigms treat these modalities in isolation, relying on imperfect and often lossy transformations to bridge them. The paper calls for a paradigm shift toward joint Spatial Representation Learning in a unified embedding space that integrates raster perception with vector-based reasoning, contending that such integration is essential for developing next-generation geos

What carries the argument

The unified embedding space for Spatial Representation Learning (SRL) that aligns raster perception with vector-based reasoning without intermediate lossy transformations.

If this is right

  • Models will incorporate explicit topology and semantic relationships that are currently ambiguous in imagery alone.
  • Human-centric layers such as demographic or infrastructure data will become native inputs rather than post-hoc overlays.
  • Open vector datasets will move from auxiliary sources to core training material for foundation models.
  • Downstream applications will gain interpretability because decisions can reference discrete objects and relations instead of pixel patterns alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same alignment strategy could be tested on tasks that combine imagery with administrative boundaries or social-media-derived points.
  • Success would reduce the need for manual GIS feature engineering when moving between satellite and map layers.
  • The approach might generalize to other paired continuous and discrete spatial data types, such as LiDAR point clouds paired with cadastral polygons.

Load-bearing premise

Raster and vector geospatial data can be aligned directly in a shared embedding space to produce measurable improvements over treating the modalities separately.

What would settle it

A side-by-side evaluation on several downstream geospatial tasks that shows no consistent accuracy or interpretability gain when models are trained on joint raster-vector embeddings versus separate raster-only and vector-only embeddings.

Figures

Figures reproduced from arXiv: 2606.02374 by Gengchen Mai, Hao Li, Konstantin Klemmer, Song Gao, Steffen Knoblauch, Wenwen Li.

Figure 1
Figure 1. Figure 1: From Silos to Synthesis: Toward human-centric geospatial foundation models through unified SRL across vector and raster data. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
read the original abstract

Earth Observation (EO) has fundamentally transformed the monitoring of environmental processes and human activities up to planetary scale. Recent advances in self-supervised learning have given rise to Earth Observation Foundation Models (EOFMs), which leverage petabyte-scale unlabeled EO data to learn transferable representations across a wide range of downstream geospatial tasks. Despite these advances, current EOFMs remain largely confined to raster modalities, overlooking the rich, structured information encoded in openly-accessible vector data sources such as OpenStreetMap and Overture. Vector data provides explicit and compact representations of geographic entities, including geometry, topology, and semantic relationships, offering critical contextual signals that are often ambiguous or inaccessible in imagery alone. Raster and vector data thus represent complementary views of geographic space: raster data captures continuous physical and spectral patterns, while vector data encodes discrete objects and their relational structure and often represents more of the human rather than the physical systems (e.g. social or demographic data). However, existing geospatial representation learning paradigms treat these modalities in isolation, relying on imperfect and often lossy transformations to bridge them. This perspective paper calls for a paradigm shift toward joint Spatial Representation Learning (SRL) in an unified embedding space that integrate raster perception with vector-based reasoning. Building on emerging efforts in multimodal geospatial learning, we highlight conceptual foundations, technical challenges, and promising directions for aligning heterogeneous spatial data sources. We contend that such integration is essential for developing next-generation geospatial AI systems capable of more accurate, interpretable, and semantically grounded understanding of the Earth.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. This perspective paper argues that current Earth Observation Foundation Models (EOFMs) are limited to raster modalities and advocates a paradigm shift to joint Spatial Representation Learning (SRL) that unifies raster imagery with vector data (e.g., from OpenStreetMap and Overture) in a single embedding space. It claims raster data captures continuous physical/spectral patterns while vector data supplies explicit geometry, topology, semantics, and human-centric relational structure, and that overcoming imperfect cross-modal transformations via unified representations will yield more accurate, interpretable, and semantically grounded geospatial AI.

Significance. If the proposed unification of raster and vector modalities proves feasible, the perspective identifies a meaningful gap in geospatial AI and could help steer the field toward models that better integrate physical and human systems. The manuscript correctly notes the complementarity of the two data types and the limitations of isolated or lossy bridging approaches, which may encourage future multimodal work even in the absence of new technical contributions here.

minor comments (2)
  1. [Abstract] Abstract: the statement that vector data 'often represents more of the human rather than the physical systems' would benefit from one or two concrete examples of the social/demographic attributes referenced.
  2. The manuscript refers to 'emerging efforts in multimodal geospatial learning' and 'promising directions' without naming specific prior works or outlining even high-level technical challenges; adding a short dedicated paragraph or table of cited efforts would improve grounding.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the supportive review, the accurate summary of our perspective paper, and the recommendation of minor revision. We are pleased that the complementarity of raster and vector modalities and the identified gap in current EOFMs are recognized as meaningful.

Circularity Check

0 steps flagged

No significant circularity: perspective piece with no derivations

full rationale

The manuscript is a perspective paper that advocates for joint Spatial Representation Learning integrating raster and vector modalities on conceptual grounds. It contains no equations, algorithms, fitted parameters, predictions, or first-principles derivations whose correctness depends on internal definitions or self-citations. The central claim is a normative call for paradigm shift rather than a technical result that could reduce to its own inputs by construction. No load-bearing steps exist that match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Perspective paper with no new technical claims, derivations, or introduced components; no free parameters, axioms, or invented entities are defined.

pith-pipeline@v0.9.1-grok · 5819 in / 944 out tokens · 24345 ms · 2026-06-28T14:21:19.826740+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

60 extracted references · 7 canonical work pages · 4 internal anchors

  1. [1]

    Pix2poly: A sequence prediction method for end-to-end polygonal building footprint extraction from remote sensing imagery

    Yeshwanth Kumar Adimoolam, Charalambos Poullis, and Melinos Averkiou. Pix2poly: A sequence prediction method for end-to-end polygonal building footprint extraction from remote sensing imagery. In2025 IEEE/CVF Winter Con- ference on Applications of Computer Vision (WACV), pages 8484–8493. IEEE, 2025. 3

  2. [2]

    Lubin Bai, Xiuyuan Zhang, Haoyu Wang, and Shihong Du. Integrating remote sensing with openstreetmap data for com- prehensive scene understanding through multi-modal self- supervised learning.Remote Sensing of Environment, 318: 114573, 2025. 2, 4, 5

  3. [3]

    Geolink: Empow- ering remote sensing foundation model with openstreetmap data

    Lubian Bai, Xiuyuan Zhang, Siqi Zhang, Zepeng Zhang, Haoyu Wang, Wei Qin, and Shihong Du. Geolink: Empow- ering remote sensing foundation model with openstreetmap data. 2025. 2, 4, 5

  4. [4]

    City foundation models for learning general purpose repre- sentations from openstreetmap

    Pasquale Balsebre, Weiming Huang, Gao Cong, and Yi Li. City foundation models for learning general purpose repre- sentations from openstreetmap. InProceedings of the 33rd ACM international conference on information and knowl- edge management, pages 87–97, 2024. 3

  5. [5]

    Bruinsma, Ana Lucic, Megan Stanley, Anna Allen, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A

    Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Anna Allen, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A. Weyn, Haiyu Dong, Jayesh K. Gupta, Kit Thambiratnam, Alexander T. Archibald, Chun- Chieh Wu, Elizabeth Heider, Max Welling, Richard E. Turner, and Paris Perdikaris. A foundation model for the earth system.Nature, 641...

  6. [6]

    AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data

    Christopher F. Brown et al. Alphaearth foundations: An embedding field model for accurate and efficient global mapping from sparse label data.arXiv preprint arXiv:2507.22291, 2025. 3

  7. [7]

    Geoclip: Clip-inspired alignment be- tween locations and images for effective worldwide geo- localization, 2023

    Vicente Vivanco Cepeda, Gaurav Kumar Nayak, and Mubarak Shah. Geoclip: Clip-inspired alignment be- tween locations and images for effective worldwide geo- localization, 2023. 5

  8. [8]

    S2vec: Self-supervised geospatial embed- dings for the built environment.ACM Transactions on Spa- tial Algorithms and Systems, 2026

    Shushman Choudhury, Chandrakumari Suvarna, Iveel Tsog- suren, Abdul Rahman Kreidieh, Elad Aharoni, Chun-Ta Lu, and Neha Arora. S2vec: Self-supervised geospatial embed- dings for the built environment.ACM Transactions on Spa- tial Algorithms and Systems, 2026. 3, 5

  9. [9]

    Geo2vec: Shape-and distance-aware neural representation of geospatial entities

    Chen Chu and Cyrus Shahabi. Geo2vec: Shape-and distance-aware neural representation of geospatial entities. InProceedings of the AAAI Conference on Artificial Intelli- gence, pages 18985–18993, 2026. 3

  10. [10]

    Satmae: Pre-training transformers for tem- poral and multi-spectral satellite imagery.Advances in Neu- ral Information Processing Systems, 35:197–211, 2022

    Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David Lobell, and Stefano Ermon. Satmae: Pre-training transformers for tem- poral and multi-spectral satellite imagery.Advances in Neu- ral Information Processing Systems, 35:197–211, 2022. 2

  11. [11]

    Bo- janowski

    Mikolaj Czerkawski, Marcin Kluczek, and J˛ edrzej S. Bo- janowski. Global and dense embeddings of earth: Major tom floating in the latent space. 2024. 3

  12. [12]

    Terrafm: A scalable foundation model for unified multisensor earth observation

    Muhammad Sohail Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Muhammad Haris Khan, Rao Muham- mad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan, and Salman Khan. Terrafm: A scalable foundation model for unified multisensor earth observation. 2025. 3

  13. [13]

    Geobind: Binding text, image, and audio through satellite images

    Aayush Dhakal, Subash Khanal, Srikumar Sastry, Adeel Ah- mad, and Nathan Jacobs. Geobind: Binding text, image, and audio through satellite images. pages 2729–2733, 2024. 3

  14. [14]

    Range: Retrieval aug- mented neural fields for multi-resolution geo-embeddings

    Aayush Dhakal, Srikumar Sastry, Subash Khanal, Adeel Ah- mad, Eric Xing, and Nathan Jacobs. Range: Retrieval aug- mented neural fields for multi-resolution geo-embeddings

  15. [15]

    Climplicit: Climatic implicit embeddings for global ecological tasks

    Johannes Dollinger, Damien Robert, Elena Plekhanova, Lukas Drees, and Jan Dirk Wegner. Climplicit: Climatic implicit embeddings for global ecological tasks. 2025. 2

  16. [16]

    Learning to model the world: A survey of world models in artificial intelligence

    Jiahua Dong, Qi Lyu, Baichen Liu, Xudong Wang, Wenqi Liang, Duzhen Zhang, Jiahang Tu, Hongliu Li, Hanbin Zhao, Henghui Ding, et al. Learning to model the world: A survey of world models in artificial intelligence. 2026. 6

  17. [17]

    Rareflow: Physics-aware flow-matching for cross-sensor super-resolution of rare-earth features.arXiv preprint arXiv:2510.23816, 2025

    Forouzan Fallah, Wenwen Li, Chia-Yu Hsu, Hyunho Lee, and Yezhou Yang. Rareflow: Physics-aware flow-matching for cross-sensor super-resolution of rare-earth features.arXiv preprint arXiv:2510.23816, 2025. 6

  18. [18]

    Asynchronous Remote Sensing Time-Series Fusion for Cloud Removal and Anytime Reconstruction

    Forouzan Fallah, Chia Yu Hsu, Wenwen Li, Anna Liljedahl, and Yezhou Yang. Asynchronous remote sensing time-series fusion for cloud removal and anytime reconstruction.arXiv preprint arXiv:2605.27726, 2026. 6

  19. [19]

    TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

    Zhengpeng Feng, Clement Atzberger, Sadiq Jaffer, Jo- vana Knezevic, Silja Sormunen, Robin Young, Made- line C. Lisaius, Markus Immitzer, Toby Jackson, James 7 Ball, et al. Tessera: Temporal embeddings of surface spec- tra for earth representation and analysis.arXiv preprint arXiv:2506.20380, 2025. 3

  20. [20]

    Thor: A versatile foundation model for earth observation climate and society applications.arXiv preprint arXiv:2601.16011, 2026

    Theodor Forgaard, Jarle H Reksten, Anders U Waldeland, Valerio Marsocci, Nicolas Longépé, Michael Kampffmeyer, and Arnt-Børre Salberg. Thor: A versatile foundation model for earth observation climate and society applications.arXiv preprint arXiv:2601.16011, 2026. 3

  21. [21]

    A survey of uncertainty in deep neural networks.Artificial Intelligence Review, 56, 2023

    Jakob Gawlikowski et al. A survey of uncertainty in deep neural networks.Artificial Intelligence Review, 56, 2023. 6

  22. [22]

    Belenguer-Plomer, Kennedy Adriko, Paolo Fraccaro, Romeo Kienzler, Rania Briq, Sab- rina Benassou, Michele Lazzarini, and Conrad M

    Carlos Gomes, Isabelle Wittmann, Damien Robert, Johannes Jakubik, Tim Reichelt, Stefano Maurogiovanni, Rikard Vinge, Jonas Hurst, Erik Scheurer, Rocco Sedona, Thomas Brunschwiler, Stefan Kesselheim, Matej Bati ˇc, Philip Stier, Jan Dirk Wegner, Gabriele Cavallaro, Edzer Pebesma, Michael Marszalek, Miguel A. Belenguer-Plomer, Kennedy Adriko, Paolo Fraccaro...

  23. [23]

    Shrug-fm: Reliability-aware foundation models for earth observation

    Maria Gonzalez-Calabuig, Kai-Hendrik Cohrs, Vishal Nedungadi, Zuzanna Osika, Ruben Cartuyvels, Steffen Knoblauch, Joppe Massant, Shruti Nath, Patrick Ebel, and Vasileios Sitokonstantinou. Shrug-fm: Reliability-aware foundation models for earth observation. 6

  24. [24]

    Reasoning with language model is planning with world model

    Shibo Hao, Yi Gu, Haodi Ma, Joshua Hong, Zhen Wang, Daisy Wang, and Zhiting Hu. Reasoning with language model is planning with world model. InProceedings of the 2023 Conference on Empirical Methods in Natural Lan- guage Processing, pages 8154–8173, 2023. 6

  25. [25]

    Spectralgpt: Spectral remote sensing foun- dation model.IEEE transactions on pattern analysis and machine intelligence, 46(8):5227–5244, 2024

    Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xi- uping Jia, et al. Spectralgpt: Spectral remote sensing foun- dation model.IEEE transactions on pattern analysis and machine intelligence, 46(8):5227–5244, 2024. 3, 5

  26. [26]

    Terramind: Large-scale generative mul- timodality for earth observation

    Johannes Jakubik, Felix Yang, Benedikt Blumenstiel, Erik Scheurer, Rocco Sedona, Stefano Maurogiovanni, Jente Bosmans, Nikolaos Dionelis, Valerio Marsocci, Niklas Kopp, Rahul Ramachandran, Paolo Fraccaro, Thomas Brun- schwiler, Gabriele Cavallaro, Juan Bernabe-Moreno, and Nicolas Longépé. Terramind: Large-scale generative mul- timodality for earth observa...

  27. [27]

    Yuhan Ji, Song Gao, Ying Nie, Ivan Maji ´c, and Krzysztof Janowicz. Foundation models for geospatial reasoning: as- sessing the capabilities of large language models in under- standing geometries and topological spatial relations.Inter- national Journal of Geographical Information Science, 39 (9):1866–1903, 2025. 4

  28. [28]

    Satclip: Global, general- purpose location embeddings with satellite imagery.Pro- ceedings of the AAAI Conference on Artificial Intelligence, 39(4):4347–4355, 2025

    Konstantin Klemmer, Esther Rolf, Caleb Robinson, Lester Mackey, and Marc Rußwurm. Satclip: Global, general- purpose location embeddings with satellite imagery.Pro- ceedings of the AAAI Conference on Artificial Intelligence, 39(4):4347–4355, 2025. 2, 3, 5, 6

  29. [29]

    Geoai for large-scale image analysis and machine vision: Recent progress of artificial intelligence in geography.ISPRS International Journal of Geo-Information, 11(7):385, 2022

    Wenwen Li and Chia-Yu Hsu. Geoai for large-scale image analysis and machine vision: Recent progress of artificial intelligence in geography.ISPRS International Journal of Geo-Information, 11(7):385, 2022. 6

  30. [30]

    Beyond alphaearth: Toward human-centered geospatial foundation models via poi-guided contrastive learning

    Junyuan Liu, Quan Qin, Guangsheng Dong, Xinglei Wang, Jiazhuang Feng, Zichao Zeng, and Tao Cheng. Beyond alphaearth: Toward human-centered geospatial foundation models via poi-guided contrastive learning. 2025. 2, 3, 6

  31. [31]

    GAIR: Location-aware self-supervised con- trastive pre-training with geo-aligned implicit representa- tions.ISPRS Journal of Photogrammetry and Remote Sens- ing, 237:166–182, 2026

    Zeping Liu, Ni Lao, Zhangyu Wang, Junfeng Jiao, and Gengchen Mai. GAIR: Location-aware self-supervised con- trastive pre-training with geo-aligned implicit representa- tions.ISPRS Journal of Photogrammetry and Remote Sens- ing, 237:166–182, 2026. 5

  32. [32]

    Towards general-purpose representation learning of polygonal geometries.GeoInformatica, 27(2):289–340,

    Gengchen Mai, Chiyu Jiang, Weiwei Sun, Rui Zhu, Yao Xuan, Ling Cai, Krzysztof Janowicz, Stefano Ermon, and Ni Lao. Towards general-purpose representation learning of polygonal geometries.GeoInformatica, 27(2):289–340,

  33. [33]

    Csp: Self-supervised contrastive spatial pre- training for geospatial-visual representations

    Gengchen Mai, Ni Lao, Yutong He, Jiaming Song, and Ste- fano Ermon. Csp: Self-supervised contrastive spatial pre- training for geospatial-visual representations. 2023. 5

  34. [34]

    On the opportunities and challenges of foundation mod- els for geoai (vision paper).ACM Transactions on Spatial Algorithms and Systems, 10(2):1–46, 2024

    Gengchen Mai, Weiming Huang, Jin Sun, Suhang Song, Deepak Mishra, Ninghao Liu, Song Gao, Tianming Liu, Gao Cong, Yingjie Hu, Chris Cundy, Ziyuan Li, Rui Zhu, and Ni Lao. On the opportunities and challenges of foundation mod- els for geoai (vision paper).ACM Transactions on Spatial Algorithms and Systems, 10(2):1–46, 2024. 2, 4

  35. [35]

    Srl: Towards a general-purpose framework for spatial representation learn- ing

    Gengchen Mai, Xiaobai Yao, Yiqun Xie, Jinmeng Rao, Hao Li, Qing Zhu, Ziyuan Li, and Ni Lao. Srl: Towards a general-purpose framework for spatial representation learn- ing. InProceedings of the 32nd ACM international confer- ence on advances in geographic information systems, pages 465–468, 2024. 2, 3, 4, 5

  36. [36]

    Towards the next generation of geospatial artificial in- telligence.International Journal of Applied Earth Observa- tion and Geoinformation, 136:104368, 2025

    Gengchen Mai, Yiqun Xie, Xiaowei Jia, Ni Lao, Jinmeng Rao, Qing Zhu, Zeping Liu, Yao-Yi Chiang, and Junfeng Jiao. Towards the next generation of geospatial artificial in- telligence.International Journal of Applied Earth Observa- tion and Geoinformation, 136:104368, 2025. 2

  37. [37]

    Pangaea: A global and inclusive benchmark for geospatial foundation models

    Valerio Marsocci, Yuru Jia, Georges Le Bellier, David Kerekes, Liang Zeng, Sebastian Hafner, Sebastian Gerard, Eric Brune, Ritu Yadav, Ali Shibli, Heng Fang, Yifang Ban, Maarten Vergauwen, Nicolas Audebert, and Andrea Nascetti. Pangaea: A global and inclusive benchmark for geospatial foundation models. 2024. 6

  38. [38]

    Reed, Ritwik Gupta, Shufan Li, Sarah Brock- man, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, and Trevor Darrell

    Colorado J. Reed, Ritwik Gupta, Shufan Li, Sarah Brock- man, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, and Trevor Darrell. Scale-mae: A scale-aware masked autoencoder for multiscale geospatial representation learning. 2022. 2, 3

  39. [39]

    Ricardo Ribeiro, Alina Trifan, and António JR Neves. A deep learning approach for transportation mode identifica- tion using a transformation of gps trajectory data features into an image representation.International Journal of Data Science and Analytics, 20(2):1023–1032, 2025. 4

  40. [40]

    Geographic location encoding with spherical harmonics and sinusoidal representation networks

    Marc Rußwurm, Konstantin Klemmer, Esther Rolf, Robin Zbinden, and Devis Tuia. Geographic location encoding with spherical harmonics and sinusoidal representation networks

  41. [41]

    Taxabind: A unified embedding 8 space for ecological applications

    Srikumar Sastry, Subash Khanal, Aayush Dhakal, Adeel Ah- mad, and Nathan Jacobs. Taxabind: A unified embedding 8 space for ecological applications. In2025 IEEE/CVF Win- ter Conference on Applications of Computer Vision (WACV), pages 1765–1774. IEEE, 2025. 5

  42. [42]

    Johannes Schmude, Sujit Roy, Will Trojak, Johannes Jaku- bik, Daniel Salles Civitarese, Shraddha Singh, Julian Kuehn- ert, Kumar Ankur, Aman Gupta, Christopher E. Phillips, Romeo Kienzler, Daniela Szwarcman, Vishal Gaur, Rajat Shinde, Rohit Lal, Arlindo Da Silva, Jorge Luis Guevara Diaz, Anne Jones, Simon Pfreundschuh, Amy Lin, Aditi Sheshadri, Udaysankar...

  43. [43]

    Exebench: Benchmarking foundation models on extreme earth events

    Shan Zhao, Zhitong Xiong, Jie Zhao, and Xiao Xiang Zhu. Exebench: Benchmarking foundation models on extreme earth events. 2025. 6

  44. [44]

    Poly2vec: Polymorphic fourier-based encoding of geospatial objects for geoai applications.arXiv preprint arXiv:2408.14806, 2024

    Maria Despoina Siampou, Jialiang Li, John Krumm, Cyrus Shahabi, and Hua Lu. Poly2vec: Polymorphic fourier-based encoding of geospatial objects for geoai applications.arXiv preprint arXiv:2408.14806, 2024. 3

  45. [45]

    Prithvi-eo-2.0: A versatile multi-temporal foun- dation model for earth observation applications.IEEE Trans- actions on Geoscience and Remote Sensing, 2025

    Daniela Szwarcman, Sujit Roy, Paolo Fraccaro, Benedikt Blumenstiel, Rinki Ghosal, Pedro Henrique de Oliveira, Joao Lucas de Sousa Almeida, Rocco Sedona, Yanghui Kang, et al. Prithvi-eo-2.0: A versatile multi-temporal foun- dation model for earth observation applications.IEEE Trans- actions on Geoscience and Remote Sensing, 2025. 3, 6

  46. [46]

    Green, Evan Shelhamer, Hannah Kerner, and David Rolnick

    Gabriel Tseng, Anthony Fuller, Marlena Reil, Henry Herzog, Patrick Beukema, Favyen Bastani, James R. Green, Evan Shelhamer, Hannah Kerner, and David Rolnick. Galileo: Learning global & local features of many remote sensing modalities. 2025. 3

  47. [47]

    Vargas-Munoz, Shivangi Srivastava, Devis Tuia, and Alexandre X

    John E. Vargas-Munoz, Shivangi Srivastava, Devis Tuia, and Alexandre X. Falcao. Openstreetmap: Challenges and op- portunities in machine learning and remote sensing.IEEE Geoscience and Remote Sensing Magazine, 9(1):184–199,

  48. [48]

    Siqin Wang, Tao Hu, Huang Xiao, Yun Li, Ce Zhang, Huan Ning, Rui Zhu, Zhenlong Li, and Xinyue Ye. Gpt, large language models (llms) and generative artificial intelligence (gai) models in geospatial science: a systematic review.In- ternational Journal of Digital Earth, 17(1):2353122, 2024. 3, 5

  49. [49]

    Stewart, Thomas Dujardin, Nikolaos Ioannis Bountos, Angelos Za- vras, Franziska Gerken, Ioannis Papoutsis, Laura Leal-Taixé, and Xiao Xiang Zhu

    Yi Wang, Zhitong Xiong, Chenying Liu, Adam J. Stewart, Thomas Dujardin, Nikolaos Ioannis Bountos, Angelos Za- vras, Franziska Gerken, Ioannis Papoutsis, Laura Leal-Taixé, and Xiao Xiang Zhu. Towards a unified copernicus founda- tion model for earth vision. 2025. 3

  50. [50]

    Skyscript: A large and seman- tically diverse vision-language dataset for remote sensing

    Zhecheng Wang, Rajanie Prabha, Tianyuan Huang, Jiajun Wu, and Ram Rajagopal. Skyscript: A large and seman- tically diverse vision-language dataset for remote sensing

  51. [51]

    Torchspatial: A location encoding framework and benchmark for spatial representation learn- ing

    Nemin Wu, Qian Cao, Zhangyu Wang, Zeping Liu, Yan- lin Qi, Jielu Zhang, Joshua Ni, Xiaobai Yao, Hongxu Ma, Lan Mu, Stefano Ermon, Tanuja Ganu, Akshay Nambi, Ni Lao, and Gengchen Mai. Torchspatial: A location encoding framework and benchmark for spatial representation learn- ing. 2024. 5, 6

  52. [52]

    Reobench: Benchmarking ro- bustness of earth observation foundation models

    Xiang Li, Yong Tao, Siyuan Zhang, Siwei Liu, Zhitong Xiong, Chunbo Luo, Lu Liu, Mykola Pechenizkiy, Xiao Xi- ang Zhu, and Tianjin Huang. Reobench: Benchmarking ro- bustness of earth observation foundation models. 2025. 6

  53. [53]

    Fairness by “where”: A statistically-robust and model-agnostic bi-level learning framework

    Yiqun Xie, Erhu He, Xiaowei Jia, Weiye Chen, Sergii Skakun, Han Bao, Zhe Jiang, Rahul Ghosh, and Praveen Ravirathinam. Fairness by “where”: A statistically-robust and model-agnostic bi-level learning framework. InPro- ceedings of the AAAI conference on artificial intelligence, pages 12208–12216, 2022. 5

  54. [54]

    Stewart, Jie Zhao, Nils Lehmann, Thomas Dujardin, Zhenghang Yuan, Pedram Ghamisi, and Xiao Xiang Zhu

    Zhitong Xiong, Yi Wang, Weikang Yu, Adam J. Stewart, Jie Zhao, Nils Lehmann, Thomas Dujardin, Zhenghang Yuan, Pedram Ghamisi, and Xiao Xiang Zhu. Dofa-clip: Multi- modal vision-language foundation models for earth observa- tion. 2025. 3

  55. [55]

    Poly- gongnn: Representation learning for polygonal geometries with heterogeneous visibility graph

    Dazhou Yu, Yuntong Hu, Yun Li, and Liang Zhao. Poly- gongnn: Representation learning for polygonal geometries with heterogeneous visibility graph. InProceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining, pages 4012–4022, 2024. 4

  56. [56]

    Ot on the map: Quantify- ing domain shifts in geographic space

    Haoran Zhang, Livia Betti, Konstantin Klemmer, Esther Rolf, and David Alvarez-Melis. Ot on the map: Quantify- ing domain shifts in geographic space. 2026. 6

  57. [57]

    Eulerian neural network informed by chemical transport for air quality forecasting

    Xukai Zhang, Shuliang Wang, Guangyin Jin, Ziqiang Yuan, Hanning Yuan, and Sijie Ruan. Eulerian neural network informed by chemical transport for air quality forecasting. Advances in Neural Information Processing Systems, 38: 33576–33601, 2026. 4

  58. [58]

    Skysense v2: A unified foundation model for multi-modal remote sensing

    Yingying Zhang, Lixiang Ru, Kang Wu, Lei Yu, Lei Liang, Yansheng Li, and Jingdong Chen. Skysense v2: A unified foundation model for multi-modal remote sensing. 2025. 3

  59. [59]

    A satellite foundation model for improved wealth monitoring

    Zhuo Zheng, Iván Higuera-Mendieta, Richard Lee, David Newhouse, Talip Kilic, Stefano Ermon, Marshall Burke, and David B Lobell. A satellite foundation model for improved wealth monitoring.arXiv preprint arXiv:2604.23166, 2026. 6

  60. [60]

    Zhiyong Zhou, Cheng Fu, and Robert Weibel. Spagan: A spatially-aware generative adversarial network for building generalization in image maps.International Journal of Ap- plied Earth Observation and Geoinformation, 135:104236,