pith. machine review for the scientific record.

arxiv: 2605.12542 · v1 · submitted 2026-05-09 · 🌌 astro-ph.IM · astro-ph.EP · cs.LG

Recognition: no theorem link

Earth Science Foundation Models: From Perception to Reasoning and Discovery

Authors on Pith · no claims yet

Pith reviewed 2026-05-14 22:04 UTC · model grok-4.3

classification 🌌 astro-ph.IM · astro-ph.EP · cs.LG
keywords Earth science foundation models · multimodal data integration · perception to reasoning · Earth system applications · datasets and benchmarks · agentic workflows · scientific discovery

The pith

Foundation models integrate multimodal Earth data to advance from perception to reasoning and discovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews how large foundation models combine multi-platform imagery, gridded reanalysis data, geophysical and geochemical observations, and domain text to handle tasks that range from basic perception to advanced scientific discovery in Earth science. It organizes the review along two axes: depth, which follows the progression of model capabilities from simple sensing to multimodal reasoning and agentic workflows, and breadth, which maps applications across the atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, cryosphere, and their coupled processes. The authors assemble more than 200 datasets and benchmarks to ground the survey and then identify open problems in data heterogeneity, scientific reliability, scalability, and the shift toward autonomous systems. The structure matters because it supplies a clear map of where the technology currently stands and what steps are needed to produce trustworthy, actionable AI tools for studying Earth.

Core claim

Large foundation models are transforming Earth science by integrating heterogeneous multimodal data to support tasks ranging from basic perception to advanced scientific discovery. The review traces capability evolution along a depth dimension from perception to multimodal reasoning and agentic workflows, while surveying application breadth across Earth's major spheres and coupled processes. It compiles more than 200 datasets and benchmarks, discusses challenges of multimodal heterogeneity, scientific reliability, continual updating, scalability, and the move to agentic intelligence, and outlines directions toward integrated, trustworthy, and actionable AI Earth scientists.

What carries the argument

Two-dimensional review framework of depth (evolution of capabilities from perception through reasoning to agentic workflows) and breadth (applications across atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, cryosphere, and coupled Earth systems), used to organize models and datasets.
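As a rough illustration of the organizing device, the two-axis framework can be sketched as a small lookup grid. The axis labels come from the paper; the data model, function names, and the example entry are our own hypothetical scaffolding, not anything the authors publish:

```python
# Minimal sketch of the review's depth x breadth organization.
# Axis labels follow the paper; everything else is illustrative.

DEPTH = ["perception", "multimodal reasoning", "agentic workflows"]
BREADTH = [
    "atmosphere", "hydrosphere", "lithosphere",
    "biosphere", "anthroposphere", "cryosphere", "coupled",
]

# taxonomy[(depth_stage, sphere)] -> list of surveyed works in that cell
taxonomy: dict[tuple[str, str], list[str]] = {
    (d, b): [] for d in DEPTH for b in BREADTH
}

def place(work: str, depth: str, sphere: str) -> None:
    """File a surveyed model or dataset into its cell of the grid."""
    if depth not in DEPTH or sphere not in BREADTH:
        raise ValueError(f"unknown axis value: {depth!r}/{sphere!r}")
    taxonomy[(depth, sphere)].append(work)

place("example-weather-fm", "perception", "atmosphere")  # placeholder name
print(len(taxonomy))  # 21 cells: 3 depth stages x 7 breadth domains
```

Read this way, the framework is just a 3 x 7 grid that every surveyed model, dataset, or benchmark is filed into, which is what lets the review claim coverage of both capability evolution and application domains.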

If this is right

  • Multimodal integration enables support for tasks from basic perception to advanced scientific discovery across Earth system components.
  • The compiled collection of more than 200 datasets and benchmarks supplies concrete resources for evaluating and advancing models.
  • Progress requires explicit attention to data heterogeneity, scientific reliability, and scalability.
  • The field should move from current foundation models toward agentic and embodied intelligence to produce more actionable Earth science tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The depth-breadth framework could be applied to test whether new models improve performance on coupled Earth system processes that cross multiple spheres.
  • The assembled benchmarks could be used to run controlled comparisons that measure how well current models handle continual updating with new observations.
  • Future agentic systems built on this roadmap might be evaluated by their ability to propose and verify hypotheses in specific domains such as extreme weather or ecosystem change.
  • Gaps revealed by the review could guide targeted data collection efforts for modalities or regions that are currently underrepresented.
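The controlled-comparison idea in the second bullet could be set up as a simple evaluation loop over the compiled benchmarks. Everything below — the model and task names, the scoring callables, the `compare` helper — is hypothetical scaffolding for the idea, not an interface from the paper:

```python
from typing import Callable

# For this sketch, a "model" is anything that maps a task id to a score.
Model = Callable[[str], float]

def compare(models: dict[str, Model], tasks: list[str]) -> dict[str, dict[str, float]]:
    """Score every model on every task so results are directly comparable."""
    return {name: {t: m(t) for t in tasks} for name, m in models.items()}

# Toy stand-ins: each "model" returns a fixed score regardless of task.
models: dict[str, Model] = {"fm_a": lambda t: 0.8, "fm_b": lambda t: 0.6}
tasks = ["cyclone_track", "flood_extent"]  # hypothetical benchmark task ids

results = compare(models, tasks)
best = max(results, key=lambda n: sum(results[n].values()))
print(best)  # fm_a
```

The point of the grid of scores is that every model sees the same tasks, which is the precondition for attributing differences to the models rather than to the evaluation setup.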

Load-bearing premise

That the representative multimodal models chosen for review and the more than 200 compiled datasets and benchmarks are comprehensive and unbiased enough to support a reliable unified roadmap for the whole field.

What would settle it

Identification of one or more major Earth foundation models or key datasets and benchmarks omitted from the compilation whose inclusion would change the stated challenges or the proposed future directions.

Figures

Figures reproduced from arXiv: 2605.12542 by Ben Fei, Bo Liu, Fenghua Ling, Feng Liu, Fengxiang Wang, Wanghan Xu, Wangxu Wei, Wenlong Zhang, Xiangyu Zhao, Xiao-Ming Wu, Yuehan Zhang, Zelin Song.

Figure 1. The evolutionary roadmap of Earth science AI. The field has progressed from multi-sphere …
Figure 2. Taxonomy of AI models for Earth science across the three stages of …
Figure 3. Illustration of atmosphere science data types, foundation models, and applications.
Figure 4. Illustration of biosphere science data types, foundation models, and applications.
Figure 5. Illustration of anthroposphere science data types, foundation models, and applications.
Figure 6. Illustration of lithosphere science data types, foundation models, and applications.
Figure 7. Illustration of hydrosphere & cryosphere science data types, foundation models, and applications.
original abstract

Large foundation models (FMs) are transforming Earth science by integrating heterogeneous multimodal data, such as multi-platform imagery, gridded reanalysis data, diverse geophysical and geochemical observations, and domain-specific text, to support tasks ranging from basic perception to advanced scientific discovery. This paper provides a unified review of Earth science foundation models (Earth FMs) through two complementary dimensions: depth, which traces the evolution of model capabilities from perception to multimodal reasoning and agentic scientific workflows, and breadth, which summarizes their expanding applications across the atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, and cryosphere, as well as coupled Earth system processes. Using this framework, we review representative multimodal Earth foundation models and compile more than 200 datasets and benchmarks spanning diverse Earth science tasks and modalities. We further discuss key challenges in multimodal data heterogeneity, scientific reliability and continual updating, scalability and sustainability, and the transition from foundation models to agentic and embodied Earth intelligence, and outline future directions toward more integrated, trustworthy, and actionable AI Earth scientists. Overall, this paper offers a structured roadmap for understanding the development of Earth foundation models from both capability depth and application breadth.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript provides a unified review of Earth science foundation models (Earth FMs) structured along two dimensions: depth, which traces the evolution from perception tasks through multimodal reasoning to agentic scientific workflows, and breadth, which covers applications across the atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, cryosphere, and coupled Earth system processes. It reviews representative multimodal models, compiles more than 200 datasets and benchmarks, discusses challenges in data heterogeneity, scientific reliability, scalability, and the transition to agentic intelligence, and outlines future directions for integrated, trustworthy AI systems in Earth science.

Significance. If the compilation of models and datasets proves representative, the survey could offer a useful roadmap for researchers working on multimodal AI for Earth science by synthesizing progress from basic perception to discovery-oriented tasks. The framework helps organize a rapidly growing literature and flags important practical challenges such as continual updating and trustworthiness, which are relevant for scientific applications. As a review without new empirical results or derivations, its value rests entirely on the accuracy and balance of the selected examples and tabulated resources.

major comments (1)
  1. Abstract: The central claim that the review compiles more than 200 datasets and benchmarks to support a 'reliable unified roadmap' is load-bearing, yet the manuscript provides no explicit selection criteria, inclusion/exclusion rules, or search methodology for choosing the representative models and datasets. Without this, readers cannot assess completeness or bias.
minor comments (2)
  1. The abstract and introduction could include a short table or figure summarizing the depth/breadth taxonomy to improve readability before the detailed sections.
  2. Ensure that all cited models and datasets in the review sections are cross-referenced to the compiled tables for easy lookup.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive feedback. We address the single major comment below and will incorporate revisions to enhance the transparency of our survey methodology.

point-by-point responses
  1. Referee: Abstract: The central claim that the review compiles more than 200 datasets and benchmarks to support a 'reliable unified roadmap' is load-bearing, yet the manuscript provides no explicit selection criteria, inclusion/exclusion rules, or search methodology for choosing the representative models and datasets. Without this, readers cannot assess completeness or bias.

    Authors: We agree that the absence of explicit selection criteria limits the ability to evaluate the survey's scope and potential biases. In the revised manuscript, we will insert a new subsection (Section 2.1, 'Review Scope and Methodology') immediately following the introduction. This subsection will detail: (1) the literature search protocol (keywords such as 'Earth foundation model', 'multimodal Earth AI', 'geospatial foundation model' queried on arXiv, Google Scholar, and major conferences from 2020–2024); (2) inclusion criteria (peer-reviewed or high-quality preprint works describing multimodal models with at least two Earth-science modalities, or datasets with documented public availability and task annotations); (3) exclusion criteria (purely single-modality perception models, non-public datasets, or works lacking sufficient technical detail); and (4) our approach to ensuring representativeness across the depth (perception-to-agentic) and breadth (six Earth spheres plus coupled processes) dimensions. We will also add a short sentence in the abstract referencing this methodology section. These changes will allow readers to assess completeness while preserving the review's focus on representative rather than exhaustive coverage. revision: yes
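The inclusion/exclusion rules promised in the rebuttal could be operationalized as a screening predicate. The record schema, field names, and date window below are our own guesses at how such a protocol might be encoded, not the authors' actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class Work:
    """Minimal record for a candidate paper or dataset (hypothetical schema)."""
    modalities: int                  # number of Earth-science modalities used
    public: bool                     # documented public availability
    peer_reviewed_or_preprint: bool  # peer-reviewed or high-quality preprint
    year: int

def include(w: Work, span: tuple[int, int] = (2020, 2024)) -> bool:
    """Apply the rebuttal's stated rules: >= 2 modalities, publicly
    available, peer-reviewed or high-quality preprint, inside the
    stated literature-search window."""
    return (
        w.modalities >= 2
        and w.public
        and w.peer_reviewed_or_preprint
        and span[0] <= w.year <= span[1]
    )

print(include(Work(modalities=2, public=True, peer_reviewed_or_preprint=True, year=2023)))  # True
print(include(Work(modalities=1, public=True, peer_reviewed_or_preprint=True, year=2023)))  # False
```

Writing the criteria this way makes the referee's point concrete: once the predicate is explicit, anyone can audit which works were screened out and why.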

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

This manuscript is a survey paper that reviews existing Earth science foundation models and compiles more than 200 external datasets and benchmarks using a depth/breadth organizational framework. It presents no new derivations, equations, predictions, or first-principles results whose validity depends on quantities defined inside the paper itself. All cited models, data sources, and benchmarks originate from prior external literature, and the paper's structure functions purely as an organizing roadmap rather than a self-referential chain. No load-bearing steps reduce by construction to fitted inputs or self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The review rests on standard AI definitions of foundation models and multimodal integration; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption Earth foundation models can be meaningfully categorized along a depth axis from perception to reasoning and a breadth axis across Earth system spheres
    This two-dimensional framework is used to structure the entire review and roadmap as stated in the abstract.

pith-pipeline@v0.9.0 · 5543 in / 1353 out tokens · 57526 ms · 2026-05-14T22:04:45.908253+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

299 extracted references · 299 canonical work pages · 2 internal anchors

  1. [1]

    Artificial intelligence for geoscience: Progress, challenges, and perspectives,

    T. Zhao, S. Wang, C. Ouyang, M. Chen, C. Liu, J. Zhang, L. Yu, F. Wang, Y. Xie, J. Liet al., “Artificial intelligence for geoscience: Progress, challenges, and perspectives,”The Innovation, vol. 5, no. 5, 2024. 1, 2, 3, 15

  2. [2]

    Aurora: A foundation model of the atmosphere,

    C. Bodnar, “Aurora: A foundation model of the atmosphere,” in AGU Fall Meeting Abstracts, vol. 2024, 2024, pp. GC21C–03. 1, 4, 6, 8, 11

  3. [3]

    Pangu- weather: A 3d high-resolution model for fast and accurate global weather forecast,

    K. Bi, L. Xie, H. Zhang, X. Chen, X. Gu, and Q. Tian, “Pangu- weather: A 3d high-resolution model for fast and accurate global weather forecast,”arXiv preprint arXiv:2211.02556, 2022. 1, 4, 6, 8, 11

  4. [4]

    Citygpt: Towards urban iot learning, analysis and interaction with multi-agent system,

    Q. Guan, J. Ouyang, D. Wu, and W. Yu, “Citygpt: Towards urban iot learning, analysis and interaction with multi-agent system,” arXiv preprint arXiv:2405.14691, 2024. 1, 11, 14

  5. [5]

    Graphcast: Ai model for faster and more accurate global weather forecasting,

    R. Lam, R. Pascanu, M. Puigdomènech Gimenez, S. Agrawal, C. Dapogny, M. Schmidt, T. Keck, M. Mudigonda, P . Brutlag, J. Wanget al., “Graphcast: Ai model for faster and more accurate global weather forecasting,”Science, vol. 382, pp. 1416–1421, 2023. 1, 6, 8, 11

  6. [6]

    S-clip: Semi-supervised vision-language learning using few specialist captions,

    S. Mo, M. Kim, K. Lee, and J. Shin, “S-clip: Semi-supervised vision-language learning using few specialist captions,”Advances in Neural Information Processing Systems, vol. 36, pp. 61 187–61 212,

  7. [7]

    Marinedet: Towards open-marine object detection,

    L. Haixin, Z. Ziqiang, M. Zeyu, and S.-K. Yeung, “Marinedet: Towards open-marine object detection,”arXiv preprint arXiv:2310.01931, 2023. 1, 6, 11

  8. [8]

    Trs: Transformers for remote sensing scene classification,

    J. Zhang, H. Zhao, and J. Li, “Trs: Transformers for remote sensing scene classification,”Remote Sensing, vol. 13, no. 20, p. 4143, 2021. 1, 4, 6

  9. [9]

    Climateagents: A multi-agent research assis- tant for social-climate dynamics analysis,

    S. Shan, “Climateagents: A multi-agent research assis- tant for social-climate dynamics analysis,”arXiv preprint arXiv:2603.13840, 2026. 1, 19

  10. [10]

    Openearthagent: A unified framework for tool-augmented geospatial agents,

    A. Shabbir, M. U. Sheikh, M. A. Munir, H. Debary, M. Fiaz, M. Z. Zaheer, P . Fraccaro, F. S. Khan, M. H. Khan, X. X. Zhu et al., “Openearthagent: A unified framework for tool-augmented geospatial agents,”arXiv preprint arXiv:2602.17665, 2026. 1, 2, 11, 19

  11. [11]

    Prithvi wxc: Foundation model for weather and climate,

    J. Schmude, S. Roy, W. Trojak, J. Jakubik, D. S. Civitarese, S. Singh, J. Kuehnert, K. Ankur, A. Gupta, C. E. Phillipset al., “Prithvi wxc: Foundation model for weather and climate,”arXiv preprint arXiv:2409.13598, 2024. 1, 4, 6, 8, 11

  12. [12]

    Terramind: Large-scale generative multimodality for earth ob- servation,

    J. Jakubik, F. Yang, B. Blumenstiel, E. Scheurer, R. Sedona, S. Mau- rogiovanni, J. Bosmans, N. Dionelis, V . Marsocci, N. Koppet al., “Terramind: Large-scale generative multimodality for earth ob- servation,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 7383–7394. 1, 2, 11, 18

  13. [13]

    GISclaw: A Comprehensive Open-Source LLM Agent System for Realistic Multi-Step Geospatial Analysis

    J. Han, J. Lee, Y. Shim, J. Kim, and J.-J. Lee, “Gisclaw: An open-source llm-powered agent system for full-stack geospatial analysis,”arXiv preprint arXiv:2603.26845, 2026. 1, 19

  14. [14]

    Earthlink: A self-evolving ai agent for climate science,

    Z. Guo, J. Wang, X. Yue, W. Wei, Z. Jiang, W. Xu, B. Fei, W. Zhang, X. Gu, L. Chenget al., “Earthlink: A self-evolving ai agent for climate science,”arXiv e-prints, pp. arXiv–2507, 2025. 1, 19

  15. [15]

    Towards vision-language geo-foundation model: A survey. arxiv 2024,

    Y. Zhou, L. Feng, Y. Ke, X. Jiang, J. Yan, X. Yang, and W. Zhang, “Towards vision-language geo-foundation model: A survey. arxiv 2024,”arXiv preprint arXiv:2406.09385. 2, 3

  16. [16]

    Foundation models for remote sensing and earth observation: A survey,

    A. Xiao, W. Xuan, J. Wang, J. Huang, D. Tao, S. Lu, and N. Yokoya, “Foundation models for remote sensing and earth observation: A survey,”IEEE Geoscience and Remote Sensing Magazine, 2025. 2, 3

  17. [17]

    On the foundations of earth foundation models,

    X. X. Zhu, Z. Xiong, Y. Wang, A. J. Stewart, K. Heidler, Y. Wang, Z. Yuan, T. Dujardin, Q. Xu, and Y. Shi, “On the foundations of earth foundation models,”Communications Earth & Environment,

  18. [18]

    A hierarchical multi-agent system for au- tonomous discovery in geoscientific data archives,

    D. Pantiukhin, I. Kuznetsov, B. Shapkin, A. A. Jost, T. Jung, and N. Koldunov, “A hierarchical multi-agent system for au- tonomous discovery in geoscientific data archives,”arXiv preprint arXiv:2602.21351, 2026. 2, 11

  19. [19]

    Foun- dation models in remote sensing: Evolving from unimodality to multimodality,

    D. Hong, C. Li, X. Li, G. Camps-Valls, and J. Chanussot, “Foun- dation models in remote sensing: Evolving from unimodality to multimodality,”IEEE Geoscience and Remote Sensing Magazine,

  20. [20]

    Towards urban general intelligence: A review and outlook of urban foundation models,

    W. Zhang, J. Han, Z. Xu, H. Ni, T. Lyu, H. Liu, and H. Xiong, “Towards urban general intelligence: A review and outlook of urban foundation models,”arXiv preprint arXiv:2402.01749, 2024. 3

  21. [21]

    Two-stream swin transformer with differentiable sobel operator for remote sensing image classification,

    S. Hao, B. Wu, K. Zhao, Y. Ye, and W. Wang, “Two-stream swin transformer with differentiable sobel operator for remote sensing image classification,”Remote Sensing, vol. 14, no. 6, p. 1507, 2022. 4, 6

  22. [22]

    Homo– heterogenous transformer learning framework for rs scene clas- sification,

    J. Ma, M. Li, X. Tang, X. Zhang, F. Liu, and L. Jiao, “Homo– heterogenous transformer learning framework for rs scene clas- sification,”IEEE Journal of Selected Topics in Applied Earth Observa- tions and Remote Sensing, vol. 15, pp. 2223–2239, 2022. 4

  23. [23]

    Transformer with transfer cnn for remote-sensing-image object detection,

    Q. Li, Y. Chen, and Y. Zeng, “Transformer with transfer cnn for remote-sensing-image object detection,”Remote Sensing, vol. 14, no. 4, p. 984, 2022. 4

  24. [24]

    Gansformer: A detection network for aerial images with high performance com- bining convolutional network and transformer,

    Y. Zhang, X. Liu, S. Wa, S. Chen, and Q. Ma, “Gansformer: A detection network for aerial images with high performance com- bining convolutional network and transformer,”Remote Sensing, vol. 14, no. 4, p. 923, 2022. 4

  25. [25]

    Adt-det: Adaptive dynamic refined single-stage transformer detector for arbitrary- oriented object detection in satellite optical imagery,

    Y. Zheng, P . Sun, Z. Zhou, W. Xu, and Q. Ren, “Adt-det: Adaptive dynamic refined single-stage transformer detector for arbitrary- oriented object detection in satellite optical imagery,”Remote Sensing, vol. 13, no. 13, p. 2623, 2021. 4

  26. [26]

    Deep multiscale siamese network with parallel convolutional structure and self-attention for change detection,

    Q. Guo, J. Zhang, S. Zhu, C. Zhong, and Y. Zhang, “Deep multiscale siamese network with parallel convolutional structure and self-attention for change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–12, 2021. 4

  27. [27]

    Resdeepd: A residual super- resolution network for deep downscaling of daily precipitation over india,

    S. C. M. Sharma and A. Mitra, “Resdeepd: A residual super- resolution network for deep downscaling of daily precipitation over india,”Environmental Data Science, vol. 1, p. e19, 2022. 4

  28. [28]

    Downscal- ing multi-model climate projection ensembles with deep learning (deepesd): contribution to cordex eur-44,

    J. Baño-Medina, R. Manzanas, E. Cimadevilla, J. Fernández, J. González-Abad, A. S. Cofiño, and J. M. Gutiérrez, “Downscal- ing multi-model climate projection ensembles with deep learning (deepesd): contribution to cordex eur-44,”Geoscientific Model Development Discussions, vol. 2022, pp. 1–14, 2022. 4

  29. [29]

    Inves- tigating two super-resolution methods for downscaling precipi- tation: Esrgan and car,

    C. D. Watson, C. Wang, T. Lynar, and K. Weldemariam, “Inves- tigating two super-resolution methods for downscaling precipi- tation: Esrgan and car,”arXiv preprint arXiv:2012.01233, 2020. 4, 6

  30. [30]

    Fast and accurate learned multiresolution dynamical downscaling for precipitation,

    J. Wang, Z. Liu, I. Foster, W. Chang, R. Kettimuthu, and V . R. Ko- tamarthi, “Fast and accurate learned multiresolution dynamical downscaling for precipitation,”Geoscientific Model Development Discussions, vol. 2021, pp. 1–24, 2021. 4

  31. [31]

    Adversarial super-resolution of climatological wind and solar data,

    K. Stengel, A. Glaws, D. Hettinger, and R. N. King, “Adversarial super-resolution of climatological wind and solar data,”Proceed- ings of the National Academy of Sciences, vol. 117, no. 29, pp. 16 805– 16 815, 2020. 4, 6

  32. [32]

    A deconvolution technology of microwave radiometer data using convolutional neural networks,

    W. Hu, W. Zhang, S. Chen, X. Lv, D. An, and L. Ligthart, “A deconvolution technology of microwave radiometer data using convolutional neural networks,”Remote Sensing, vol. 10, no. 2, p. 275, 2018. 4

  33. [33]

    Diffsr: Learning radar reflectivity synthesis via diffusion model from satellite observations,

    X. He, Z. Zhou, W. Zhang, X. Zhao, H. Chen, S. Chen, and L. Bai, “Diffsr: Learning radar reflectivity synthesis via diffusion model from satellite observations,” inICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025, pp. 1–5. 4

  34. [34]

    Towards fine-grained classification of climate change related social media text,

    R. Vaid, K. Pant, and M. Shrivastava, “Towards fine-grained classification of climate change related social media text,” inPro- ceedings of the 60th annual meeting of the association for computational linguistics: student research workshop, 2022, pp. 434–443. 4, 10

  35. [35]

    Few-shot learning for name entity recognition in geological text based on geobert,

    H. Liu, Q. Qiu, L. Wu, W. Li, B. Wang, and Y. Zhou, “Few-shot learning for name entity recognition in geological text based on geobert,”Earth Science Informatics, vol. 15, no. 2, pp. 979–991,

  36. [36]

    An empirical study of remote sensing pretraining,

    D. Wang, J. Zhang, B. Du, G.-S. Xia, and D. Tao, “An empirical study of remote sensing pretraining,”IEEE Transactions on Geo- science and Remote Sensing, vol. 61, pp. 1–20, 2022. 4

  37. [37]

    Earthnets: Empowering ai in earth obser- vation.arXiv preprint arXiv:2210.04936, 2022

    Z. Xiong, F. Zhang, Y. Wang, Y. Shi, and X. X. Zhu, “Earth- nets: Empowering ai in earth observation,”arXiv preprint arXiv:2210.04936, 2022. 4

  38. [38]

    Anysat: One earth observation model for many resolutions, scales, and modalities,

    G. Astruc, N. Gonthier, C. Mallet, and L. Landrieu, “Anysat: One earth observation model for many resolutions, scales, and modalities,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 19 530–19 540. 4

  39. [39]

    Fleximo: A flexible remote sensing foundation model,

    X. Li, C. Li, P . Ghamisi, and D. Hong, “Fleximo: A flexible remote sensing foundation model,”arXiv preprint arXiv:2503.23844, 2025. 4

  40. [40]

    Spectralearth: Training hyperspectral foundation models at scale,

    N. A. A. Braham, C. M. Albrecht, J. Mairal, J. Chanussot, Y. Wang, and X. X. Zhu, “Spectralearth: Training hyperspectral foundation models at scale,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025. 4

  41. [41]

    Geolangbind: Unifying earth observation with agglomerative vision-language foundation models,

    Z. Xiong, Y. Wang, W. Yu, A. J. Stewart, J. Zhao, N. Lehmann, T. Dujardin, Z. Yuan, P . Ghamisi, and X. X. Zhu, “Geolangbind: Unifying earth observation with agglomerative vision-language foundation models,”arXiv preprint arXiv:2503.06312, 2025. 4

  42. [42]

    Skysense: A multi-modal re- mote sensing foundation model towards universal interpretation for earth observation imagery,

    X. Guo, J. Lao, B. Dang, Y. Zhang, L. Yu, L. Ru, L. Zhong, Z. Huang, K. Wu, D. Huet al., “Skysense: A multi-modal re- mote sensing foundation model towards universal interpretation for earth observation imagery,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 27 672–27 683. 4, 18

  43. [43]

    In: Proceedings of the 40th International Conference on Machine Learning

    T. Nguyen, J. Brandstetter, A. Kapoor, J. K. Gupta, and A. Grover, “Climax: A foundation model for weather and climate,”arXiv preprint arXiv:2301.10343, 2023. 4, 6, 8, 11

  44. [44]

    Weathergfm: Learning a weather generalist foundation model via in-context learning,

    X. Zhao, Z. Zhou, W. Zhang, Y. Liu, X. Chen, J. Gong, H. Chen, B. Fei, S. Chen, W. Ouyanget al., “Weathergfm: Learning a weather generalist foundation model via in-context learning,” arXiv preprint arXiv:2411.05420, 2024. 4, 6, 8, 11

  45. [45]

    Mmearth: Exploring multi-modal pretext tasks for geospatial representation learning,

    V . Nedungadi, A. Kariryaa, S. Oehmcke, S. Belongie, C. Igel, and N. Lang, “Mmearth: Exploring multi-modal pretext tasks for geospatial representation learning,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 164–182. 4, 6

  46. [46]

    Ctxmim: Context-enhanced masked image modeling for remote sensing image understand- ing,

    M. Zhang, Q. Liu, and Y. Wang, “Ctxmim: Context-enhanced masked image modeling for remote sensing image understand- ing,”arXiv preprint arXiv:2310.00022, 2023. 4

  47. [47]

    Earthpt: a time series foundation model for earth observation,

    M. J. Smith, L. Fleming, and J. E. Geach, “Earthpt: a time series foundation model for earth observation,”arXiv preprint arXiv:2309.07207, 2023. 4, 6

  48. [48]

    Bridging remote sensors with multisensor geospatial foundation models,

    B. Han, S. Zhang, X. Shi, and M. Reichstein, “Bridging remote sensors with multisensor geospatial foundation models,” inPro- ceedings of the ieee/cvf conference on computer vision and pattern recognition, 2024, pp. 27 852–27 862. 4

  49. [49]

    Mtp: Advancing remote sensing foundation model via multitask pretraining,

    D. Wang, J. Zhang, M. Xu, L. Liu, D. Wang, E. Gao, C. Han, H. Guo, B. Du, D. Taoet al., “Mtp: Advancing remote sensing foundation model via multitask pretraining,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 17, pp. 11 632–11 654, 2024. 4

  50. [50]

    Spectraldiff: A generative framework for hyperspectral image classification with diffusion models,

    N. Chen, J. Yue, L. Fang, and S. Xia, “Spectraldiff: A generative framework for hyperspectral image classification with diffusion models,”IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–16, 2023. 4

  51. [51]

    Geosynth: Contextually-aware high-resolution satellite image synthesis,

    S. Sastry, S. Khanal, A. Dhakal, and N. Jacobs, “Geosynth: Contextually-aware high-resolution satellite image synthesis,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 460–470. 4

  52. [52]

    Diffusion-geo: A two-stage controllable text-to-image generative model for remote sensing scenarios,

    M. Cai, W. Zhang, T. Zhang, Y. Zhuang, H. Chen, L. Chen, and C. Li, “Diffusion-geo: A two-stage controllable text-to-image generative model for remote sensing scenarios,” inIGARSS 2024- 2024 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2024, pp. 7003–7006. 4

  53. [53]

    Toward artificial general intelligence in hydrogeolog- ical modeling with an integrated latent diffusion framework,

    C. Zhan, Z. Dai, J. J. Jiao, M. R. Soltanian, H. Yin, and K. C. Carroll, “Toward artificial general intelligence in hydrogeolog- ical modeling with an integrated latent diffusion framework,” Geophysical Research Letters, vol. 52, no. 3, p. e2024GL114298, 2025. 4

  54. [54]

    Tianxing: A linear complexity transformer model with explicit attention decay for global weather forecasting,

    S. Yuan, G. Wang, B. Mu, and F. Zhou, “Tianxing: A linear complexity transformer model with explicit attention decay for global weather forecasting,”Advances in Atmospheric Sciences, vol. 42, no. 1, pp. 9–25, 2025. 4

  55. [55]

    Climaqa: An automated evaluation framework for climate question answering models,

    V . V . Manivannan, Y. Jafari, S. Eranky, S. Ho, R. Yu, D. Watson- Parris, Y. Ma, L. Bergen, and T. Berg-Kirkpatrick, “Climaqa: An automated evaluation framework for climate question answering models,”arXiv preprint arXiv:2410.16701, 2024. 4, 10

  56. [56]

    Oceangpt: A large language model for ocean science tasks,

    Z. Bi, N. Zhang, Y. Xue, Y. Ou, D. Ji, G. Zheng, and H. Chen, “Oceangpt: A large language model for ocean science tasks,” arXiv preprint arXiv:2310.02031, 2023. 4, 6, 11

  57. [57]

    K2: A foundation language model for geoscience knowledge understanding and utilization,

    C. Deng, T. Zhang, Z. He, Q. Chen, Y. Shi, Y. Xu, L. Fu, W. Zhang, X. Wang, C. Zhouet al., “K2: A foundation language model for geoscience knowledge understanding and utilization,” in Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 2024, pp. 161–170. 4, 6, 11

  58. [58]

    Jiuzhou: open foundation language models and effective pre-training 23 framework for geoscience,

    Z. Chen, M. Lin, M. Zang, Z. Wang, J. Li, and Y. Bai, “Jiuzhou: open foundation language models and effective pre-training 23 framework for geoscience,”International Journal of Digital Earth, vol. 18, no. 1, p. 2449708, 2025. 4

  [59] Z. Lin, C. Deng, L. Zhou, T. Zhang, Y. Xu, Y. Xu, Z. He, Y. Shi, B. Dai, Y. Song et al., "Geogalactica: A scientific large language model in geoscience," arXiv preprint arXiv:2401.00434, 2023.

  [60] Z. Chen, X. Wang, Y. Liao, M. Lin, and Y. Bai, "Climatechat: Designing data and methods for instruction tuning LLMs to answer climate change queries," arXiv preprint arXiv:2506.13796, 2025.

  [61] Z. Chen, X. Wang, X. Zhang, M. Lin, Y. Liao, J. Li, and Y. Bai, "Geofactory: An LLM performance enhancement framework for geoscience factual and inferential tasks," Big Earth Data, pp. 1–33, 2025.

  [62] D. Muhtar, Z. Li, F. Gu, X. Zhang, and P. Xiao, "Lhrs-bot: Empowering remote sensing with VGI-enhanced large multimodal language model," in European Conference on Computer Vision. Springer, 2024, pp. 440–457.

  [63] J. Luo, Z. Pang, Y. Zhang, T. Wang, L. Wang, B. Dang, J. Lao, J. Wang, J. Chen, Y. Tan et al., "Skysensegpt: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding," arXiv preprint arXiv:2406.10100, 2024.

  [64] W. Zhang, M. Cai, T. Zhang, Y. Zhuang, and X. Mao, "Earthgpt: A universal multimodal large language model for multisensor image comprehension in remote sensing domain," IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–20, 2024.

  [65] B. Zhou, H. Yang, D. Chen, J. Ye, T. Bai, J. Yu, S. Zhang, D. Lin, C. He, and W. Li, "Urbench: A comprehensive benchmark for evaluating large multimodal models in multi-view urban scenarios," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 10, 2025, pp. 10707–10715.

  [66] J. A. Irvin, E. R. Liu, J. C. Chen, I. Dormoy, J. Kim, S. Khanna, Z. Zheng, and S. Ermon, "Teochat: A large vision-language assistant for temporal earth observation data," arXiv preprint arXiv:2410.06234, 2024.

  [67] S. Soni, A. Dudhane, H. Debary, M. Fiaz, M. A. Munir, M. S. Danish, P. Fraccaro, C. D. Watson, L. J. Klein, F. S. Khan et al., "Earthdial: Turning multi-sensory earth observations to interactive dialogues," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 14303–14313.

  [68] H. Li, Z. Wang, J. Wang, A. K. H. Lau, and H. Qu, "Cllmate: A multimodal LLM for weather and climate events forecasting," arXiv preprint arXiv:2409.19058, 2024.

  [69] K. Kuckreja, M. S. Danish, M. Naseer, A. Das, S. Khan, and F. S. Khan, "Geochat: Grounded large vision-language model for remote sensing," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 27831–27840.

  [70] L. Chen, X. Zhong, F. Zhang, Y. Cheng, Y. Xu, Y. Qi, and H. Li, "Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast," npj Climate and Atmospheric Science, vol. 6, no. 1, p. 190, 2023.

  [71] K. Chen, T. Han, J. Gong, L. Bai, F. Ling, J.-J. Luo, X. Chen, L. Ma, T. Zhang, R. Su et al., "Fengwu: Pushing the skillful global medium-range weather forecast beyond 10 days lead," arXiv preprint arXiv:2304.02948, 2023.

  [72] X. Man, C. Zhang, J. Feng, C. Li, and J. Shao, "W-mae: Pre-trained weather model with masked autoencoder for multi-variable weather forecasting," arXiv preprint arXiv:2304.08754, 2023.

  [73] Z. Gao, X. Shi, H. Wang, Y. Zhu, Y. B. Wang, M. Li, and D.-Y. Yeung, "Earthformer: Exploring space-time transformers for earth system forecasting," Advances in Neural Information Processing Systems, vol. 35, pp. 25390–25403, 2022.

  [74] D. Du, B. Su, and Z. Wei, "Preformer: Predictive transformer with multi-scale segment-wise correlations for long-term time series forecasting," in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.

  [75] I. Price, A. Sanchez-Gonzalez, F. Alet, T. R. Andersson, A. El-Kadi, D. Masters, T. Ewalds, J. Stott, S. Mohamed, P. Battaglia et al., "Probabilistic weather forecasting with machine learning," Nature, vol. 637, no. 8044, pp. 84–90, 2025.

  [76] L. Li, R. Carver, I. Lopez-Gomez, F. Sha, and J. Anderson, "Generative emulation of weather forecast ensembles with diffusion models," Science Advances, vol. 10, no. 13, p. eadk4489, 2024.

  [77] M. Andrae, T. Landelius, J. Oskarsson, and F. Lindsten, "Continuous ensemble weather forecasting with diffusion models," arXiv preprint arXiv:2410.05431, 2024.

  [78] J. Gong, L. Bai, P. Ye, W. Xu, N. Liu, J. Dai, X. Yang, and W. Ouyang, "Cascast: Skillful high-resolution precipitation nowcasting via cascaded modelling," arXiv preprint arXiv:2402.04290, 2024.

  [79] J. Gong, S. Tu, W. Yang, B. Fei, K. Chen, W. Zhang, X. Yang, W. Ouyang, and L. Bai, "Postcast: Generalizable postprocessing for precipitation nowcasting via unsupervised blurriness modeling," arXiv preprint arXiv:2410.05805, 2024.

  [80] J. Liu, Z. Yuan, Z. Pan, Y. Fu, L. Liu, and B. Lu, "Diffusion model with detail complement for super-resolution of remote sensing," Remote Sensing, vol. 14, no. 19, p. 4834, 2022.

Showing first 80 references.