pith. machine review for the scientific record.

arxiv: 2605.12542 · v1 · submitted 2026-05-09 · 🌌 astro-ph.IM · astro-ph.EP · cs.LG

Recognition: no theorem link

Earth Science Foundation Models: From Perception to Reasoning and Discovery

Authors on Pith · no claims yet

Pith reviewed 2026-05-14 22:04 UTC · model grok-4.3

classification 🌌 astro-ph.IM · astro-ph.EP · cs.LG
keywords Earth science foundation models · multimodal data integration · perception to reasoning · Earth system applications · datasets and benchmarks · agentic workflows · scientific discovery

The pith

Foundation models integrate multimodal Earth data to advance from perception to reasoning and discovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews how large foundation models combine multi-platform imagery, gridded reanalysis data, geophysical and geochemical observations, and domain text to handle tasks that range from basic perception to advanced scientific discovery in Earth science. It organizes the review along two axes: depth, which follows the progression of model capabilities from simple sensing to multimodal reasoning and agentic workflows, and breadth, which maps applications across the atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, cryosphere, and their coupled processes. The authors assemble more than 200 datasets and benchmarks to ground the survey and then identify open problems in data heterogeneity, scientific reliability, scalability, and the shift toward autonomous systems. The structure matters because it supplies a clear map of where the technology currently stands and what steps are needed to produce trustworthy, actionable AI tools for studying Earth.

Core claim

Large foundation models are transforming Earth science by integrating heterogeneous multimodal data to support tasks ranging from basic perception to advanced scientific discovery. The review traces capability evolution along a depth dimension from perception to multimodal reasoning and agentic workflows, while surveying application breadth across Earth's major spheres and coupled processes. It compiles more than 200 datasets and benchmarks, discusses challenges of multimodal heterogeneity, scientific reliability, continual updating, scalability, and the move to agentic intelligence, and outlines directions toward integrated, trustworthy, and actionable AI Earth scientists.

What carries the argument

Two-dimensional review framework of depth (evolution of capabilities from perception through reasoning to agentic workflows) and breadth (applications across atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, cryosphere, and coupled Earth systems), used to organize models and datasets.
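As a rough illustration of the organizing device, the two-axis framework can be sketched as a small lookup grid. The axis labels come from the paper; the data model, function names, and the example entry are our own hypothetical scaffolding, not anything the authors publish:

```python
# Minimal sketch of the review's depth x breadth organization.
# Axis labels follow the paper; everything else is illustrative.

DEPTH = ["perception", "multimodal reasoning", "agentic workflows"]
BREADTH = [
    "atmosphere", "hydrosphere", "lithosphere",
    "biosphere", "anthroposphere", "cryosphere", "coupled",
]

# taxonomy[(depth_stage, sphere)] -> list of surveyed works in that cell
taxonomy: dict[tuple[str, str], list[str]] = {
    (d, b): [] for d in DEPTH for b in BREADTH
}

def place(work: str, depth: str, sphere: str) -> None:
    """File a surveyed model or dataset into its cell of the grid."""
    if depth not in DEPTH or sphere not in BREADTH:
        raise ValueError(f"unknown axis value: {depth!r}/{sphere!r}")
    taxonomy[(depth, sphere)].append(work)

place("example-weather-fm", "perception", "atmosphere")  # placeholder name
print(len(taxonomy))  # 21 cells: 3 depth stages x 7 breadth domains
```

Read this way, the framework is just a 3 x 7 grid that every surveyed model, dataset, or benchmark is filed into, which is what lets the review claim coverage of both capability evolution and application domains.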

If this is right

  • Multimodal integration enables support for tasks from basic perception to advanced scientific discovery across Earth system components.
  • The compiled collection of more than 200 datasets and benchmarks supplies concrete resources for evaluating and advancing models.
  • Progress requires explicit attention to data heterogeneity, scientific reliability, and scalability.
  • The field should move from current foundation models toward agentic and embodied intelligence to produce more actionable Earth science tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The depth-breadth framework could be applied to test whether new models improve performance on coupled Earth system processes that cross multiple spheres.
  • The assembled benchmarks could be used to run controlled comparisons that measure how well current models handle continual updating with new observations.
  • Future agentic systems built on this roadmap might be evaluated by their ability to propose and verify hypotheses in specific domains such as extreme weather or ecosystem change.
  • Gaps revealed by the review could guide targeted data collection efforts for modalities or regions that are currently underrepresented.
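The controlled-comparison idea in the second bullet could be set up as a simple evaluation loop over the compiled benchmarks. Everything below — the model and task names, the scoring callables, the `compare` helper — is hypothetical scaffolding for the idea, not an interface from the paper:

```python
from typing import Callable

# For this sketch, a "model" is anything that maps a task id to a score.
Model = Callable[[str], float]

def compare(models: dict[str, Model], tasks: list[str]) -> dict[str, dict[str, float]]:
    """Score every model on every task so results are directly comparable."""
    return {name: {t: m(t) for t in tasks} for name, m in models.items()}

# Toy stand-ins: each "model" returns a fixed score regardless of task.
models: dict[str, Model] = {"fm_a": lambda t: 0.8, "fm_b": lambda t: 0.6}
tasks = ["cyclone_track", "flood_extent"]  # hypothetical benchmark task ids

results = compare(models, tasks)
best = max(results, key=lambda n: sum(results[n].values()))
print(best)  # fm_a
```

The point of the grid of scores is that every model sees the same tasks, which is the precondition for attributing differences to the models rather than to the evaluation setup.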

Load-bearing premise

That the representative multimodal models chosen for review and the more than 200 compiled datasets and benchmarks are comprehensive and unbiased enough to support a reliable unified roadmap for the whole field.

What would settle it

Identification of one or more major Earth foundation models or key datasets and benchmarks omitted from the compilation whose inclusion would change the stated challenges or the proposed future directions.

Figures

Figures reproduced from arXiv: 2605.12542 by Ben Fei, Bo Liu, Fenghua Ling, Feng Liu, Fengxiang Wang, Wanghan Xu, Wangxu Wei, Wenlong Zhang, Xiangyu Zhao, Xiao-Ming Wu, Yuehan Zhang, Zelin Song.

Figure 1. The evolutionary roadmap of Earth science AI. The field has progressed from multi-sphere …
Figure 2. Taxonomy of AI models for Earth science across the three stages of …
Figure 3. Illustration of atmosphere science data types, foundation models, and applications.
Figure 4. Illustration of biosphere science data types, foundation models, and applications.
Figure 5. Illustration of anthroposphere science data types, foundation models, and applications.
Figure 6. Illustration of lithosphere science data types, foundation models, and applications.
Figure 7. Illustration of hydrosphere & cryosphere science data types, foundation models, and applications.
original abstract

Large foundation models (FMs) are transforming Earth science by integrating heterogeneous multimodal data, such as multi-platform imagery, gridded reanalysis data, diverse geophysical and geochemical observations, and domain-specific text, to support tasks ranging from basic perception to advanced scientific discovery. This paper provides a unified review of Earth science foundation models (Earth FMs) through two complementary dimensions: depth, which traces the evolution of model capabilities from perception to multimodal reasoning and agentic scientific workflows, and breadth, which summarizes their expanding applications across the atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, and cryosphere, as well as coupled Earth system processes. Using this framework, we review representative multimodal Earth foundation models and compile more than 200 datasets and benchmarks spanning diverse Earth science tasks and modalities. We further discuss key challenges in multimodal data heterogeneity, scientific reliability and continual updating, scalability and sustainability, and the transition from foundation models to agentic and embodied Earth intelligence, and outline future directions toward more integrated, trustworthy, and actionable AI Earth scientists. Overall, this paper offers a structured roadmap for understanding the development of Earth foundation models from both capability depth and application breadth.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript provides a unified review of Earth science foundation models (Earth FMs) structured along two dimensions: depth, which traces the evolution from perception tasks through multimodal reasoning to agentic scientific workflows, and breadth, which covers applications across the atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, cryosphere, and coupled Earth system processes. It reviews representative multimodal models, compiles more than 200 datasets and benchmarks, discusses challenges in data heterogeneity, scientific reliability, scalability, and the transition to agentic intelligence, and outlines future directions for integrated, trustworthy AI systems in Earth science.

Significance. If the compilation of models and datasets proves representative, the survey could offer a useful roadmap for researchers working on multimodal AI for Earth science by synthesizing progress from basic perception to discovery-oriented tasks. The framework helps organize a rapidly growing literature and flags important practical challenges such as continual updating and trustworthiness, which are relevant for scientific applications. As a review without new empirical results or derivations, its value rests entirely on the accuracy and balance of the selected examples and tabulated resources.

major comments (1)
  1. Abstract: The central claim that the review compiles more than 200 datasets and benchmarks to support a 'reliable unified roadmap' is load-bearing, yet the manuscript provides no explicit selection criteria, inclusion/exclusion rules, or search methodology for choosing the representative models and datasets. Without this, readers cannot assess completeness or bias.
minor comments (2)
  1. The abstract and introduction could include a short table or figure summarizing the depth/breadth taxonomy to improve readability before the detailed sections.
  2. Ensure that all cited models and datasets in the review sections are cross-referenced to the compiled tables for easy lookup.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive feedback. We address the single major comment below and will incorporate revisions to enhance the transparency of our survey methodology.

point-by-point responses
  1. Referee: Abstract: The central claim that the review compiles more than 200 datasets and benchmarks to support a 'reliable unified roadmap' is load-bearing, yet the manuscript provides no explicit selection criteria, inclusion/exclusion rules, or search methodology for choosing the representative models and datasets. Without this, readers cannot assess completeness or bias.

    Authors: We agree that the absence of explicit selection criteria limits the ability to evaluate the survey's scope and potential biases. In the revised manuscript, we will insert a new subsection (Section 2.1, 'Review Scope and Methodology') immediately following the introduction. This subsection will detail: (1) the literature search protocol (keywords such as 'Earth foundation model', 'multimodal Earth AI', 'geospatial foundation model' queried on arXiv, Google Scholar, and major conferences from 2020–2024); (2) inclusion criteria (peer-reviewed or high-quality preprint works describing multimodal models with at least two Earth-science modalities, or datasets with documented public availability and task annotations); (3) exclusion criteria (purely single-modality perception models, non-public datasets, or works lacking sufficient technical detail); and (4) our approach to ensuring representativeness across the depth (perception-to-agentic) and breadth (six Earth spheres plus coupled processes) dimensions. We will also add a short sentence in the abstract referencing this methodology section. These changes will allow readers to assess completeness while preserving the review's focus on representative rather than exhaustive coverage. revision: yes
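The inclusion/exclusion rules promised in the rebuttal could be operationalized as a screening predicate. The record schema, field names, and date window below are our own guesses at how such a protocol might be encoded, not the authors' actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class Work:
    """Minimal record for a candidate paper or dataset (hypothetical schema)."""
    modalities: int                  # number of Earth-science modalities used
    public: bool                     # documented public availability
    peer_reviewed_or_preprint: bool  # peer-reviewed or high-quality preprint
    year: int

def include(w: Work, span: tuple[int, int] = (2020, 2024)) -> bool:
    """Apply the rebuttal's stated rules: >= 2 modalities, publicly
    available, peer-reviewed or high-quality preprint, inside the
    stated literature-search window."""
    return (
        w.modalities >= 2
        and w.public
        and w.peer_reviewed_or_preprint
        and span[0] <= w.year <= span[1]
    )

print(include(Work(modalities=2, public=True, peer_reviewed_or_preprint=True, year=2023)))  # True
print(include(Work(modalities=1, public=True, peer_reviewed_or_preprint=True, year=2023)))  # False
```

Writing the criteria this way makes the referee's point concrete: once the predicate is explicit, anyone can audit which works were screened out and why.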

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

This manuscript is a survey paper that reviews existing Earth science foundation models and compiles more than 200 external datasets and benchmarks using a depth/breadth organizational framework. It presents no new derivations, equations, predictions, or first-principles results whose validity depends on quantities defined inside the paper itself. All cited models, data sources, and benchmarks originate from prior external literature, and the paper's structure functions purely as an organizing roadmap rather than a self-referential chain. No load-bearing steps reduce by construction to fitted inputs or self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The review rests on standard AI definitions of foundation models and multimodal integration; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption Earth foundation models can be meaningfully categorized along a depth axis from perception to reasoning and a breadth axis across Earth system spheres
    This two-dimensional framework is used to structure the entire review and roadmap as stated in the abstract.

pith-pipeline@v0.9.0 · 5543 in / 1353 out tokens · 57526 ms · 2026-05-14T22:04:45.908253+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

299 extracted references · 299 canonical work pages · 2 internal anchors

  1. [1]

    Artificial intelligence for geoscience: Progress, challenges, and perspectives,

    T. Zhao, S. Wang, C. Ouyang, M. Chen, C. Liu, J. Zhang, L. Yu, F. Wang, Y. Xie, J. Liet al., “Artificial intelligence for geoscience: Progress, challenges, and perspectives,”The Innovation, vol. 5, no. 5, 2024. 1, 2, 3, 15

  2. [2]

    Aurora: A foundation model of the atmosphere,

    C. Bodnar, “Aurora: A foundation model of the atmosphere,” in AGU Fall Meeting Abstracts, vol. 2024, 2024, pp. GC21C–03. 1, 4, 6, 8, 11

  3. [3]

    Pangu- weather: A 3d high-resolution model for fast and accurate global weather forecast,

    K. Bi, L. Xie, H. Zhang, X. Chen, X. Gu, and Q. Tian, “Pangu- weather: A 3d high-resolution model for fast and accurate global weather forecast,”arXiv preprint arXiv:2211.02556, 2022. 1, 4, 6, 8, 11

  4. [4]

    Citygpt: Towards urban iot learning, analysis and interaction with multi-agent system,

    Q. Guan, J. Ouyang, D. Wu, and W. Yu, “Citygpt: Towards urban iot learning, analysis and interaction with multi-agent system,” arXiv preprint arXiv:2405.14691, 2024. 1, 11, 14

  5. [5]

    Graphcast: Ai model for faster and more accurate global weather forecasting,

    R. Lam, R. Pascanu, M. Puigdomènech Gimenez, S. Agrawal, C. Dapogny, M. Schmidt, T. Keck, M. Mudigonda, P . Brutlag, J. Wanget al., “Graphcast: Ai model for faster and more accurate global weather forecasting,”Science, vol. 382, pp. 1416–1421, 2023. 1, 6, 8, 11

  6. [6]

    S-clip: Semi-supervised vision-language learning using few specialist captions,

    S. Mo, M. Kim, K. Lee, and J. Shin, “S-clip: Semi-supervised vision-language learning using few specialist captions,”Advances in Neural Information Processing Systems, vol. 36, pp. 61 187–61 212,

  7. [7]

    Marinedet: Towards open-marine object detection,

    L. Haixin, Z. Ziqiang, M. Zeyu, and S.-K. Yeung, “Marinedet: Towards open-marine object detection,”arXiv preprint arXiv:2310.01931, 2023. 1, 6, 11

  8. [8]

    Trs: Transformers for remote sensing scene classification,

    J. Zhang, H. Zhao, and J. Li, “Trs: Transformers for remote sensing scene classification,”Remote Sensing, vol. 13, no. 20, p. 4143, 2021. 1, 4, 6

  9. [9]

    Climateagents: A multi-agent research assis- tant for social-climate dynamics analysis,

    S. Shan, “Climateagents: A multi-agent research assis- tant for social-climate dynamics analysis,”arXiv preprint arXiv:2603.13840, 2026. 1, 19

  10. [10]

    Openearthagent: A unified framework for tool-augmented geospatial agents,

    A. Shabbir, M. U. Sheikh, M. A. Munir, H. Debary, M. Fiaz, M. Z. Zaheer, P . Fraccaro, F. S. Khan, M. H. Khan, X. X. Zhu et al., “Openearthagent: A unified framework for tool-augmented geospatial agents,”arXiv preprint arXiv:2602.17665, 2026. 1, 2, 11, 19

  11. [11]

    Prithvi wxc: Foundation model for weather and climate,

    J. Schmude, S. Roy, W. Trojak, J. Jakubik, D. S. Civitarese, S. Singh, J. Kuehnert, K. Ankur, A. Gupta, C. E. Phillipset al., “Prithvi wxc: Foundation model for weather and climate,”arXiv preprint arXiv:2409.13598, 2024. 1, 4, 6, 8, 11

  12. [12]

    Terramind: Large-scale generative multimodality for earth ob- servation,

    J. Jakubik, F. Yang, B. Blumenstiel, E. Scheurer, R. Sedona, S. Mau- rogiovanni, J. Bosmans, N. Dionelis, V . Marsocci, N. Koppet al., “Terramind: Large-scale generative multimodality for earth ob- servation,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 7383–7394. 1, 2, 11, 18

  13. [13]

    GISclaw: A Comprehensive Open-Source LLM Agent System for Realistic Multi-Step Geospatial Analysis

    J. Han, J. Lee, Y. Shim, J. Kim, and J.-J. Lee, “Gisclaw: An open-source llm-powered agent system for full-stack geospatial analysis,”arXiv preprint arXiv:2603.26845, 2026. 1, 19

  14. [14]

    Earthlink: A self-evolving ai agent for climate science,

    Z. Guo, J. Wang, X. Yue, W. Wei, Z. Jiang, W. Xu, B. Fei, W. Zhang, X. Gu, L. Chenget al., “Earthlink: A self-evolving ai agent for climate science,”arXiv e-prints, pp. arXiv–2507, 2025. 1, 19

  15. [15]

    Towards vision-language geo-foundation model: A survey. arxiv 2024,

    Y. Zhou, L. Feng, Y. Ke, X. Jiang, J. Yan, X. Yang, and W. Zhang, “Towards vision-language geo-foundation model: A survey. arxiv 2024,”arXiv preprint arXiv:2406.09385. 2, 3

  16. [16]

    Foundation models for remote sensing and earth observation: A survey,

    A. Xiao, W. Xuan, J. Wang, J. Huang, D. Tao, S. Lu, and N. Yokoya, “Foundation models for remote sensing and earth observation: A survey,”IEEE Geoscience and Remote Sensing Magazine, 2025. 2, 3

  17. [17]

    On the foundations of earth foundation models,

    X. X. Zhu, Z. Xiong, Y. Wang, A. J. Stewart, K. Heidler, Y. Wang, Z. Yuan, T. Dujardin, Q. Xu, and Y. Shi, “On the foundations of earth foundation models,”Communications Earth & Environment,

  18. [18]

    A hierarchical multi-agent system for au- tonomous discovery in geoscientific data archives,

    D. Pantiukhin, I. Kuznetsov, B. Shapkin, A. A. Jost, T. Jung, and N. Koldunov, “A hierarchical multi-agent system for au- tonomous discovery in geoscientific data archives,”arXiv preprint arXiv:2602.21351, 2026. 2, 11

  19. [19]

    Foun- dation models in remote sensing: Evolving from unimodality to multimodality,

    D. Hong, C. Li, X. Li, G. Camps-Valls, and J. Chanussot, “Foun- dation models in remote sensing: Evolving from unimodality to multimodality,”IEEE Geoscience and Remote Sensing Magazine,

  20. [20]

    Towards urban general intelligence: A review and outlook of urban foundation models,

    W. Zhang, J. Han, Z. Xu, H. Ni, T. Lyu, H. Liu, and H. Xiong, “Towards urban general intelligence: A review and outlook of urban foundation models,”arXiv preprint arXiv:2402.01749, 2024. 3

  21. [21]

    Two-stream swin transformer with differentiable sobel operator for remote sensing image classification,

    S. Hao, B. Wu, K. Zhao, Y. Ye, and W. Wang, “Two-stream swin transformer with differentiable sobel operator for remote sensing image classification,”Remote Sensing, vol. 14, no. 6, p. 1507, 2022. 4, 6

  22. [22]

    Homo– heterogenous transformer learning framework for rs scene clas- sification,

    J. Ma, M. Li, X. Tang, X. Zhang, F. Liu, and L. Jiao, “Homo– heterogenous transformer learning framework for rs scene clas- sification,”IEEE Journal of Selected Topics in Applied Earth Observa- tions and Remote Sensing, vol. 15, pp. 2223–2239, 2022. 4

  23. [23]

    Transformer with transfer cnn for remote-sensing-image object detection,

    Q. Li, Y. Chen, and Y. Zeng, “Transformer with transfer cnn for remote-sensing-image object detection,”Remote Sensing, vol. 14, no. 4, p. 984, 2022. 4

  24. [24]

    Gansformer: A detection network for aerial images with high performance com- bining convolutional network and transformer,

    Y. Zhang, X. Liu, S. Wa, S. Chen, and Q. Ma, “Gansformer: A detection network for aerial images with high performance com- bining convolutional network and transformer,”Remote Sensing, vol. 14, no. 4, p. 923, 2022. 4

  25. [25]

    Adt-det: Adaptive dynamic refined single-stage transformer detector for arbitrary- oriented object detection in satellite optical imagery,

    Y. Zheng, P . Sun, Z. Zhou, W. Xu, and Q. Ren, “Adt-det: Adaptive dynamic refined single-stage transformer detector for arbitrary- oriented object detection in satellite optical imagery,”Remote Sensing, vol. 13, no. 13, p. 2623, 2021. 4

  26. [26]

    Deep multiscale siamese network with parallel convolutional structure and self-attention for change detection,

    Q. Guo, J. Zhang, S. Zhu, C. Zhong, and Y. Zhang, “Deep multiscale siamese network with parallel convolutional structure and self-attention for change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–12, 2021. 4

  27. [27]

    Resdeepd: A residual super- resolution network for deep downscaling of daily precipitation over india,

    S. C. M. Sharma and A. Mitra, “Resdeepd: A residual super- resolution network for deep downscaling of daily precipitation over india,”Environmental Data Science, vol. 1, p. e19, 2022. 4

  28. [28]

    Downscal- ing multi-model climate projection ensembles with deep learning (deepesd): contribution to cordex eur-44,

    J. Baño-Medina, R. Manzanas, E. Cimadevilla, J. Fernández, J. González-Abad, A. S. Cofiño, and J. M. Gutiérrez, “Downscal- ing multi-model climate projection ensembles with deep learning (deepesd): contribution to cordex eur-44,”Geoscientific Model Development Discussions, vol. 2022, pp. 1–14, 2022. 4

  29. [29]

    Inves- tigating two super-resolution methods for downscaling precipi- tation: Esrgan and car,

    C. D. Watson, C. Wang, T. Lynar, and K. Weldemariam, “Inves- tigating two super-resolution methods for downscaling precipi- tation: Esrgan and car,”arXiv preprint arXiv:2012.01233, 2020. 4, 6

  30. [30]

    Fast and accurate learned multiresolution dynamical downscaling for precipitation,

    J. Wang, Z. Liu, I. Foster, W. Chang, R. Kettimuthu, and V . R. Ko- tamarthi, “Fast and accurate learned multiresolution dynamical downscaling for precipitation,”Geoscientific Model Development Discussions, vol. 2021, pp. 1–24, 2021. 4

  31. [31]

    Adversarial super-resolution of climatological wind and solar data,

    K. Stengel, A. Glaws, D. Hettinger, and R. N. King, “Adversarial super-resolution of climatological wind and solar data,”Proceed- ings of the National Academy of Sciences, vol. 117, no. 29, pp. 16 805– 16 815, 2020. 4, 6

  32. [32]

    A deconvolution technology of microwave radiometer data using convolutional neural networks,

    W. Hu, W. Zhang, S. Chen, X. Lv, D. An, and L. Ligthart, “A deconvolution technology of microwave radiometer data using convolutional neural networks,”Remote Sensing, vol. 10, no. 2, p. 275, 2018. 4

  33. [33]

    Diffsr: Learning radar reflectivity synthesis via diffusion model from satellite observations,

    X. He, Z. Zhou, W. Zhang, X. Zhao, H. Chen, S. Chen, and L. Bai, “Diffsr: Learning radar reflectivity synthesis via diffusion model from satellite observations,” inICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025, pp. 1–5. 4

  34. [34]

    Towards fine-grained classification of climate change related social media text,

    R. Vaid, K. Pant, and M. Shrivastava, “Towards fine-grained classification of climate change related social media text,” inPro- ceedings of the 60th annual meeting of the association for computational linguistics: student research workshop, 2022, pp. 434–443. 4, 10

  35. [35]

    Few-shot learning for name entity recognition in geological text based on geobert,

    H. Liu, Q. Qiu, L. Wu, W. Li, B. Wang, and Y. Zhou, “Few-shot learning for name entity recognition in geological text based on geobert,”Earth Science Informatics, vol. 15, no. 2, pp. 979–991,

  36. [36]

    An empirical study of remote sensing pretraining,

    D. Wang, J. Zhang, B. Du, G.-S. Xia, and D. Tao, “An empirical study of remote sensing pretraining,”IEEE Transactions on Geo- science and Remote Sensing, vol. 61, pp. 1–20, 2022. 4

  37. [37]

    Earthnets: Empowering ai in earth obser- vation.arXiv preprint arXiv:2210.04936, 2022

    Z. Xiong, F. Zhang, Y. Wang, Y. Shi, and X. X. Zhu, “Earth- nets: Empowering ai in earth observation,”arXiv preprint arXiv:2210.04936, 2022. 4

  38. [38]

    Anysat: One earth observation model for many resolutions, scales, and modalities,

    G. Astruc, N. Gonthier, C. Mallet, and L. Landrieu, “Anysat: One earth observation model for many resolutions, scales, and modalities,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 19 530–19 540. 4

  39. [39]

    Fleximo: A flexible remote sensing foundation model,

    X. Li, C. Li, P . Ghamisi, and D. Hong, “Fleximo: A flexible remote sensing foundation model,”arXiv preprint arXiv:2503.23844, 2025. 4

  40. [40]

    Spectralearth: Training hyperspectral foundation models at scale,

    N. A. A. Braham, C. M. Albrecht, J. Mairal, J. Chanussot, Y. Wang, and X. X. Zhu, “Spectralearth: Training hyperspectral foundation models at scale,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025. 4

  41. [41]

    Geolangbind: Unifying earth observation with agglomerative vision-language foundation models,

    Z. Xiong, Y. Wang, W. Yu, A. J. Stewart, J. Zhao, N. Lehmann, T. Dujardin, Z. Yuan, P . Ghamisi, and X. X. Zhu, “Geolangbind: Unifying earth observation with agglomerative vision-language foundation models,”arXiv preprint arXiv:2503.06312, 2025. 4

  42. [42]

    Skysense: A multi-modal re- mote sensing foundation model towards universal interpretation for earth observation imagery,

    X. Guo, J. Lao, B. Dang, Y. Zhang, L. Yu, L. Ru, L. Zhong, Z. Huang, K. Wu, D. Huet al., “Skysense: A multi-modal re- mote sensing foundation model towards universal interpretation for earth observation imagery,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 27 672–27 683. 4, 18

  43. [43]

    In: Proceedings of the 40th International Conference on Machine Learning

    T. Nguyen, J. Brandstetter, A. Kapoor, J. K. Gupta, and A. Grover, “Climax: A foundation model for weather and climate,”arXiv preprint arXiv:2301.10343, 2023. 4, 6, 8, 11

  44. [44]

    Weathergfm: Learning a weather generalist foundation model via in-context learning,

    X. Zhao, Z. Zhou, W. Zhang, Y. Liu, X. Chen, J. Gong, H. Chen, B. Fei, S. Chen, W. Ouyanget al., “Weathergfm: Learning a weather generalist foundation model via in-context learning,” arXiv preprint arXiv:2411.05420, 2024. 4, 6, 8, 11

  45. [45]

    Mmearth: Exploring multi-modal pretext tasks for geospatial representation learning,

    V . Nedungadi, A. Kariryaa, S. Oehmcke, S. Belongie, C. Igel, and N. Lang, “Mmearth: Exploring multi-modal pretext tasks for geospatial representation learning,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 164–182. 4, 6

  46. [46]

    Ctxmim: Context-enhanced masked image modeling for remote sensing image understand- ing,

    M. Zhang, Q. Liu, and Y. Wang, “Ctxmim: Context-enhanced masked image modeling for remote sensing image understand- ing,”arXiv preprint arXiv:2310.00022, 2023. 4

  47. [47]

    Earthpt: a time series foundation model for earth observation,

    M. J. Smith, L. Fleming, and J. E. Geach, “Earthpt: a time series foundation model for earth observation,”arXiv preprint arXiv:2309.07207, 2023. 4, 6

  48. [48]

    Bridging remote sensors with multisensor geospatial foundation models,

    B. Han, S. Zhang, X. Shi, and M. Reichstein, “Bridging remote sensors with multisensor geospatial foundation models,” inPro- ceedings of the ieee/cvf conference on computer vision and pattern recognition, 2024, pp. 27 852–27 862. 4

  49. [49]

    Mtp: Advancing remote sensing foundation model via multitask pretraining,

    D. Wang, J. Zhang, M. Xu, L. Liu, D. Wang, E. Gao, C. Han, H. Guo, B. Du, D. Taoet al., “Mtp: Advancing remote sensing foundation model via multitask pretraining,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 17, pp. 11 632–11 654, 2024. 4

  50. [50]

    Spectraldiff: A generative framework for hyperspectral image classification with diffusion models,

    N. Chen, J. Yue, L. Fang, and S. Xia, “Spectraldiff: A generative framework for hyperspectral image classification with diffusion models,”IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–16, 2023. 4

  51. [51]

    Geosynth: Contextually-aware high-resolution satellite image synthesis,

    S. Sastry, S. Khanal, A. Dhakal, and N. Jacobs, “Geosynth: Contextually-aware high-resolution satellite image synthesis,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 460–470. 4

  52. [52]

    Diffusion-geo: A two-stage controllable text-to-image generative model for remote sensing scenarios,

    M. Cai, W. Zhang, T. Zhang, Y. Zhuang, H. Chen, L. Chen, and C. Li, “Diffusion-geo: A two-stage controllable text-to-image generative model for remote sensing scenarios,” inIGARSS 2024- 2024 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2024, pp. 7003–7006. 4

  53. [53]

    Toward artificial general intelligence in hydrogeolog- ical modeling with an integrated latent diffusion framework,

    C. Zhan, Z. Dai, J. J. Jiao, M. R. Soltanian, H. Yin, and K. C. Carroll, “Toward artificial general intelligence in hydrogeolog- ical modeling with an integrated latent diffusion framework,” Geophysical Research Letters, vol. 52, no. 3, p. e2024GL114298, 2025. 4

  54. [54]

    Tianxing: A linear complexity transformer model with explicit attention decay for global weather forecasting,

    S. Yuan, G. Wang, B. Mu, and F. Zhou, “Tianxing: A linear complexity transformer model with explicit attention decay for global weather forecasting,”Advances in Atmospheric Sciences, vol. 42, no. 1, pp. 9–25, 2025. 4

  55. [55]

    Climaqa: An automated evaluation framework for climate question answering models,

    V . V . Manivannan, Y. Jafari, S. Eranky, S. Ho, R. Yu, D. Watson- Parris, Y. Ma, L. Bergen, and T. Berg-Kirkpatrick, “Climaqa: An automated evaluation framework for climate question answering models,”arXiv preprint arXiv:2410.16701, 2024. 4, 10

  56. [56]

    Oceangpt: A large language model for ocean science tasks,

    Z. Bi, N. Zhang, Y. Xue, Y. Ou, D. Ji, G. Zheng, and H. Chen, “Oceangpt: A large language model for ocean science tasks,” arXiv preprint arXiv:2310.02031, 2023. 4, 6, 11

  57. [57]

    K2: A foundation language model for geoscience knowledge understanding and utilization,

    C. Deng, T. Zhang, Z. He, Q. Chen, Y. Shi, Y. Xu, L. Fu, W. Zhang, X. Wang, C. Zhouet al., “K2: A foundation language model for geoscience knowledge understanding and utilization,” in Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 2024, pp. 161–170. 4, 6, 11

  58. [58]

    Jiuzhou: open foundation language models and effective pre-training 23 framework for geoscience,

    Z. Chen, M. Lin, M. Zang, Z. Wang, J. Li, and Y. Bai, “Jiuzhou: open foundation language models and effective pre-training 23 framework for geoscience,”International Journal of Digital Earth, vol. 18, no. 1, p. 2449708, 2025. 4

  [59] Z. Lin, C. Deng, L. Zhou, T. Zhang, Y. Xu, Y. Xu, Z. He, Y. Shi, B. Dai, Y. Song et al., "Geogalactica: A scientific large language model in geoscience," arXiv preprint arXiv:2401.00434, 2023.

  [60] Z. Chen, X. Wang, Y. Liao, M. Lin, and Y. Bai, "Climatechat: Designing data and methods for instruction tuning LLMs to answer climate change queries," arXiv preprint arXiv:2506.13796, 2025.

  [61] Z. Chen, X. Wang, X. Zhang, M. Lin, Y. Liao, J. Li, and Y. Bai, "Geofactory: An LLM performance enhancement framework for geoscience factual and inferential tasks," Big Earth Data, pp. 1–33, 2025.

  [62] D. Muhtar, Z. Li, F. Gu, X. Zhang, and P. Xiao, "Lhrs-bot: Empowering remote sensing with VGI-enhanced large multimodal language model," in European Conference on Computer Vision. Springer, 2024, pp. 440–457.

  [63] J. Luo, Z. Pang, Y. Zhang, T. Wang, L. Wang, B. Dang, J. Lao, J. Wang, J. Chen, Y. Tan et al., "Skysensegpt: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding," arXiv preprint arXiv:2406.10100, 2024.

  [64] W. Zhang, M. Cai, T. Zhang, Y. Zhuang, and X. Mao, "Earthgpt: A universal multimodal large language model for multisensor image comprehension in remote sensing domain," IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–20, 2024.

  [65] B. Zhou, H. Yang, D. Chen, J. Ye, T. Bai, J. Yu, S. Zhang, D. Lin, C. He, and W. Li, "Urbench: A comprehensive benchmark for evaluating large multimodal models in multi-view urban scenarios," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 10, 2025, pp. 10707–10715.

  [66] J. A. Irvin, E. R. Liu, J. C. Chen, I. Dormoy, J. Kim, S. Khanna, Z. Zheng, and S. Ermon, "Teochat: A large vision-language assistant for temporal earth observation data," arXiv preprint arXiv:2410.06234, 2024.

  [67] S. Soni, A. Dudhane, H. Debary, M. Fiaz, M. A. Munir, M. S. Danish, P. Fraccaro, C. D. Watson, L. J. Klein, F. S. Khan et al., "Earthdial: Turning multi-sensory earth observations to interactive dialogues," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 14303–14313.

  [68] H. Li, Z. Wang, J. Wang, A. K. H. Lau, and H. Qu, "Cllmate: A multimodal LLM for weather and climate events forecasting," arXiv preprint arXiv:2409.19058, 2024.

  [69] K. Kuckreja, M. S. Danish, M. Naseer, A. Das, S. Khan, and F. S. Khan, "Geochat: Grounded large vision-language model for remote sensing," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 27831–27840.

  [70] L. Chen, X. Zhong, F. Zhang, Y. Cheng, Y. Xu, Y. Qi, and H. Li, "Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast," npj Climate and Atmospheric Science, vol. 6, no. 1, p. 190, 2023.

  [71] K. Chen, T. Han, J. Gong, L. Bai, F. Ling, J.-J. Luo, X. Chen, L. Ma, T. Zhang, R. Su et al., "Fengwu: Pushing the skillful global medium-range weather forecast beyond 10 days lead," arXiv preprint arXiv:2304.02948, 2023.

  [72] X. Man, C. Zhang, J. Feng, C. Li, and J. Shao, "W-mae: Pre-trained weather model with masked autoencoder for multi-variable weather forecasting," arXiv preprint arXiv:2304.08754, 2023.

  [73] Z. Gao, X. Shi, H. Wang, Y. Zhu, Y. B. Wang, M. Li, and D.-Y. Yeung, "Earthformer: Exploring space-time transformers for earth system forecasting," Advances in Neural Information Processing Systems, vol. 35, pp. 25390–25403, 2022.

  [74] D. Du, B. Su, and Z. Wei, "Preformer: Predictive transformer with multi-scale segment-wise correlations for long-term time series forecasting," in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.

  [75] I. Price, A. Sanchez-Gonzalez, F. Alet, T. R. Andersson, A. El-Kadi, D. Masters, T. Ewalds, J. Stott, S. Mohamed, P. Battaglia et al., "Probabilistic weather forecasting with machine learning," Nature, vol. 637, no. 8044, pp. 84–90, 2025.

  [76] L. Li, R. Carver, I. Lopez-Gomez, F. Sha, and J. Anderson, "Generative emulation of weather forecast ensembles with diffusion models," Science Advances, vol. 10, no. 13, p. eadk4489, 2024.

  [77] M. Andrae, T. Landelius, J. Oskarsson, and F. Lindsten, "Continuous ensemble weather forecasting with diffusion models," arXiv preprint arXiv:2410.05431, 2024.

  [78] J. Gong, L. Bai, P. Ye, W. Xu, N. Liu, J. Dai, X. Yang, and W. Ouyang, "Cascast: Skillful high-resolution precipitation nowcasting via cascaded modelling," arXiv preprint arXiv:2402.04290, 2024.

  [79] J. Gong, S. Tu, W. Yang, B. Fei, K. Chen, W. Zhang, X. Yang, W. Ouyang, and L. Bai, "Postcast: Generalizable postprocessing for precipitation nowcasting via unsupervised blurriness modeling," arXiv preprint arXiv:2410.05805, 2024.

  [80] J. Liu, Z. Yuan, Z. Pan, Y. Fu, L. Liu, and B. Lu, "Diffusion model with detail complement for super-resolution of remote sensing," Remote Sensing, vol. 14, no. 19, p. 4834, 2022.

Showing first 80 references.