A PMP-inspired Evaluation Framework for Assessing Deep-Learning Earth System Models
Pith reviewed 2026-05-21 10:27 UTC · model grok-4.3
The pith
A framework using standardized climate diagnostics evaluates deep-learning Earth system models for their ability to reproduce key climate features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that an adapted collection of established climate diagnostics can test whether deep-learning Earth system models reproduce climatology, major modes of variability, monsoon systems, and precipitation behavior relative to observations and conventional model benchmarks. Applied this way, the diagnostics reveal strengths in large-scale fields alongside persistent difficulties with precipitation, tropical variability, and, in some cases, long-run stability.
What carries the argument
The set of standardized diagnostics that quantify a model's skill at reproducing climatology, variability modes, monsoon behavior, and precipitation patterns against observational references and benchmark simulations.
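To make concrete what such a diagnostic quantifies, here is a minimal sketch of two statistics that mean-state evaluations of this kind typically report: an area-weighted RMSE and a centered pattern correlation between a model field and a reference field. It uses plain NumPy on synthetic data. The function names and the grid are illustrative assumptions, not the PMP API and not the paper's code.

```python
import numpy as np

def area_weights(lat_deg, n_lon):
    """Cosine-latitude weights on a regular lat-lon grid, normalized to sum to 1."""
    w = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones((1, n_lon))
    return w / w.sum()

def weighted_rmse(model, ref, w):
    """Area-weighted root-mean-square error between two (lat, lon) fields."""
    return float(np.sqrt(np.sum(w * (model - ref) ** 2)))

def pattern_correlation(model, ref, w):
    """Centered, area-weighted spatial correlation between two (lat, lon) fields."""
    m = model - np.sum(w * model)  # remove the area-weighted mean
    r = ref - np.sum(w * ref)
    cov = np.sum(w * m * r)
    return float(cov / np.sqrt(np.sum(w * m ** 2) * np.sum(w * r ** 2)))

# Synthetic example (hypothetical values): a reference field that varies with
# latitude, and a "model" field that is uniformly 1.5 units too warm.
lat = np.linspace(-88.75, 88.75, 72)
lon = np.linspace(0.0, 357.5, 144)
ref = 288.0 + 30.0 * np.cos(np.deg2rad(lat))[:, None] * np.ones((1, lon.size))
model = ref + 1.5

w = area_weights(lat, lon.size)
rmse = weighted_rmse(model, ref, w)
corr = pattern_correlation(model, ref, w)
print(f"RMSE = {rmse:.3f}, pattern correlation = {corr:.3f}")
```

A uniform offset shows why both numbers are reported together: the RMSE registers the bias, while the centered correlation stays at 1 because the spatial pattern is unchanged.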
If this is right
- Direct comparison becomes possible between deep-learning models and traditional climate models using the same metrics.
- Strengths appear in several large-scale climate fields and modes of variability.
- Challenges are identified in precipitation simulation, tropical variability, and long-run stability for certain model versions.
- The framework supports guiding future development of deep-learning models toward climate-relevant uses.
- It serves as a step toward establishing trust in these models for Earth system science tasks.
Where Pith is reading between the lines
- The same diagnostics could be applied to hybrid models that combine deep-learning components with physics-based ones to check consistency across approaches.
- If deep-learning models exhibit systematically different error patterns, the framework might need supplementary tests focused on stability over multi-year runs.
- Widespread adoption could allow these models to contribute to ensemble climate projections alongside established simulations.
Load-bearing premise
Diagnostics originally created for traditional physics-based models remain suitable even when applied to deep-learning models that may have different error structures and stability properties.
What would settle it
A deep-learning model that scores well on the framework's metrics but produces unstable or unrealistic behavior in extended free-running simulations would indicate the diagnostics are insufficient.
Original abstract
In recent years, Deep-Learning Earth System Models (DL-ESMs) have emerged as promising, computationally efficient complements to traditional Earth system models. Here, we present an evaluation framework for testing DL-ESMs from a climate-model-development perspective using standardized diagnostics from the PCMDI Metrics Package (PMP). This framework allows DL-ESMs, including Ai2's ACE2 and Google's NeuralGCM, to be assessed with metrics that quantify their ability to reproduce climatology, major modes of variability, monsoon behavior, and precipitation variability relative to observational reference datasets and CMIP-class benchmarks. By evaluating DL-ESMs with tools commonly used for traditional models, we extend their assessment beyond short-range forecast skill and toward climate-relevant applications. The results identify encouraging strengths in several large-scale fields and modes of variability, while also highlighting persistent challenges in precipitation, tropical variability, and long-run stability for some model versions. This evaluation is a critical step toward building trust in DL-ESMs, guiding future model development, and clarifying their fit-for-purpose for Earth system science applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a PMP-inspired evaluation framework for deep-learning Earth system models (DL-ESMs), applying standardized PCMDI Metrics Package diagnostics to assess climatology, major modes of variability (e.g., ENSO, MJO), monsoon behavior, and precipitation variability. It evaluates models including Ai2's ACE2 and Google's NeuralGCM against observational references and CMIP benchmarks, reporting strengths in large-scale fields alongside challenges in precipitation, tropical variability, and long-run stability for certain versions. The central claim is that these established diagnostics extend DL-ESM assessment beyond short-range forecasts to climate-relevant applications.
Significance. If the framework holds, it offers a practical bridge between DL model development and traditional climate-model evaluation practices, enabling consistent benchmarking that could guide iterative improvements and clarify suitability for Earth system applications. The use of external observational datasets and CMIP benchmarks avoids circularity and provides reproducible, community-standard metrics.
major comments (2)
- [§4] §4 (Results on long-run stability): The abstract and results note persistent challenges in long-run stability for some DL-ESM versions, yet the manuscript does not demonstrate that the selected PMP diagnostics (e.g., ENSO pattern correlation or monsoon onset metrics) actually detect or quantify accumulating non-physical drift versus merely documenting it post hoc. A concrete test—such as comparing PMP scores on short vs. multi-year integrations for a drifting model version—would strengthen the claim that PMP is sufficient for DL-specific failure modes.
- [§3.2] §3.2 (Metric selection and data): The choice of PMP diagnostics is justified by their use in traditional ESMs, but the paper should explicitly address whether DL-ESM error structures (e.g., spectral artifacts or mode collapse) require supplementary diagnostics beyond the current set; without this, the sufficiency claim for climate-relevant assessment rests on an untested assumption of comparable error structures.
minor comments (2)
- [Figure 2] Figure 2 caption: Clarify the exact observational reference dataset and CMIP ensemble version used for each panel to improve reproducibility.
- [§2.1] §2.1: The description of NeuralGCM and ACE2 model versions lacks explicit citation to the original model papers or release versions; add these for traceability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help clarify how to better demonstrate the value of the PMP-based framework for DL-ESMs. We respond to each major comment below and indicate the revisions we will make.
-
Referee: [§4] §4 (Results on long-run stability): The abstract and results note persistent challenges in long-run stability for some DL-ESM versions, yet the manuscript does not demonstrate that the selected PMP diagnostics (e.g., ENSO pattern correlation or monsoon onset metrics) actually detect or quantify accumulating non-physical drift versus merely documenting it post hoc. A concrete test—such as comparing PMP scores on short vs. multi-year integrations for a drifting model version—would strengthen the claim that PMP is sufficient for DL-specific failure modes.
Authors: We agree that an explicit demonstration of the diagnostics' sensitivity to drift would strengthen the paper. The current results apply PMP metrics to long integrations and report stability issues for certain versions, but we did not include a controlled short-versus-long comparison. In the revised manuscript we will add such an analysis for one drifting model version, comparing PMP scores (including ENSO and monsoon metrics) between short-range and multi-year runs to show how the diagnostics capture accumulating non-physical drift. revision: yes
-
Referee: [§3.2] §3.2 (Metric selection and data): The choice of PMP diagnostics is justified by their use in traditional ESMs, but the paper should explicitly address whether DL-ESM error structures (e.g., spectral artifacts or mode collapse) require supplementary diagnostics beyond the current set; without this, the sufficiency claim for climate-relevant assessment rests on an untested assumption of comparable error structures.
Authors: We acknowledge that DL-ESMs can exhibit error structures (such as spectral artifacts) that differ from those of traditional ESMs. The PMP diagnostics were selected for their established relevance to climate processes rather than an assumption of identical error statistics. Our results already show that these metrics successfully flag important deficiencies in precipitation and tropical variability. In the revision we will expand the discussion in §3.2 to explicitly consider potential DL-specific errors, explain the rationale for the current metric set as a standardized bridge to the climate-modeling community, and note that supplementary diagnostics may be added in future extensions of the framework. revision: partial
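The short-versus-long comparison discussed in the first response could be sketched roughly as follows. This is an illustration on a synthetic annual-mean series with an imposed linear drift, not the authors' planned analysis. The helper names (`window_bias`, `linear_drift`), the reference value, and the drift rate are all hypothetical.

```python
import numpy as np

def window_bias(series, ref_mean, start, stop):
    """Mean bias against a fixed reference value over years [start, stop)."""
    return float(np.mean(series[start:stop]) - ref_mean)

def linear_drift(series):
    """Least-squares trend (units per year) of an annual-mean series."""
    years = np.arange(series.size)
    slope, _intercept = np.polyfit(years, series, 1)
    return float(slope)

# Synthetic stand-in for a drifting free-running simulation (assumed values):
# a global-mean quantity that starts at the reference and warms 0.05 per year.
ref_mean = 288.0
n_years = 50
drift_per_year = 0.05
series = ref_mean + drift_per_year * np.arange(n_years)

short_bias = window_bias(series, ref_mean, 0, 5)       # first 5 years only
long_bias = window_bias(series, ref_mean, 0, n_years)  # full integration
trend = linear_drift(series)

print(f"short-window bias: {short_bias:.3f}")
print(f"long-window bias:  {long_bias:.3f}")
print(f"fitted drift:      {trend:.3f} per year")
```

In this toy case the same bias metric looks small on the short window and much larger on the full run, which is the kind of contrast the proposed test would need to show for a real drifting model version.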
Circularity Check
No significant circularity: framework applies external PMP metrics to DL-ESMs
full rationale
The paper introduces an evaluation framework that applies the pre-existing PCMDI Metrics Package (PMP) diagnostics—originally developed for physics-based ESMs—to DL-ESMs such as ACE2 and NeuralGCM. All metrics (climatology, modes of variability, monsoon behavior, precipitation) are computed against independent observational reference datasets and CMIP-class benchmarks, with no parameters fitted inside the paper and no predictions derived from internal fits. The central claim is an application of external standardized tools rather than a derivation that reduces to self-definition, fitted inputs, or self-citation chains. Because the assessment is benchmarked entirely outside the paper's own fitted values or assumptions, the evaluation chain remains self-contained and non-circular.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: PMP diagnostics designed for traditional ESMs are suitable for assessing DL-ESMs
Lean theorems connected to this paper
-
IndisputableMonolith/Cost · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
This framework allows DL-ESMs... to be assessed with metrics that quantify their ability to reproduce climatology, major modes of variability, monsoon behavior, and precipitation variability relative to observational reference datasets and CMIP-class benchmarks.
-
IndisputableMonolith/Foundation/AlexanderDuality · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
The PMP framework computes hundreds of summary statistics including mean state metrics, variability metrics, and process-oriented diagnostics.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Adler, R. F., Sapiano, M. R., Huffman, G. J., Wang, J. J., Gu, G., Bolvin, D., Chiu, L., Schneider, U., Becker, A., Nelkin, E., Xie, P., Ferraro, R., and Shin, D.-B.: The Global Precipitation Climatology Project (GPCP) monthly analysis (new version 2.3) and a review of 2017 global precipitation, Atmosphere, 9, 138, https://doi.org/10.3390/atmos9040138,
-
[2]
Ahn, M.-S., Kim, D. H., Sperber, K. R., Kang, I.-S., Maloney, E. D., Waliser, D. E., and Hendon, H. H.: MJO simulation in CMIP5 climate models: MJO skill metrics and process-oriented diagnosis, Climate Dynamics, 49, 4023–4045, https://doi.org/10.1007/s00382-017-3558-4,
-
[3]
Ahn, M.-S., Gleckler, P. J., Lee, J., Pendergrass, A. G., and Jakob, C.: Benchmarking Simulated Precipitation Variability Amplitude across Time Scales, Journal of Climate, 35, 3173–3196, https://doi.org/10.1175/JCLI-D-21-0542.1,
-
[4]
Ai2: ACE2-ERA5 (Revision a4ca6cc), https://doi.org/10.57967/hf/5377,
-
[5]
Back, S.-Y., Kim, D., and Son, S.-W.: MJO Diversity in CMIP6 Models, Journal of Climate, 37, 4835–4850, https://doi.org/10.1175/JCLI-D-23-0656.1,
-
[6]
Baxter, I., Pahlavan, H. A., Hassanzadeh, P., Rucker, K., and Shaw, T. A.: Benchmarking Atmospheric Circulation Variability in an AI Emulator, ACE2, and a Hybrid Model, NeuralGCM, Geophysical Research Letters, 53, e2025GL119877, https://doi.org/10.1029/2025GL119877,
-
[7]
Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., and Tian, Q.: Accurate medium-range global weather forecasting with 3D neural networks, Nature, 619, 533–538, https://doi.org/10.1038/s41586-023-06185-3,
-
[8]
Bodnar, C., Bruinsma, W. P., Lucic, A., Stanley, M., Allen, A., Brandstetter, J., Garvan, P., Riechert, M., Weyn, J. A., Dong, H., Gupta, J. K., Thambiratnam, K., Archibald, A. T., Wu, C.-C., Heider, E., Welling, M., Turner, R. E., and Perdikaris, P.: A foundation model for the Earth system, Nature, 641, 1180–1187, https://doi.org/10.1038/s41586-025-09005-y,
-
[9]
Brenowitz, N. D., Ge, T., Subramaniam, A., Manshausen, P., Gupta, A., Hall, D. M., Mardani, M., Vahdat, A., Kashinath, K., and Pritchard, M. S.: Climate in a Bottle: Towards a Generative Foundation Model for the Kilometer-Scale Global Atmosphere, arXiv preprint, https://arxiv.org/abs/2505.06474,
-
[10]
Bretherton, C., Watt-Meyer, O., Henn, B., and Koldunov, N.: AIMIP Phase 1 Specification, https://github.com/ai2cm/AIMIP, version 1.2.3, accessed 19 February 2026,
-
[11]
Camps-Valls, G., Fernández-Torres, M.-Á., Cohrs, K.-H., Höhl, A., Castelletti, A., Pacal, A., Robin, C., Martinuzzi, F., Papoutsis, I., Prapas, I., Pérez-Aracil, J., Weigel, K., Gonzalez-Calabuig, M., Reichstein, M., Rabel, M., Giuliani, M., Mahecha, M. D., Popescu, O.-I., Pellicer-Valero, O. J., Ouala, S., Salcedo-Sanz, S., Sippel, S., Kondylatos, S., H...
-
[12]
Chien, M.-T., Barnes, E. A., and Maloney, E. D.: Modulation of tropical cyclogenesis on subseasonal-to-interannual timescales in the deep-learning climate emulator ACE2, Machine Learning: Earth, 1, 015008, https://doi.org/10.1088/3049-4753/adfd61,
-
[13]
Clark, S. K., Watt-Meyer, O., Kwa, A., McGibbon, J., Henn, B., Perkins, W. A., Wu, E., Harris, L. M., and Bretherton, C. S.: ACE2-SOM: Coupling an ML Atmospheric Emulator to a Slab Ocean and Learning the Sensitivity of Climate to Changed CO2, Journal of Geophysical Research: Machine Learning and Computation, 2, e2024JH000575, https://doi.org/https://doi...
-
[14]
Coats, S., Smerdon, J. E., Stevenson, S., Fasullo, J. T., Otto-Bliesner, B., and Ault, T. R.: Paleoclimate Constraints on the Spatiotemporal Character of Past and Future Droughts, Journal of Climate, 33, 9883–9903, https://doi.org/10.1175/JCLI-D-20-0004.1,
-
[15]
Cresswell-Clay, N., Liu, B., Durran, D. R., Liu, Z., Espinosa, Z. I., Moreno, R. A., and Karlbauer, M.: A Deep Learning Earth System Model for Efficient Simulation of the Observed Climate, AGU Advances, 6, e2025AV001706, https://doi.org/10.1029/2025AV001706,
-
[16]
Doutriaux, C., Nadeau, D., Bradshaw, T., Kettleborough, J., Weigel, T., Hogan, E., and Durack, P. J.: PCMDI/cmor: CMOR version 3.2.2, https://cmor.llnl.gov/, software release, March 2017,
-
[17]
Duan, S., Zhang, J., Bonfils, C., and Pallotta, G.: Testing NeuralGCM’s capability to simulate future heatwaves based on the 2021 Pacific Northwest heatwave event, npj Climate and Atmospheric Science, 8, 251, https://doi.org/10.1038/s41612-025-01137-2,
-
[18]
Eyring, V., Collins, W. D., Gentine, P., Barnes, E. A., Barreiro, M., Beucler, T., Bocquet, M., Bretherton, C. S., Christensen, H. M., Dagon, K., Gagne, D. J., Hall, D., Hammerling, D., Hoyer, S., Iglesias-Suarez, F., Lopez-Gomez, I., McGraw, M. C., Meehl, G. A., Molina, M. J., Monteleoni, C., Mueller, J., Pritchard, M. S., Rolnick, D., Runge, J., Stier,...
-
[19]
Gleckler, P. J., Taylor, K. E., and Doutriaux, C.: Performance metrics for climate models, Journal of Geophysical Research: Atmospheres, 113, https://doi.org/10.1029/2007JD008972,
-
[20]
Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz-Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., De Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes, M....
-
[21]
Jiang, X., Maloney, E., and Su, H.: Large-scale controls of propagation of the Madden-Julian Oscillation, npj Climate and Atmospheric Science, 3, 29, https://doi.org/10.1038/s41612-020-00134-x,
-
[22]
Kageyama, M., Braconnot, P., Chiessi, C. M., Rehfeld, K., Ait Brahim, Y., Dütsch, M., Gwinneth, B., Hou, A., Loutre, M.-F., Hendrizan, M., Meissner, K., Mongwe, P., Otto-Bliesner, B., Pezzi, L. P., Rovere, A., Seltzer, A., Sime, L., and Zhu, J.: Lessons from paleoclimates for recent and future climate change: opportunities and insights, Frontiers in Cl...
-
[23]
Kent, C., Scaife, A. A., Dunstone, N. J., Smith, D., Hardiman, S. C., Dunstan, T., and Watt-Meyer, O.: Skilful global seasonal predictions from a machine learning weather model trained on reanalysis data, npj Climate and Atmospheric Science, 8, 314, https://doi.org/10.1038/s41612-025-01198-3,
-
[24]
Kochkov, D., Yuval, J., Langmore, I., Norgaard, P., Smith, J., Mooers, G., Klöwer, M., Lottes, J., Rasp, S., Düben, P., Hatfield, S., Battaglia, P., Sanchez-Gonzalez, A., Willson, M., Brenner, M. P., and Hoyer, S.: Neural general circulation models for weather and climate, Nature, 632, 1060–1066, https://doi.org/10.1038/s41586-024-07744-y,
-
[25]
Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Alet, F., Ravuri, S., Ewalds, T., Eaton-Rosen, Z., Hu, W., Merose, A., Hoyer, S., Holland, G., Vinyals, O., Stott, J., Pritzel, A., Mohamed, S., and Battaglia, P.: GraphCast: Learning skillful medium-range global weather forecasting, https://arxiv.org/abs/2212.12794,
-
[26]
Lee, J., Sperber, K. R., Gleckler, P. J., Bonfils, C., and Taylor, K. E.: Quantifying the agreement between observed and simulated extratropical modes of interannual variability, Climate Dynamics, 52, 4057–4089, https://doi.org/10.1007/s00382-018-4355-4,
-
[27]
Lee, J., Gleckler, P. J., Ahn, M.-S., Ordonez, A., Ullrich, P. A., Sperber, K. R., Taylor, K. E., Planton, Y. Y., Guilyardi, E., Durack, P., Bonfils, C., Zelinka, M. D., Chao, L.-W., Dong, B., Doutriaux, C., Zhang, C., Vo, T., Boutte, J., Wehner, M. F., Pendergrass, A. G., Kim, D., Xue, Z., Wittenberg, A. T., and Krasting, J.: Systematic and objective ...
-
[28]
Meng, Z., Hakim, G. J., Yang, W., and Vecchi, G. A.: Deep Learning Atmospheric Models Reliably Simulate Out-of-Sample Land Heat and Cold Wave Frequencies, Geophysical Research Letters, 53, e2025GL117990, https://doi.org/10.1029/2025GL117990,
-
[29]
Nikumbh, A. C., Lin, P., Paynter, D., and Ming, Y.: Does Increasing Horizontal Resolution Improve the Simulation of Intense Tropical Rainfall in GFDL’s AM4 Model?, Geophysical Research Letters, 51, e2023GL106708, https://doi.org/10.1029/2023GL106708,
-
[30]
Pathak, J., Subramanian, S., Harrington, P., Raja, S., Chattopadhyay, A., Mardani, M., Kurth, T., Hall, D., Li, Z., Azizzadenesheli, K., Hassanzadeh, P., Kashinath, K., and Anandkumar, A.: FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators, https://arxiv.org/abs/2202.11214,
-
[31]
Peings, Y., Dong, C., Mahesh, A., Pritchard, M., Collins, W., and Magnusdottir, G.: Subseasonal Forecasting and MJO Teleconnections in Machine Learning Weather Prediction Models, Journal of Geophysical Research: Atmospheres, 131, e2025JD044910, https://doi.org/10.1029/2025JD044910,
-
[32]
Pithan, F., Athanase, M., Dahlke, S., Sánchez-Benítez, A., Shupe, M. D., Sledd, A., Streffing, J., Svensson, G., and Jung, T.: Nudging allows direct evaluation of coupled climate models with in situ observations: a case study from the MOSAiC expedition, Geoscientific Model Development, 16, 1857–1873, https://doi.org/10.5194/gmd-16-1857-2023,
-
[33]
Price, I., Sanchez-Gonzalez, A., Alet, F., Andersson, T. R., El-Kadi, A., Masters, D., Ewalds, T., Stott, J., Mohamed, S., Battaglia, P., Lam, R., and Willson, M.: Probabilistic weather forecasting with machine learning, Nature, 637, 84–90, https://doi.org/10.1038/s41586-024-08252-9,
-
[34]
Rasp, S., Hoyer, S., Merose, A., Langmore, I., Battaglia, P., Russell, T., Sanchez-Gonzalez, A., Yang, V., Carver, R., Agrawal, S., Chantry, M., Ben Bouallegue, Z., Dueben, P., Bromberg, C., Sisk, J., Barrington, L., Bell, A., and Sha, F.: WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global Weather Models, Journal of Advances in Mod...
-
[35]
Rucker, K., Baxter, I., Hassanzadeh, P., Shaw, T. A., and Pahlavan, H. A.: Benchmarking Regional Thermodynamic Trends in an AI emulator, ACE2, and a hybrid model, NeuralGCM, https://arxiv.org/abs/2511.00274,
-
[36]
Shepherd, T. G., Boyd, E., Calel, R., Chapman, S. C., Dessai, S., Dima-West, I., Fowler, H. J., James, R., Maraun, D., Martius, O., Senior, C. A., Sobel, A. H., and Stainforth, D. A.: Storylines: an alternative approach to representing uncertainty in physical aspects of climate change, Climatic Change, 151, 555–571, https://link.springer.com/article/10.10...
- [37]
-
[38]
Sperber, K. R.: Madden–Julian variability in NCAR CAM2.0 and CCSM2.0, Climate Dynamics, 23, 259–278, https://doi.org/10.1007/s00382-004-0447-4,
-
[39]
Stephens, G. L., L’Ecuyer, T., Forbes, R., Gettlemen, A., Golaz, J.-C., Bodas-Salcedo, A., Suzuki, K., Gabriel, P., and Haynes, J.: The dreary state of precipitation in global models, Journal of Geophysical Research: Atmospheres, 115, https://doi.org/10.1029/2010JD014532,
-
[40]
Taylor, K. E.: Summarizing multiple aspects of model performance in a single diagram, Journal of Geophysical Research, 106, 7183–7192, https://doi.org/10.1029/2000JD900719,
-
[41]
Ullrich, P. A., Barnes, E. A., Collins, W. D., Dagon, K., Duan, S., Elms, J., Lee, J., Leung, L. R., Lu, D., Molina, M. J., O’Brien, T. A., and Rebassoo, F. O.: Recommendations for Comprehensive and Independent Evaluation of Machine Learning-Based Earth System Models, Journal of Geophysical Research: Machine Learning and Computation, 2, e2024JH000496, ...
-
[42]
Waliser, D., Gleckler, P. J., Ferraro, R., Taylor, K. E., Ames, S., Biard, J., Bosilovich, M. G., Brown, O., Chepfer, H., Cinquini, L., Durack, P. J., Eyring, V., Mathieu, P.-P., Lee, T., Pinnock, S., Potter, G. L., Rixen, M., Saunders, R., Schulz, J., Thépaut, J.-N., and Tuma, M.: Observations for Model Intercomparison Project (Obs4MIPs): status for CMI...
-
[43]
Wang, B., Kim, H.-J., Kikuchi, K., and Kitoh, A.: Diagnostic metrics for evaluation of annual and diurnal cycles, Climate Dynamics, 37, 941–955, https://doi.org/10.1007/s00382-010-0877-0,
-
[44]
Watson-Parris, D., Rao, Y., Olivié, D., Seland, Ø., Nowack, P., Camps-Valls, G., Stier, P., Bouabid, S., Dewey, M., Fons, E., Gonzalez, J., Harder, P., Jeggle, K., Lenhardt, J., Manshausen, P., Novitasari, M., Ricard, L., and Roesch, C.: ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections, Journal of Advances in Modeling Earth Systems, 14,...
-
[45]
Watt-Meyer, O., Dresdner, G., McGibbon, J., Clark, S. K., Henn, B., Duncan, J., Brenowitz, N. D., Kashinath, K., Pritchard, M. S., Bonev, B., Peters, M. E., and Bretherton, C. S.: ACE: A fast, skillful learned global atmospheric model for climate prediction, https://arxiv.org/abs/2310.02074,
-
[46]
Xiang, B., Zhao, M., Held, I. M., and Golaz, J.-C.: Predicting the severity of spurious “double ITCZ” problem in CMIP5 coupled models from AMIP simulations, Geophysical Research Letters, 44, 1520–1527, https://doi.org/10.1002/2016GL071992,
-
[47]
Xie, P., Joyce, R., Wu, S., Yoo, S. H., Yarosh, Y., Sun, F., and Lin, R.: Reprocessed, bias-corrected CMORPH global high-resolution precipitation estimates from 1998, Journal of Hydrometeorology, 18, 1617–1641,
-
[48]
Yuval, J., Langmore, I., Kochkov, D., and Hoyer, S.: Neural general circulation models for modeling precipitation, Science Advances, 12, eadv6891, https://doi.org/10.1126/sciadv.adv6891,
-
[49]
Zhang, B. and Merlis, T. M.: The Equilibrium Response of Atmospheric Machine-Learning Models to Uniform Sea Surface Temperature Warming, https://arxiv.org/abs/2510.02415,
-
[50]
Zhang, G., Rao, M., Yuval, J., and Zhao, M.: Advancing seasonal prediction of tropical cyclone activity with a hybrid AI-physics climate model, Environmental Research Letters, 20, 094031, https://doi.org/10.1088/1748-9326/adf864, 2025a.
Zhang, Q., Cheng, S., Liu, L., Zhang, L., Xu, J., She, D., and Yuan, Z.: Projections of climate change and its impacts ...
-
[51]
and (Lee et al., 2024). The climatology metric is computed with respect to the observed climatological fields provided by the reference dataset product corresponding to each examined variable as reported in Table
-
[52]
and (b) ACE2, (c) NeuralGCM-evap and (d) NeuralGCM-precip. The EWR is defined as the ratio of eastward power (as the average power in the dashed box on the right) to westward power (as the average power in the dashed box on the left) from the 2-dimensional wavenumber–frequency power spectra of daily 10°N–10°S averaged precipitation in May to October (sha...
-
[53]
Figure S23. Comparing the precipitation pentads between model and observations in NeuralGCM-evap. The monsoon metrics obtained from observation datasets (GPCP v1.3 and CMORPH v1.0; Joyce et al. (2004); Xie et al. (2017)) and Historical simulation conducted via (a) ACE2 and (b) NeuralGCM-precip. For each model, we analyzed results for six monsoon regions:...