Score-based generative emulation of impact-relevant Earth system model outputs
Pith reviewed 2026-05-18 10:11 UTC · model grok-4.3
The pith
Score-based diffusion on spherical meshes generates monthly climate fields whose distributions match Earth system model outputs with errors small relative to internal variability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A score-based diffusion model operating on a spherical mesh produces joint distributions of impact-relevant monthly climate variables that closely match the outputs of three different Earth system models and reproduce their key forced responses; the emulator's inaccuracies remain small compared with the magnitude of internal variability in the original projections.
What carries the argument
A score-based diffusion process on a spherical mesh that learns, and samples from, the joint distribution of multiple climate fields.
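To make the idea of sampling through a score function concrete, here is a minimal one-dimensional sketch in Python. It is not the paper's model: there is no spherical mesh and no neural network, and the score is written in closed form for a toy Gaussian "data" distribution rather than learned. The names `MU`, `S`, `score`, and `sample`, the variance-exploding noise schedule, and the Euler integration of the probability-flow ODE are all assumptions chosen for illustration.

```python
import random
import statistics

MU, S = 2.0, 1.0  # toy "data" distribution N(MU, S^2); hypothetical values


def score(x, sigma):
    """Exact score (gradient of the log-density) of the noised marginal N(MU, S^2 + sigma^2)."""
    return -(x - MU) / (S**2 + sigma**2)


def sample(n, sigma_max=50.0, sigma_min=0.01, steps=200, seed=0):
    """Draw n samples by integrating the probability-flow ODE from high to low noise."""
    rng = random.Random(seed)
    ratio = (sigma_min / sigma_max) ** (1.0 / steps)
    sigmas = [sigma_max * ratio**k for k in range(steps + 1)]  # geometric schedule
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, sigma_max)  # start from (almost) pure noise
        for s_now, s_next in zip(sigmas[:-1], sigmas[1:]):
            # Variance-exploding probability-flow ODE: dx/dsigma = -sigma * score(x, sigma)
            x += (s_next - s_now) * (-s_now * score(x, s_now))
        out.append(x)
    return out


samples = sample(2000)
print(statistics.mean(samples), statistics.stdev(samples))
```

In a learned emulator the closed-form `score` would be replaced by a trained network evaluated on the mesh; the sampling loop keeps the same shape.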
If this is right
- Emulators of this type can supply inputs to impact models without waiting for new full ESM runs or new CMIP cycles.
- The approach allows rapid exploration of alternative policy futures while remaining anchored to ESM behavior.
- Diagnostics for densities, correlations, and tails confirm the emulator preserves the statistics most relevant to impact studies.
- The method points toward further development at daily resolution and finer spatial scales.
Where Pith is reading between the lines
- Large ensembles of impact scenarios could be generated at low cost for uncertainty quantification.
- The same spherical diffusion architecture might transfer to other gridded geospatial emulation tasks.
- Bias-aware training or regime-shift handling could reduce the remaining failure cases for variables with strong seasonal changes.
Load-bearing premise
A diffusion model trained on monthly fields from only three Earth system models will generalize to other models, variables, or time periods without large biases caused by the spherical mesh or the choice of training diagnostics.
What would settle it
Generate fields with the emulator for an ESM withheld from training and check whether the distribution of a key impact variable such as extreme precipitation differs from the true ESM output by more than the spread attributable to internal variability.
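One way to make this check concrete is sketched below in Python. It is an illustration, not the paper's evaluation code: the function names `emd_1d` and `held_out_check`, the decision rule (mean emulator-to-member distance compared with the largest member-to-member distance), and the synthetic Gaussian values standing in for extreme-precipitation samples are all assumptions introduced here.

```python
import itertools
import random


def emd_1d(a, b):
    """Earth mover's (Wasserstein-1) distance between two equal-size 1-D samples."""
    if len(a) != len(b):
        raise ValueError("samples must have equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def held_out_check(emulated, esm_members):
    """Return (exceeds, error, baseline).

    error    : mean EMD between the emulated sample and each ESM ensemble member
    baseline : largest EMD between any two ESM members (a proxy for internal variability)
    """
    baseline = max(emd_1d(p, q) for p, q in itertools.combinations(esm_members, 2))
    error = sum(emd_1d(emulated, m) for m in esm_members) / len(esm_members)
    return error > baseline, error, baseline


rng = random.Random(0)
# Synthetic placeholders standing in for a withheld ESM's ensemble (not real model output).
esm_members = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(5)]
# A deliberately biased "emulator" whose mean is shifted by one standard deviation.
biased_emulator = [rng.gauss(1.0, 1.0) for _ in range(500)]

flag, error, baseline = held_out_check(biased_emulator, esm_members)
print(f"error={error:.3f} baseline={baseline:.3f} exceeds={flag}")
```

Using the largest member-to-member distance as the baseline is a conservative choice; a percentile of the pairwise distances would be a stricter alternative.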
Original abstract
Policy targets evolve faster than the Coupled Model Intercomparison Project cycles, complicating adaptation and mitigation planning that must often contend with outdated projections. Climate model output emulators address this gap by offering inexpensive surrogates that can rapidly explore alternative futures while staying close to Earth System Model (ESM) behavior. The focus is on emulators designed to provide inputs to impact models. Using monthly ESM fields of near-surface temperature, precipitation, relative humidity, and wind speed, it is shown that deep generative models have the potential to model the joint distribution of variables relevant for impacts. The specific model proposed uses score-based diffusion on a spherical mesh and runs on a single mid-range graphical processing unit. A thorough suite of diagnostics is introduced to compare emulator outputs with their parent ESMs, including their probability densities, cross-variable correlations, time of emergence, or tail behavior. The emulator performance is evaluated across three distinct ESMs in both pre-industrial and forced regimes. The results show that the emulator produces distributions that closely match the ESM outputs and captures key forced responses. They also reveal important failure cases, notably for variables with a strong regime shift in the seasonal cycle. Although not a perfect match to the ESM, the inaccuracies of the emulator are small relative to the magnitude of internal variability in ESM projections. This suggests that the generative emulators can be useful in supporting impact assessment. Priorities for future development toward daily resolution, finer spatial scales, and bias-aware training are discussed. Code is made available at https://github.com/shahineb/climemu.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a score-based diffusion model operating on a spherical mesh to emulate monthly near-surface temperature, precipitation, relative humidity, and wind speed fields from Earth System Models (ESMs). Trained on outputs from three ESMs, the emulator is evaluated in both pre-industrial and forced regimes using diagnostics for probability densities, cross-variable correlations, time of emergence, and tail behavior. The central claim is that generated samples closely match ESM distributions and capture key forced responses, with inaccuracies small relative to internal variability despite noted failure cases for variables exhibiting strong seasonal regime shifts; code is released publicly.
Significance. If the central claim holds, the work offers a computationally efficient surrogate for generating impact-relevant climate fields, potentially accelerating exploration of policy scenarios beyond CMIP cycles. Strengths include the multi-ESM evaluation, public code release at https://github.com/shahineb/climemu, and introduction of a diagnostic suite that directly targets impact-model inputs. These elements support reproducibility and practical utility if the error-variability comparison is quantified.
major comments (1)
- [Abstract] Abstract: the claim that 'the inaccuracies of the emulator are small relative to the magnitude of internal variability in ESM projections' is load-bearing for the utility conclusion but lacks quantitative support for the acknowledged failure cases (variables with strong regime shifts in the seasonal cycle). No ratio of emulator bias to ESM standard deviation (or equivalent metric) is reported for precipitation, wind speed, or seasonal transitions in the tails or regime-shift periods, leaving the central claim unsubstantiated for impact assessment.
minor comments (2)
- [Methods] Methods section: full details on the diffusion noise schedule, network architecture, and spherical mesh discretization are referenced but not fully specified in the provided text, hindering independent reproduction.
- [Results] Results: error bars or uncertainty estimates on the reported diagnostics (densities, correlations, time of emergence) are absent, making it difficult to assess whether differences from ESMs are statistically significant.
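The kind of uncertainty estimate requested in the second minor comment could be obtained with a percentile bootstrap. The Python sketch below applies it to a cross-variable correlation. It is a hedged illustration only: the helper names `pearson` and `bootstrap_ci`, the synthetic temperature and humidity series, and the choice to resample individual months independently (which ignores temporal autocorrelation) are assumptions, not the paper's procedure.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the correlation, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample pairs with replacement
        stats.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * (n_boot - 1))]
    hi = stats[int((1 + level) / 2 * (n_boot - 1))]
    return lo, hi


rng = random.Random(1)
# Synthetic placeholder anomalies (not real model output), built to be correlated.
temperature = [rng.gauss(0.0, 1.0) for _ in range(200)]
humidity = [0.8 * t + 0.6 * rng.gauss(0.0, 1.0) for t in temperature]

r = pearson(temperature, humidity)
lo, hi = bootstrap_ci(temperature, humidity)
print(f"r = {r:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

If the interval computed for the emulator overlaps the one computed for the parent ESM, the difference in that diagnostic is hard to distinguish from sampling noise.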
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. The feedback highlights an important opportunity to strengthen the quantitative support for our central claim. We address the major comment point by point below.
point-by-point responses
- Referee: [Abstract] Abstract: the claim that 'the inaccuracies of the emulator are small relative to the magnitude of internal variability in ESM projections' is load-bearing for the utility conclusion but lacks quantitative support for the acknowledged failure cases (variables with strong regime shifts in the seasonal cycle). No ratio of emulator bias to ESM standard deviation (or equivalent metric) is reported for precipitation, wind speed, or seasonal transitions in the tails or regime-shift periods, leaving the central claim unsubstantiated for impact assessment.
  Authors: We agree that the abstract claim would benefit from more explicit quantification, particularly for the failure cases involving strong seasonal regime shifts. In the revised manuscript we will add a new supplementary table reporting the ratio of emulator error (using mean absolute deviation in distribution statistics and tail quantiles) to ESM internal variability (standard deviation across years or ensemble members) for all variables, including precipitation and wind speed, with separate entries for seasonal transition months. This will directly address the concern and allow readers to evaluate the claim for impact assessment. While the main text already shows several diagnostics where discrepancies remain smaller than variability, we acknowledge the need for this focused metric in the abstract context.
  Revision: yes
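A single entry of the supplementary table described in this response could be computed along the following lines. This Python sketch is an assumption-laden illustration, not the authors' implementation: the helper names `quantile` and `error_to_variability`, the 0.95 tail quantile, the use of the standard deviation across years as the variability measure, and the placeholder numbers are all hypothetical.

```python
import statistics


def quantile(values, q):
    """Linear-interpolation quantile of a sample, with 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def error_to_variability(emulated, esm, q=0.95):
    """Absolute error in a tail quantile divided by the ESM standard deviation across years."""
    tail_error = abs(quantile(emulated, q) - quantile(esm, q))
    return tail_error / statistics.stdev(esm)


# Hypothetical placeholder values for one variable and one month (not real model output).
esm_years = [float(v) for v in range(11)]      # 0.0, 1.0, ..., 10.0
emulated_years = [v + 0.5 for v in esm_years]  # same spread, shifted by 0.5

ratio = error_to_variability(emulated_years, esm_years)
table = {("precipitation", "June"): ratio}
for (variable, month), value in table.items():
    print(f"{variable:<15}{month:<6}{value:.3f}")
```

A ratio well below 1 would support the abstract's claim for that variable and month; values near or above 1 in seasonal transition months would flag the failure cases the referee asks about.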
Circularity Check
No significant circularity; evaluation is empirical distributional comparison
full rationale
The paper trains a score-based diffusion model on monthly ESM fields from three models and evaluates generated samples via direct statistical comparisons (densities, correlations, time of emergence, tails) to the parent ESM outputs. These metrics are independent empirical checks rather than quantities derived by construction from fitted parameters or self-citations. No load-bearing step reduces to a self-definition, fitted-input prediction, or uniqueness theorem imported from the authors' prior work. The framework is self-contained against external benchmarks (the ESM data themselves), with acknowledged failure cases treated as empirical observations rather than hidden tautologies.
Axiom & Free-Parameter Ledger
free parameters (1)
- diffusion noise schedule and network hyperparameters
axioms (1)
- domain assumption: The joint distribution of monthly climate fields can be learned by score-based diffusion on a spherical mesh.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: score-based diffusion on a spherical mesh... pattern scaling step maps GMST onto regional mean temperature anomalies
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: EMD-to-noise ratio... small relative to the magnitude of internal variability
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Towards accurate extreme event likelihoods from diffusion model climate emulators
  Diffusion model climate emulators provide probability density estimates that allow likelihood calculations and odds-ratio-based importance sampling for extreme events such as tropical cyclones.