arxiv: 2510.04358 · v2 · submitted 2025-10-05 · ⚛️ physics.ao-ph · stat.AP · stat.ML

Score-based generative emulation of impact-relevant Earth system model outputs

Pith reviewed 2026-05-18 10:11 UTC · model grok-4.3

keywords: climate model emulation · score-based diffusion · Earth system models · generative models · impact assessment · spherical mesh · monthly climate fields

The pith

Score-based diffusion on spherical meshes generates monthly climate fields whose distributions match Earth system model outputs with errors small relative to internal variability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether deep generative models can serve as fast, inexpensive surrogates for Earth system models when supplying inputs to impact assessments. It trains a score-based diffusion model on monthly fields of near-surface temperature, precipitation, relative humidity, and wind speed drawn from three ESMs in both pre-industrial and forced regimes. A set of diagnostics then checks whether the emulator reproduces probability densities, cross-variable correlations, time of emergence, and tail behavior. The central result is that the generated distributions stay close to the parent models and capture the main forced responses, even though certain variables with strong seasonal regime shifts remain imperfect. Because the remaining discrepancies are smaller than the internal variability already present in ESM projections, the authors conclude the emulator can usefully support adaptation and mitigation planning.
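
Of the diagnostics listed above, time of emergence lends itself to a short illustration. The sketch below assumes a common definition: the first year from which the anomaly stays beyond a fixed multiple of the internal-variability standard deviation. The paper's exact definition is not given here, so the function name `time_of_emergence`, the default threshold of 2, and the toy numbers are all assumptions made for illustration.

```python
def time_of_emergence(years, anomalies, noise_std, threshold=2.0):
    """Return the first year from which |anomaly| / noise_std stays above
    `threshold` until the end of the series, or None if it never does.

    Hypothetical helper: the definition and the default threshold are
    assumptions, not taken from the paper.
    """
    if len(years) != len(anomalies):
        raise ValueError("years and anomalies must have the same length")
    if noise_std <= 0:
        raise ValueError("noise_std must be positive")
    emergence = None
    # Walk backwards to find the start of the final unbroken run above threshold.
    for year, value in zip(reversed(years), reversed(anomalies)):
        if abs(value) / noise_std > threshold:
            emergence = year
        else:
            break
    return emergence


# Toy series: the signal crosses the threshold in 2003, dips back in 2004,
# and stays above it from 2005 onward.
years = list(range(2000, 2010))
anoms = [0.1, 0.5, 0.2, 0.9, 0.3, 0.7, 0.9, 1.1, 1.3, 1.5]
toe = time_of_emergence(years, anoms, noise_std=0.3)
```

Applying the same function to the ESM series and to the emulated series, grid cell by grid cell, would give two maps whose difference is one way to read this diagnostic.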

Core claim

A score-based diffusion model operating on a spherical mesh produces joint distributions of impact-relevant monthly climate variables that closely match the outputs of three different Earth system models and reproduce their key forced responses; the emulator's inaccuracies remain small compared with the magnitude of internal variability in the original projections.

What carries the argument

A score-based diffusion process on a spherical mesh that learns, and then samples from, the joint distribution of multiple climate fields.
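
To make the sampling idea concrete, here is a deliberately small sketch of reverse-time diffusion sampling. It is not the paper's model: it works on a single scalar instead of fields on a spherical mesh, it assumes a variance-preserving forward SDE with constant `beta`, and it uses the closed-form score of a Gaussian where a real emulator would use a trained neural network. All names and parameter values are hypothetical.

```python
import math
import random
import statistics


def reverse_diffusion_samples(n_samples, data_mean, data_std,
                              beta=10.0, n_steps=500, seed=0):
    """Sample a 1-D Gaussian by Euler-Maruyama integration of a reverse-time SDE.

    Assumed forward SDE on t in [0, 1]: dx = -0.5*beta*x dt + sqrt(beta) dW.
    For Gaussian data the score of the noised marginal is known exactly;
    a learned score network would take the place of `score` in practice.
    """
    rng = random.Random(seed)
    dt = 1.0 / n_steps

    def score(x, t):
        a = math.exp(-0.5 * beta * t)             # signal attenuation at time t
        var = (data_std * a) ** 2 + 1.0 - a * a   # variance of the noised marginal
        return -(x - data_mean * a) / var

    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)                   # draw from the prior at t = 1
        for k in range(n_steps):
            t = 1.0 - k * dt
            # Reverse-time drift: -(f - g^2 * score) with f = -0.5*beta*x, g^2 = beta.
            drift = 0.5 * beta * x + beta * score(x, t)
            x += drift * dt + math.sqrt(beta * dt) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


samples = reverse_diffusion_samples(1000, data_mean=2.0, data_std=0.5)
mean_est = statistics.fmean(samples)
std_est = statistics.stdev(samples)
```

The sample mean and standard deviation should land close to the target values, up to sampling noise and a small time-discretization error.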

If this is right

  • Emulators of this type can supply inputs to impact models without waiting for new full ESM runs or new CMIP cycles.
  • The approach allows rapid exploration of alternative policy futures while remaining anchored to ESM behavior.
  • Diagnostics for densities, correlations, and tails confirm the emulator preserves the statistics most relevant to impact studies.
  • The method points toward further development at daily resolution and finer spatial scales.
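
The correlation diagnostic mentioned in the list above can be sketched at a single grid cell as follows. This is an illustrative assumption about how such a check might look, not the paper's code: the helper names are hypothetical, and the variable keys `tas` and `pr` are used only as example labels.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def correlation_gap(esm_fields, emu_fields, var_a, var_b):
    """Emulator-minus-ESM difference in the correlation of two variables.

    Each argument maps a variable name to its monthly values at one grid cell.
    """
    r_esm = pearson(esm_fields[var_a], esm_fields[var_b])
    r_emu = pearson(emu_fields[var_a], emu_fields[var_b])
    return r_emu - r_esm


# Toy data: the ESM variables move together, the "emulated" ones move oppositely.
esm = {"tas": [1.0, 2.0, 3.0, 4.0], "pr": [2.0, 4.0, 6.0, 8.0]}
emu = {"tas": [1.0, 2.0, 3.0, 4.0], "pr": [8.0, 6.0, 4.0, 2.0]}
gap = correlation_gap(esm, emu, "tas", "pr")
```

A gap near zero at every grid cell and for every variable pair would indicate that cross-variable structure is preserved; the toy data are chosen to give the worst possible gap.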

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Large ensembles of impact scenarios could be generated at low cost for uncertainty quantification.
  • The same spherical diffusion architecture might transfer to other gridded geospatial emulation tasks.
  • Bias-aware training or regime-shift handling could reduce the remaining failure cases for variables with strong seasonal changes.

Load-bearing premise

A diffusion model trained on monthly fields from only three Earth system models will generalize to other models, variables, or time periods without large biases caused by the spherical mesh or the choice of training diagnostics.

What would settle it

Generate fields with the emulator for an ESM withheld from training and check whether the distribution of a key impact variable such as extreme precipitation differs from the true ESM output by more than the spread attributable to internal variability.
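
The check described above could be sketched as an earth mover's distance (EMD) between emulated and ESM samples, expressed in units of the ESM's own spread. The figure captions refer to an "EMD-to-noise ratio", but its exact definition is not given here, so the normalization by the ESM standard deviation, the pass threshold of 1.0, and the helper names are assumptions.

```python
import statistics


def emd_1d(a, b):
    """1-D earth mover's (Wasserstein-1) distance between equal-size samples."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-size samples the optimal transport plan pairs sorted values.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def emd_to_noise(esm_sample, emu_sample):
    """EMD between the two samples divided by the ESM standard deviation."""
    noise = statistics.pstdev(esm_sample)
    if noise == 0:
        raise ValueError("ESM sample has zero spread")
    return emd_1d(esm_sample, emu_sample) / noise


def passes_holdout_check(esm_sample, emu_sample, max_ratio=1.0):
    """True if the emulator error stays within the assumed noise budget."""
    return emd_to_noise(esm_sample, emu_sample) <= max_ratio


# Toy held-out comparison: the emulated sample is shifted by 0.5.
esm_sample = [0.0, 1.0, 2.0, 3.0, 4.0]
emu_sample = [0.5, 1.5, 2.5, 3.5, 4.5]
ratio = emd_to_noise(esm_sample, emu_sample)
ok = passes_holdout_check(esm_sample, emu_sample)
```

For the withheld-ESM test, `esm_sample` would hold the true model output for a key impact variable and `emu_sample` the emulator's output for the same location and season.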

Figures

Figures reproduced from arXiv: 2510.04358 by Andre Nogueira Souza, Raffaele Ferrari, Shahine Bouabid.

Figure 1: Time series of annual GMST anomalies from the MPI-ESM1-2-LR ScenarioMIP simula…
Figure 2: Distributions from the MPI-ESM1-2-LR piControl run and the emulator with ∆…
Figure 3: Comparison of moments from the MPI-ESM1-2-LR piControl run and the emulated distribution…
Figure 4: Left: Boxplots of GMST anomalies for the MPI-ESM1-2-LR piControl run, (a) samples drawn…
Figure 5: Spatial maps of grid-cell EMD-to-noise ratios between the MPI-ESM1-2-LR piControl run and…
Figure 6: (Left) 95th percentile of monthly precipitation in the MPI-ESM1-2-LR piControl run for three regions where the emulator fails to match the ESM distribution; (Center) Spatial map of the max–min ratio of the monthly 95th percentile at each grid cell, highlighting regions with contrasts between months in dry or precipitating regimes; (Right) Max EMD-to-noise ratio across seasons.
Figure 7: Spatial maps of grid-cell cross-correlations for all months between each pair of the four variables,…
Figure 8: Top: Examples of individual realizations from the MPI-ESM1-2-LR and emulator for the same…
Figure 9: 1850-2100 evolution of the emulated mean annual anomaly for temperature (top) and relative…
Figure 10: Hovmöller diagrams of 30°N–30°S annual mean precipitation anomalies under SSP3-7.0 for MPI-ESM1-2-LR (left) and the emulated large ensemble (center). The right panel shows the error normalized by the MPI-ESM1-2-LR standard deviation across ensemble members. The emulator reproduces the large-scale anomaly patterns seen in the ESM, with wetter conditions over the tropical Pacific (120°E–120°W) and drier co…
Figure 11: 1850-2100 evolution of the annual zonal mean wind speed anomalies under SSP3-7.0 for MPI…
Figure 12: Time of emergence of the annually averaged anomalies computed for the MPI-ESM1-2-LR…
Figure 13: Distributional change of the climate model output and emulated large ensemble between pre…
Figure 14: EMD-to-noise ratio under SSP2-4.5 for each ESM, variable, season and AR6 region. Results…
Figure 15: Extreme tails of the MPI-ESM1-2-LR output distribution and the distribution from the emulated…
Figure 16: Relative bias in the extreme tail quantile between the MPI-ESM1-2-LR output and the emulated…
Original abstract

Policy targets evolve faster than the Coupled Model Intercomparison Project cycles, complicating adaptation and mitigation planning that must often contend with outdated projections. Climate model output emulators address this gap by offering inexpensive surrogates that can rapidly explore alternative futures while staying close to Earth System Model (ESM) behavior. The focus is on emulators designed to provide inputs to impact models. Using monthly ESM fields of near-surface temperature, precipitation, relative humidity, and wind speed, it is shown that deep generative models have the potential to model the joint distribution of variables relevant for impacts. The specific model proposed uses score-based diffusion on a spherical mesh and runs on a single mid-range graphical processing unit. A thorough suite of diagnostics is introduced to compare emulator outputs with their parent ESMs, including their probability densities, cross-variable correlations, time of emergence, or tail behavior. The emulator performance is evaluated across three distinct ESMs in both pre-industrial and forced regimes. The results show that the emulator produces distributions that closely match the ESM outputs and captures key forced responses. They also reveal important failure cases, notably for variables with a strong regime shift in the seasonal cycle. Although not a perfect match to the ESM, the inaccuracies of the emulator are small relative to the magnitude of internal variability in ESM projections. This suggests that the generative emulators can be useful in supporting impact assessment. Priorities for future development toward daily resolution, finer spatial scales, and bias-aware training are discussed. Code is made available at https://github.com/shahineb/climemu.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces a score-based diffusion model operating on a spherical mesh to emulate monthly near-surface temperature, precipitation, relative humidity, and wind speed fields from Earth System Models (ESMs). Trained on outputs from three ESMs, the emulator is evaluated in both pre-industrial and forced regimes using diagnostics for probability densities, cross-variable correlations, time of emergence, and tail behavior. The central claim is that generated samples closely match ESM distributions and capture key forced responses, with inaccuracies small relative to internal variability despite noted failure cases for variables exhibiting strong seasonal regime shifts; code is released publicly.

Significance. If the central claim holds, the work offers a computationally efficient surrogate for generating impact-relevant climate fields, potentially accelerating exploration of policy scenarios beyond CMIP cycles. Strengths include the multi-ESM evaluation, public code release at https://github.com/shahineb/climemu, and introduction of a diagnostic suite that directly targets impact-model inputs. These elements support reproducibility and practical utility if the error-variability comparison is quantified.

major comments (1)
  1. [Abstract] The claim that 'the inaccuracies of the emulator are small relative to the magnitude of internal variability in ESM projections' is load-bearing for the utility conclusion but lacks quantitative support for the acknowledged failure cases (variables with strong regime shifts in the seasonal cycle). No ratio of emulator bias to ESM standard deviation (or equivalent metric) is reported for precipitation, wind speed, or seasonal transitions in the tails or regime-shift periods, leaving the central claim unsubstantiated for impact assessment.
minor comments (2)
  1. [Methods] Full details on the diffusion noise schedule, network architecture, and spherical mesh discretization are referenced but not fully specified in the provided text, hindering independent reproduction.
  2. [Results] Error bars or uncertainty estimates on the reported diagnostics (densities, correlations, time of emergence) are absent, making it difficult to assess whether differences from ESMs are statistically significant.
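
One way to attach uncertainty estimates of the kind requested in the second minor comment is a percentile bootstrap on the emulator-minus-ESM difference of a statistic. The sketch below is a generic illustration, not the paper's procedure: the function name and defaults are hypothetical, and it resamples values independently, which ignores the temporal autocorrelation of monthly climate data (a block bootstrap would be a more careful choice).

```python
import random
import statistics


def bootstrap_ci(esm, emu, stat=statistics.fmean, n_boot=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(emu) - stat(esm).

    Assumes independent samples; hypothetical helper for illustration only.
    """
    if not esm or not emu:
        raise ValueError("both samples must be non-empty")
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        esm_rs = rng.choices(esm, k=len(esm))   # resample with replacement
        emu_rs = rng.choices(emu, k=len(emu))
        diffs.append(stat(emu_rs) - stat(esm_rs))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: the "emulated" sample is the ESM sample shifted by 100.
esm = [float(i) for i in range(50)]
emu = [v + 100.0 for v in esm]
lo, hi = bootstrap_ci(esm, emu)
```

If the resulting interval excludes zero, the difference in the chosen statistic is unlikely to be sampling noise alone; passing a quantile function as `stat` would extend the same check to tail behavior.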

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and detailed review. The feedback highlights an important opportunity to strengthen the quantitative support for our central claim. We address the major comment point by point below.

Point-by-point responses
  1. Referee: [Abstract] The claim that 'the inaccuracies of the emulator are small relative to the magnitude of internal variability in ESM projections' is load-bearing for the utility conclusion but lacks quantitative support for the acknowledged failure cases (variables with strong regime shifts in the seasonal cycle). No ratio of emulator bias to ESM standard deviation (or equivalent metric) is reported for precipitation, wind speed, or seasonal transitions in the tails or regime-shift periods, leaving the central claim unsubstantiated for impact assessment.

    Authors: We agree that the abstract claim would benefit from more explicit quantification, particularly for the failure cases involving strong seasonal regime shifts. In the revised manuscript we will add a new supplementary table reporting the ratio of emulator error (using mean absolute deviation in distribution statistics and tail quantiles) to ESM internal variability (standard deviation across years or ensemble members) for all variables, including precipitation and wind speed, with separate entries for seasonal transition months. This will directly address the concern and allow readers to evaluate the claim for impact assessment. While the main text already shows several diagnostics where discrepancies remain smaller than variability, we acknowledge the need for this focused metric in the abstract context.

    Revision: yes
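
A table of the kind the simulated rebuttal promises could be assembled along the following lines. This is a sketch under stated assumptions, not the authors' implementation: it uses the absolute error in a single tail quantile (0.95) divided by the ESM standard deviation, grouped by month, and the helper names `quantile` and `tail_error_table` are hypothetical.

```python
import statistics


def quantile(sample, q):
    """Linearly interpolated empirical quantile for 0 <= q <= 1."""
    if not sample or not 0.0 <= q <= 1.0:
        raise ValueError("need a non-empty sample and q in [0, 1]")
    s = sorted(sample)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def tail_error_table(esm_by_month, emu_by_month, q=0.95):
    """Map each month to |q_emu - q_esm| / std(ESM) for the chosen quantile."""
    table = {}
    for month, esm in esm_by_month.items():
        emu = emu_by_month[month]
        noise = statistics.pstdev(esm)
        if noise == 0:
            raise ValueError(f"ESM sample for {month} has zero spread")
        table[month] = abs(quantile(emu, q) - quantile(esm, q)) / noise
    return table


# Toy data: the emulator overshoots the January tail and matches July exactly.
esm_by_month = {"Jan": [0.0, 1.0, 2.0, 3.0, 4.0], "Jul": [0.0, 2.0, 4.0, 6.0, 8.0]}
emu_by_month = {"Jan": [0.0, 1.0, 2.0, 3.0, 5.0], "Jul": [0.0, 2.0, 4.0, 6.0, 8.0]}
table = tail_error_table(esm_by_month, emu_by_month)
```

Entries well below 1 would support the claim that emulator error is small relative to internal variability; entries near or above 1 in seasonal transition months would flag the regime-shift failure cases.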

Circularity Check

0 steps flagged

No significant circularity; evaluation is empirical distributional comparison

Full rationale

The paper trains a score-based diffusion model on monthly ESM fields from three models and evaluates generated samples via direct statistical comparisons (densities, correlations, time of emergence, tails) to the parent ESM outputs. These metrics are independent empirical checks rather than quantities derived by construction from fitted parameters or self-citations. No load-bearing step reduces to a self-definition, fitted-input prediction, or uniqueness theorem imported from the authors' prior work. The framework is self-contained against external benchmarks (the ESM data themselves), with acknowledged failure cases treated as empirical observations rather than hidden tautologies.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard assumptions of generative modeling plus domain-specific choices about mesh and variable selection. No new physical entities are postulated.

free parameters (1)
  • diffusion noise schedule and network hyperparameters
    Chosen during training to match ESM statistics; typical for score-based models and not derived from first principles.
axioms (1)
  • domain assumption: The joint distribution of monthly climate fields can be learned by score-based diffusion on a spherical mesh.
    Invoked as the modeling premise for the emulator architecture.

pith-pipeline@v0.9.0 · 5812 in / 1246 out tokens · 37158 ms · 2026-05-18T10:11:11.700331+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Towards accurate extreme event likelihoods from diffusion model climate emulators

    physics.ao-ph · 2026-05 · unverdicted · novelty 6.0

    Diffusion model climate emulators provide probability density estimates that allow likelihood calculations and odds-ratio-based importance sampling for extreme events such as tropical cyclones.
