From stable online coupling to decade-long climate simulations: A machine learning parameterization for cloud microphysics in ICON

Axel Lauer; Ellen Sarauer; Mierk Schwabe; Philipp Weiss; Philip Stier; Veronika Eyring

arxiv: 2606.23829 · v1 · pith:7CNSOJWSnew · submitted 2026-06-22 · ⚛️ physics.ao-ph

Ellen Sarauer, Mierk Schwabe, Philipp Weiss, Axel Lauer, Philip Stier, Veronika Eyring

Pith reviewed 2026-06-26 05:59 UTC · model grok-4.3

classification ⚛️ physics.ao-ph

keywords machine learning parameterization · cloud microphysics · ICON model · online coupling · numerical stability · climate simulation · graupel scheme

The pith

A machine learning cloud microphysics scheme achieves stable decade-long online coupling in the ICON model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that an ML parameterization for cloud microphysics, trained on 5 km convection-permitting data, can be coupled online to the coarser ICON climate model and remain numerically stable for decade-long runs. Physical constraints on mass positivity and overshoot prevention, plus careful data curation, prove necessary to handle atmospheric states absent from training. The resulting scheme matches the classical graupel scheme in reproducing observed climate while removing two microphysics-specific tuning parameters. This matters for Earth system modeling because cloud microphysics remains a major uncertainty source, and stable hybrid approaches could eventually replace tuned components.

Core claim

The central claim is that a two-stage ML microphysics scheme (a classifier for active cells plus a regressor for tendencies), trained on global 5 km ICON output and equipped with physical constraints, couples stably to ICON at standard climate resolution and sustains decade-long integrations. It reproduces observed climate comparably to the graupel scheme and eliminates two tuning parameters, although systematic reductions in long-term mean-state biases are not yet achieved.

What carries the argument

Two-stage ML model (classifier identifying active grid cells, regressor predicting microphysical tendencies) with enforced physical constraints for mass positivity and overshoot prevention.
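
As a rough illustration of this design, the Python sketch below gates a regressor's output with a classifier mask and then limits sinks so that no species can go negative. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 0.5 threshold, the array layout (cells by species), and the reading of overshoot prevention as "a sink may not remove more mass than the cell holds in one time step" are all hypothetical.

```python
import numpy as np

def limit_sinks(q, tend, dt):
    """Clip tendencies so that q + dt * tend cannot drop below zero.

    Assumption: positivity and overshoot prevention are treated as one limiter.
    """
    max_sink = -q / dt                      # most negative tendency a cell can sustain
    return np.maximum(tend, max_sink)

def two_stage_step(q, active_prob, raw_tend, dt, threshold=0.5):
    """One hypothetical update: classifier mask, regressor tendencies, constraints."""
    active = active_prob >= threshold                  # stage 1: which cells are active
    tend = np.where(active[:, None], raw_tend, 0.0)    # stage 2: tendencies only where active
    tend = limit_sinks(q, tend, dt)                    # physical constraint
    q_new = np.maximum(q + dt * tend, 0.0)             # guard against round-off
    return q_new, tend

# Toy data: 2 cells x 2 species (mass mixing ratios in kg/kg), made up for illustration.
q = np.array([[1e-3, 2e-4], [5e-4, 0.0]])
active_prob = np.array([0.9, 0.1])
raw_tend = np.array([[-2e-3, 1e-4], [-1e-3, -1e-5]])

q_new, tend = two_stage_step(q, active_prob, raw_tend, dt=1.0)
print(q_new)
```

In this toy case the first cell is active and its first species is drained to zero rather than going negative, while the second cell is left unchanged because the classifier marks it inactive.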

If this is right

The coupled ML scheme maintains numerical stability over decade-long climate simulations.
It reproduces observed climate with performance comparable to the classical graupel scheme.
It eliminates two microphysics-specific tuning parameters required by the graupel scheme.
Strong offline accuracy is insufficient for stable online coupling; physical constraints are required.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Constraint-enforcement methods developed here could be applied to stabilize ML versions of other subgrid processes.
Expanding the training data to include a wider range of climate states might enable the bias reductions not yet realized.
The hybrid setup leaves the dynamical core untouched while replacing only microphysics, suggesting a practical route for incremental adoption in existing ESMs.

Load-bearing premise

Enforcing physical constraints such as mass positivity and overshoot prevention together with careful dataset curation is sufficient to achieve numerical stability when the model encounters atmospheric states and feedbacks absent from the 5 km training data.

What would settle it

A decade-long coupled simulation in which the ML scheme produces numerical instability, model crash, or climate statistics that diverge substantially from both observations and the graupel scheme reference run.

Figures

Figures reproduced from arXiv: 2606.23829 by Axel Lauer, Ellen Sarauer, Mierk Schwabe, Philipp Weiss, Philip Stier, Veronika Eyring.

**Figure 1.** Schematic overview of the online ML coupling workflow. High-resolution ICON simulations provide training data, which are coarse-grained, preprocessed, and used to train the ML microphysics scheme. After offline evaluation, the ML microphysics scheme is coupled online to ICON, followed by stability assessment, tuning of the online-coupled ICON version, and evaluation of the simulated climate and cloud prope…

**Figure 2.** Overview of the ML microphysics scheme. Inputs and outputs correspond to the classical graupel scheme (see …)

**Figure 3.** 10-year global annual mean bias relative to observations for key climate variables, expressed as deviations normalised by the observational spread σ. Blue circles show the tuned ICON baseline configuration using the classical graupel scheme and red squares show the tuned ML microphysics configuration. Shaded bands denote the observational range (annual mean ± σ), where the mean and standard deviation σ are…

**Figure 4.** Portrait diagram showing the relative root-mean-square error (RMSE) of selected key atmospheric and surface variables (see Appendix D, Table D1) across an ensemble of CMIP6 models and the two ICON simulations compared to observational datasets. Rows denote variables and columns individual models. RMSE values are shown relative to the median RMSE from the CMIP6 ensemble with blue (red) colors indic…
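
The two summary metrics named in the captions of Figures 3 and 4 can be sketched in a few lines of Python. Both captions are truncated here, so the definitions below are assumptions: σ is taken as the sample standard deviation across observational datasets, and the relative RMSE is taken as (RMSE − median) / median, a common convention for portrait diagrams. The function names and the numbers are hypothetical.

```python
import numpy as np

def normalised_bias(model_mean, obs_means):
    """Bias of a model's global mean in units of the observational spread sigma.

    Assumption: sigma is the sample standard deviation across observational datasets.
    """
    obs_means = np.asarray(obs_means, dtype=float)
    mu = obs_means.mean()
    sigma = obs_means.std(ddof=1)
    return (model_mean - mu) / sigma

def relative_rmse(rmse_values, ensemble_rmse):
    """RMSE expressed relative to the median RMSE of a reference ensemble.

    Negative values mean a smaller error than the ensemble median.
    """
    median = np.median(ensemble_rmse)
    return (np.asarray(rmse_values, dtype=float) - median) / median

# Made-up numbers, for illustration only.
bias = normalised_bias(3.0, [1.0, 2.0, 3.0])
rel = relative_rmse([1.0, 3.0], [1.0, 2.0, 3.0])
print(bias, rel)
```

A normalised bias with magnitude below 1 would place the model inside the shaded observational band of Figure 3 under these assumptions.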

Original abstract

The representation of cloud microphysics and its nonlinear character and scale-dependence is a remaining source of uncertainty in Earth system models (ESMs). Here, we develop and couple online a machine learning (ML)-based cloud microphysics parameterization with the Icosahedral non-hydrostatic modeling framework (ICON). The primary challenge is achieving numerically stable, long-term online coupling when transitioning from training with km-scale data to application in coarse-scale simulations, where the coupled system encounters atmospheric states and feedbacks not seen during training. The training data is obtained from a global convection-permitting ICON simulation at 5 km resolution. The ML microphysics scheme uses a two-stage design: a classifier to identify active grid cells and a regressor to predict cloud microphysical tendencies. Physical constraints such as enforcing mass positivity and overshoot prevention prove essential for numerical stability in the coupled system. We demonstrate that achieving stable online coupling requires enforcing physical constraints and careful dataset curation, and that strong offline performance alone is insufficient. The coupled model maintains numerical stability over decade-long simulations with a performance in reproducing the observed climate comparable to the classical graupel scheme. The ML-based scheme eliminates two microphysics-specific tuning parameters of the classical graupel scheme, though systematic improvements in long-term mean-state biases are not yet realized. This study demonstrates that stable, decade-long climate simulations with an ML-based cloud microphysics scheme trained on convection-permitting data are feasible, providing a foundation for future hybrid ESMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows that an ML microphysics scheme can be coupled online to ICON and run stably for a decade, but the evidence for why the constraints prevent drift is thin.

The key result is that a two-stage ML scheme trained on 5 km convection-permitting data stays numerically stable when run online in the coarser ICON model for ten-year simulations, matching the classical graupel scheme's climate performance while removing two tuning parameters.

What is new is the explicit focus on long-term online coupling rather than offline accuracy. The classifier-plus-regressor design plus mass-positivity and overshoot clipping is presented as the combination that bridges the resolution gap and keeps the model from blowing up. They also note that strong offline scores alone do not guarantee stability, which is a useful practical observation.

The work is solid on the engineering side: they actually ran the coupled system for decade timescales and report no instability. Removing tuning parameters is a clear practical gain if it generalizes.

The soft spots are in the supporting evidence. The abstract states that physical constraints are essential but supplies no ablation runs, no count of out-of-distribution inputs during the long simulations, and no quantitative metrics with error bars on how closely the ML version reproduces the reference climate or observations. Without those, it is hard to judge whether the constraints truly correct inconsistent feedbacks or simply mask accumulating errors that might appear under different initial conditions or longer runs.

The distribution-shift concern from the stress-test note holds up on the given information: training on 5 km data and applying to coarse-grid states over decades leaves open the possibility that slow drift from clipped tendencies could still emerge. The paper would benefit from diagnostics on that point.

This is for groups already building hybrid microphysics schemes who need a concrete example of decade-scale stability. It deserves peer review because the feasibility claim is worth detailed checking even if the current validation leaves gaps.

Referee Report

2 major / 1 minor

Summary. The paper develops a two-stage ML parameterization (classifier for active cells + regressor for tendencies) for cloud microphysics in ICON, trained on 5 km convection-permitting data. It incorporates physical constraints (mass positivity, overshoot prevention) and dataset curation to achieve numerically stable online coupling. The central claim is that the coupled model runs stably for decade-long climate simulations with performance comparable to the classical graupel scheme, while eliminating two microphysics-specific tuning parameters (though without systematic bias reductions).

Significance. If the stability and comparability claims hold under the reported constraints, the work would be significant for hybrid Earth system modeling: it provides concrete evidence that ML microphysics trained at km-scale can be stably coupled at coarse resolution over multi-year timescales, removing two free parameters and highlighting the role of physical constraints beyond offline accuracy. This directly addresses a key barrier to ML parameterizations in long climate integrations.

major comments (2)

[Abstract] Abstract: the assertion that 'physical constraints such as enforcing mass positivity and overshoot prevention prove essential for numerical stability' is load-bearing for the central claim yet is presented without any ablation (e.g., runs with/without clipping) or diagnostic quantifying the fraction of online states outside the 5 km training support; this leaves the distribution-shift robustness untested.
[Abstract] Abstract and results summary: the claim of 'numerical stability over decade-long simulations' and 'performance ... comparable to the classical graupel scheme' lacks any reported quantitative metrics (error bars, drift rates, or global-mean bias tables) or cross-validation against shorter offline/online tests, making it impossible to assess whether clipped errors accumulate into slow drift.
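
One simple form the requested distribution-shift diagnostic could take is sketched below in Python: the fraction of online input states for which at least one feature falls outside the range spanned by the training data. This is a hypothetical illustration, not a method from the paper; the function name, the quantile bounds, and the toy arrays are assumptions, and a per-feature box ignores correlations between features, so it only gives a lower bound on how often the model extrapolates.

```python
import numpy as np

def fraction_outside_support(train, online, lower_q=0.001, upper_q=0.999):
    """Fraction of online samples with any feature outside the training range.

    train, online: arrays of shape (n_samples, n_features).
    The range is a per-feature quantile box estimated from the training data.
    """
    train = np.asarray(train, dtype=float)
    online = np.asarray(online, dtype=float)
    lo = np.quantile(train, lower_q, axis=0)
    hi = np.quantile(train, upper_q, axis=0)
    outside = ((online < lo) | (online > hi)).any(axis=1)
    return outside.mean()

# Toy data: the training box is [0, 1] in both features.
train = np.array([[0.0, 0.0], [1.0, 1.0]])
online = np.array([[0.5, 0.5], [2.0, 0.5], [0.5, -1.0], [0.2, 0.9]])

frac = fraction_outside_support(train, online, lower_q=0.0, upper_q=1.0)
print(frac)
```

Tracking this fraction over the course of a long run would show whether the coupled model drifts further from the training distribution with time.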

minor comments (1)

[Abstract] The abstract states that 'strong offline performance alone is insufficient' but does not reference the specific offline metrics or training/validation splits used to reach that conclusion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review. The two major comments correctly identify areas where the current manuscript would benefit from additional evidence. We outline targeted revisions below to address both points.

Referee: [Abstract] Abstract: the assertion that 'physical constraints such as enforcing mass positivity and overshoot prevention prove essential for numerical stability' is load-bearing for the central claim yet is presented without any ablation (e.g., runs with/without clipping) or diagnostic quantifying the fraction of online states outside the 5 km training support; this leaves the distribution-shift robustness untested.

Authors: We agree that an explicit ablation and distribution-shift diagnostic would strengthen the claim. Our internal development runs indicated that removing the constraints produced instabilities within days, but this evidence was not included in the submitted manuscript. In the revised version we will add (i) a controlled ablation comparing otherwise identical online integrations with and without the positivity/overshoot constraints and (ii) a diagnostic quantifying the fraction of online states that fall outside the 5 km training distribution. revision: yes

Referee: [Abstract] Abstract and results summary: the claim of 'numerical stability over decade-long simulations' and 'performance ... comparable to the classical graupel scheme' lacks any reported quantitative metrics (error bars, drift rates, or global-mean bias tables) or cross-validation against shorter offline/online tests, making it impossible to assess whether clipped errors accumulate into slow drift.

Authors: We concur that quantitative support is required. The submitted manuscript presents only qualitative statements of stability and comparability. The revised manuscript will include (i) global-mean bias tables for temperature, humidity, and precipitation with standard-error estimates, (ii) time series of domain-averaged quantities over the full decade with drift-rate calculations, and (iii) side-by-side comparison against the shorter offline and online test integrations already performed during development. revision: yes
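
A drift-rate calculation of the kind mentioned in this response could look like the following Python sketch: an area-weighted global mean per time step, followed by a least-squares linear trend. It is a hypothetical illustration only; the function names, the use of per-cell areas as weights, and the synthetic temperature series are assumptions rather than anything taken from the manuscript.

```python
import numpy as np

def weighted_global_mean(field, cell_area):
    """Area-weighted mean over cells; field has shape (n_time, n_cells)."""
    return np.average(field, weights=cell_area, axis=-1)

def drift_rate(years, series):
    """Least-squares linear trend of a time series, in units per year."""
    slope, _intercept = np.polyfit(years, series, deg=1)
    return slope

# Synthetic 10-year series with a built-in trend of 0.02 K per year (made up).
years = np.arange(10.0)
series = 288.0 + 0.02 * years

rate = drift_rate(years, series)
gm = weighted_global_mean(np.array([[1.0, 3.0]]), np.array([1.0, 3.0]))
print(rate, gm)
```

Comparing the fitted slope with its uncertainty, and with the same quantity from the graupel reference run, would indicate whether a trend reflects drift from the ML scheme or ordinary model variability.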

Circularity Check

0 steps flagged

No circularity: empirical demonstration of ML coupling stability

The paper's central result is an empirical demonstration that an ML microphysics scheme (classifier + regressor trained on external 5 km ICON data, with added mass-positivity and overshoot constraints) can be coupled online to the coarse ICON model and remain stable over decade-long runs. No derivation chain reduces any claimed outcome to fitted parameters by construction, nor does any load-bearing step rely on self-citation of an unverified uniqueness theorem or ansatz. The training data, physical constraints, and long-term simulation results are independent of the target performance metrics; offline accuracy is explicitly stated as insufficient, and success is shown via direct experiment rather than algebraic identity. This is a standard non-circular empirical ML parameterization study.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The approach rests on domain assumptions about physical constraints being necessary and sufficient for stability, plus standard ML fitting to simulation data. No new physical entities are postulated.

free parameters (1)

Neural network weights and biases
Fitted during supervised training on the 5 km ICON simulation output to predict microphysical tendencies.

axioms (2)

Domain assumption: mass positivity must be enforced in predictions.
Invoked in the abstract as essential for numerical stability in the coupled system.
Domain assumption: overshoot prevention is required.
Stated as necessary alongside mass positivity to maintain stability.
