From stable online coupling to decade-long climate simulations: A machine learning parameterization for cloud microphysics in ICON
Pith reviewed 2026-06-26 05:59 UTC · model grok-4.3
The pith
A machine learning cloud microphysics scheme achieves stable decade-long online coupling in the ICON model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a two-stage ML microphysics scheme (a classifier for active cells plus a regressor for tendencies), trained on global 5 km ICON output and equipped with physical constraints, couples stably to ICON at standard climate resolution and sustains decade-long integrations. It reproduces the observed climate about as well as the graupel scheme and eliminates two tuning parameters, although systematic reductions in long-term mean-state biases are not yet achieved.
What carries the argument
Two-stage ML model (classifier identifying active grid cells, regressor predicting microphysical tendencies) with enforced physical constraints for mass positivity and overshoot prevention.
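As a rough illustration of how a two-stage scheme with constraints might be applied at inference time, the NumPy sketch below zeroes tendencies in cells the classifier marks inactive, then limits sinks so that no species goes negative within one time step. The function name `apply_two_stage`, the 0.5 threshold, the array layout, and the specific limiter are assumptions made for illustration; the paper's actual constraint formulation may differ.

```python
import numpy as np

def apply_two_stage(q, active_prob, raw_tendency, dt, threshold=0.5):
    """Hypothetical inference step for a classifier + regressor scheme.

    q            : (n_cells, n_species) mixing ratios in kg/kg
    active_prob  : (n_cells,) classifier probability that a cell is active
    raw_tendency : (n_cells, n_species) regressor output in kg/kg/s
    dt           : time step in seconds
    """
    # Stage 1: cells below the threshold receive no microphysical tendency.
    active = active_prob >= threshold
    tendency = np.where(active[:, None], raw_tendency, 0.0)

    # Stage 2 limiter (assumed form): a sink may remove at most the mass
    # that is present, so the updated state cannot overshoot below zero.
    min_tendency = -np.maximum(q, 0.0) / dt
    tendency = np.maximum(tendency, min_tendency)

    # Final safeguard against round-off producing tiny negative values.
    q_new = np.maximum(q + dt * tendency, 0.0)
    return tendency, q_new
```

This limiter treats each species independently and does not enforce conservation of total water across species; that would need an additional step.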
If this is right
- The coupled ML scheme maintains numerical stability over decade-long climate simulations.
- It reproduces observed climate with performance comparable to the classical graupel scheme.
- It eliminates two microphysics-specific tuning parameters required by the graupel scheme.
- Strong offline accuracy is insufficient for stable online coupling; physical constraints are required.
Where Pith is reading between the lines
- Constraint-enforcement methods developed here could be applied to stabilize ML versions of other subgrid processes.
- Expanding the training data to include a wider range of climate states might enable the bias reductions not yet realized.
- The hybrid setup leaves the dynamical core untouched while replacing only microphysics, suggesting a practical route for incremental adoption in existing ESMs.
Load-bearing premise
Enforcing physical constraints such as mass positivity and overshoot prevention together with careful dataset curation is sufficient to achieve numerical stability when the model encounters atmospheric states and feedbacks absent from the 5 km training data.
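One simple way to gauge how often the coupled model leaves the training support is to count online samples with at least one input feature outside the percentile range seen in training. The sketch below is an assumption about how such a diagnostic could look, not a method taken from the paper; `out_of_support_fraction` and its percentile defaults are hypothetical.

```python
import numpy as np

def out_of_support_fraction(train, online, lower_pct=0.1, upper_pct=99.9):
    """Fraction of online samples with any feature outside the training range.

    train, online : (n_samples, n_features) arrays of input features
    lower_pct, upper_pct : percentiles defining the per-feature support
    """
    lo = np.percentile(train, lower_pct, axis=0)
    hi = np.percentile(train, upper_pct, axis=0)
    # A sample counts as out of support if any single feature is outside.
    outside = (online < lo) | (online > hi)
    return float(outside.any(axis=1).mean())
```

Because the check is per feature, it misses samples that are unusual only in their joint combination of features, so it gives a lower bound on distribution shift rather than a complete measure.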
What would settle it
A decade-long coupled simulation in which the ML scheme produces numerical instability, model crash, or climate statistics that diverge substantially from both observations and the graupel scheme reference run.
Original abstract
The representation of cloud microphysics and its nonlinear character and scale-dependence is a remaining source of uncertainty in Earth system models (ESMs). Here, we develop and couple online a machine learning (ML)-based cloud microphysics parameterization with the Icosahedral non-hydrostatic modeling framework (ICON). The primary challenge is achieving numerically stable, long-term online coupling when transitioning from training with km-scale data to application in coarse-scale simulations, where the coupled system encounters atmospheric states and feedbacks not seen during training. The training data is obtained from a global convection-permitting ICON simulation at 5 km resolution. The ML microphysics scheme uses a two-stage design: a classifier to identify active grid cells and a regressor to predict cloud microphysical tendencies. Physical constraints such as enforcing mass positivity and overshoot prevention prove essential for numerical stability in the coupled system. We demonstrate that achieving stable online coupling requires enforcing physical constraints and careful dataset curation, and that strong offline performance alone is insufficient. The coupled model maintains numerical stability over decade-long simulations with a performance in reproducing the observed climate comparable to the classical graupel scheme. The ML-based scheme eliminates two microphysics-specific tuning parameters of the classical graupel scheme, though systematic improvements in long-term mean-state biases are not yet realized. This study demonstrates that stable, decade-long climate simulations with an ML-based cloud microphysics scheme trained on convection-permitting data are feasible, providing a foundation for future hybrid ESMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a two-stage ML parameterization (classifier for active cells + regressor for tendencies) for cloud microphysics in ICON, trained on 5 km convection-permitting data. It incorporates physical constraints (mass positivity, overshoot prevention) and dataset curation to achieve numerically stable online coupling. The central claim is that the coupled model runs stably for decade-long climate simulations with performance comparable to the classical graupel scheme, while eliminating two microphysics-specific tuning parameters (though without systematic bias reductions).
Significance. If the stability and comparability claims hold under the reported constraints, the work would be significant for hybrid Earth system modeling: it provides concrete evidence that ML microphysics trained at km-scale can be stably coupled at coarse resolution over multi-year timescales, removing two free parameters and highlighting the role of physical constraints beyond offline accuracy. This directly addresses a key barrier to ML parameterizations in long climate integrations.
major comments (2)
- [Abstract] Abstract: the assertion that 'physical constraints such as enforcing mass positivity and overshoot prevention prove essential for numerical stability' is load-bearing for the central claim yet is presented without any ablation (e.g., runs with/without clipping) or diagnostic quantifying the fraction of online states outside the 5 km training support; this leaves the distribution-shift robustness untested.
- [Abstract] Abstract and results summary: the claim of 'numerical stability over decade-long simulations' and 'performance ... comparable to the classical graupel scheme' lacks any reported quantitative metrics (error bars, drift rates, or global-mean bias tables) or cross-validation against shorter offline/online tests, making it impossible to assess whether clipped errors accumulate into slow drift.
minor comments (1)
- [Abstract] The abstract states that 'strong offline performance alone is insufficient' but does not reference the specific offline metrics or training/validation splits used to reach that conclusion.
Simulated Author's Rebuttal
We thank the referee for the constructive review. The two major comments correctly identify areas where the current manuscript would benefit from additional evidence. We outline targeted revisions below to address both points.
- Referee: [Abstract] Abstract: the assertion that 'physical constraints such as enforcing mass positivity and overshoot prevention prove essential for numerical stability' is load-bearing for the central claim yet is presented without any ablation (e.g., runs with/without clipping) or diagnostic quantifying the fraction of online states outside the 5 km training support; this leaves the distribution-shift robustness untested.
  Authors: We agree that an explicit ablation and distribution-shift diagnostic would strengthen the claim. Our internal development runs indicated that removing the constraints produced instabilities within days, but this evidence was not included in the submitted manuscript. In the revised version we will add (i) a controlled ablation comparing otherwise identical online integrations with and without the positivity/overshoot constraints and (ii) a diagnostic quantifying the fraction of online states that fall outside the 5 km training distribution.
  Revision: yes
- Referee: [Abstract] Abstract and results summary: the claim of 'numerical stability over decade-long simulations' and 'performance ... comparable to the classical graupel scheme' lacks any reported quantitative metrics (error bars, drift rates, or global-mean bias tables) or cross-validation against shorter offline/online tests, making it impossible to assess whether clipped errors accumulate into slow drift.
  Authors: We concur that quantitative support is required. The submitted manuscript presents only qualitative statements of stability and comparability. The revised manuscript will include (i) global-mean bias tables for temperature, humidity, and precipitation with standard-error estimates, (ii) time series of domain-averaged quantities over the full decade with drift-rate calculations, and (iii) side-by-side comparison against the shorter offline and online test integrations already performed during development.
  Revision: yes
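The drift-rate calculation mentioned in the rebuttal could, for example, be a least-squares trend fitted to an annual-mean time series. The sketch below shows that under stated assumptions: `drift_rate` is a hypothetical helper, the input is assumed to be one global-mean value per year, and the standard error uses the textbook ordinary-least-squares formula, which ignores autocorrelation.

```python
import numpy as np

def drift_rate(years, series):
    """Linear trend (units per year) and its OLS standard error.

    years  : 1-D array of time coordinates in years
    series : 1-D array of annual global-mean values, same length
    """
    years = np.asarray(years, dtype=float)
    series = np.asarray(series, dtype=float)

    # Degree-1 polynomial fit returns (slope, intercept).
    slope, intercept = np.polyfit(years, series, 1)

    # Standard error of the slope from the residual variance.
    resid = series - (slope * years + intercept)
    n = years.size
    s_xx = np.sum((years - years.mean()) ** 2)
    std_err = np.sqrt(resid @ resid / (n - 2) / s_xx)
    return slope, std_err
```

With only ten annual values, the uncertainty on the slope is large for noisy fields, so a trend should be read together with its standard error rather than on its own.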
Circularity Check
No circularity: empirical demonstration of ML coupling stability
The paper's central result is an empirical demonstration that an ML microphysics scheme (classifier + regressor trained on external 5 km ICON data, with added mass-positivity and overshoot constraints) can be coupled online to the coarse ICON model and remain stable over decade-long runs. No derivation chain reduces any claimed outcome to fitted parameters by construction, nor does any load-bearing step rely on self-citation of an unverified uniqueness theorem or ansatz. The training data, physical constraints, and long-term simulation results are independent of the target performance metrics; offline accuracy is explicitly stated as insufficient, and success is shown via direct experiment rather than algebraic identity. This is a standard non-circular empirical ML parameterization study.
Axiom & Free-Parameter Ledger
free parameters (1)
- Neural network weights and biases
axioms (2)
- domain assumption: Mass positivity must be enforced in predictions
- domain assumption: Overshoot prevention is required