Improving Ensemble CAPE Forecasts with a Diffusion Model Incorporating Aerosol Information
Pith reviewed 2026-06-30 17:23 UTC · model grok-4.3
The pith
A diffusion model that adds aerosol data to GFS inputs generates ensemble CAPE forecasts outperforming both GFS and GEFS on standard metrics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The diffusion model takes a GFS CAPE forecast as input and outputs an ensemble that significantly outperforms both GFS and GEFS 6-hour forecasts on root mean square error, continuous ranked probability score, and Brier score. A two-stage training pipeline combines a large historical GFS dataset with a smaller GEFS dataset. Adding aerosol optical depths as extra inputs further improves the forecasts, and black carbon, organic carbon, and sulfate aerosols exert greater influence than sea salt or dust.
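For readers who want the three scores pinned down, the following is a minimal NumPy sketch of how they are commonly computed for an ensemble. It is not the paper's evaluation code: the array layout, the function names, the synthetic data, and the exceedance threshold used for the Brier score are all assumptions made for illustration.

```python
import numpy as np

# Assumed layout (not from the paper): ens is (members, points), obs is (points,).

def ensemble_rmse(ens, obs):
    """RMSE of the ensemble mean against the verifying values."""
    return float(np.sqrt(np.mean((ens.mean(axis=0) - obs) ** 2)))

def ensemble_crps(ens, obs):
    """Empirical CRPS, E|X - y| - 0.5 * E|X - X'|, averaged over points."""
    abs_err = np.mean(np.abs(ens - obs[None, :]), axis=0)
    pair_dist = np.mean(np.abs(ens[:, None, :] - ens[None, :, :]), axis=(0, 1))
    return float(np.mean(abs_err - 0.5 * pair_dist))

def brier_score(ens, obs, threshold):
    """Brier score for the event 'value exceeds threshold'."""
    prob = np.mean(ens > threshold, axis=0)    # forecast probability per point
    event = (obs > threshold).astype(float)    # 1.0 where the event occurred
    return float(np.mean((prob - event) ** 2))

# Synthetic stand-in data, purely illustrative (not real CAPE values).
rng = np.random.default_rng(0)
obs = rng.gamma(shape=2.0, scale=500.0, size=1000)
ens = obs[None, :] + rng.normal(0.0, 300.0, size=(10, 1000))
print(ensemble_rmse(ens, obs), ensemble_crps(ens, obs), brier_score(ens, obs, 1000.0))
```

All three scores are negatively oriented, so an improvement shows up as a lower value.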
What carries the argument
Two-stage diffusion model with classifier-free guidance that ingests GFS CAPE forecasts plus five aerosol optical depth fields and produces calibrated ensemble members.
If this is right
- The generated ensembles can be fed directly into downstream severe-weather guidance products.
- Classifier-free guidance provides a tunable knob between forecast accuracy and ensemble dispersion.
- Permutation importance ranks black carbon, organic carbon, and sulfate aerosols as stronger drivers of CAPE predictions than sea salt or dust.
- The same input-output structure can accept additional gridded fields without retraining the entire pipeline.
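The "tunable knob" in the second bullet can be sketched with the commonly used classifier-free guidance combination rule. This page does not give the paper's exact formulation, so the function name `guided_noise`, the `scale` parameterization, and the toy arrays below are assumptions rather than the authors' implementation.

```python
import numpy as np

def guided_noise(eps_cond, eps_uncond, scale):
    """Blend conditional and unconditional noise predictions.

    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 extrapolates further toward the conditioning signal.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy arrays standing in for two forward passes of a denoiser (hypothetical).
eps_cond = np.array([1.0, 2.0])
eps_uncond = np.array([0.0, 0.0])
for scale in (0.0, 1.0, 2.0):
    print(scale, guided_noise(eps_cond, eps_uncond, scale))
```

At sampling time the blended prediction replaces the plain conditional one at every denoising step, so a single scalar changes how strongly each member is tied to the conditioning inputs.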
Where Pith is reading between the lines
- The framework could be tested on other convective indices such as lifted index or storm-relative helicity.
- Operational systems could ingest near-real-time aerosol analyses to produce updated CAPE ensembles each cycle.
- The two-stage approach may generalize to other forecast variables where a high-resolution deterministic run and a lower-resolution ensemble both exist.
Load-bearing premise
A two-stage training pipeline can combine a large GFS forecast archive with a smaller GEFS archive even though the two systems use different initialization and parameterization schemes that change over time.
What would settle it
Evaluation on a withheld period after a major GFS or GEFS model upgrade in which the diffusion ensemble no longer shows lower RMSE or lower CRPS than the raw GFS or GEFS output.
Original abstract
Convective available potential energy (CAPE) is an important variable for forecasting severe weather and understanding deep convection and precipitation. The latest versions of the Global Forecast System (GFS) and related Global Ensemble Forecast System (GEFS) have exhibited a bias towards underestimating CAPE values during the summertime. We train an artificial intelligence (AI) diffusion model to improve the skill and uncertainty quantification of afternoon 6-hour lead time ensemble forecasts over the United States. Our model takes a GFS CAPE forecast as input and outputs an ensemble that significantly outperforms both GFS and GEFS 6-hour forecasts on root mean square error, continuous ranked probability score, and Brier score. We propose a two-stage training pipeline to leverage both a larger historical GFS forecast dataset and a smaller historical GEFS dataset, despite the two using initialization and parameterization schemes that vary over time. We also show that classifier-free guidance can be used to control the skill and spread of the forecasts. We then demonstrate the versatility of our framework by adding aerosol optical depths (AODs) of black carbon, organic carbon, dust, sea salt, and sulfates as additional input features. Aerosols can invigorate or suppress convection depending on atmospheric conditions. Our AI models effectively incorporate aerosols to produce improved CAPE forecasts. We interpret the model components by using permutation feature importance to rank the influence of the different AODs and find that black carbon, organic carbon, and sulfate aerosols have a greater impact on the model's CAPE predictions than sea salt and dust aerosols.
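Permutation feature importance, as named in the abstract, can be sketched as follows. This is a generic version under stated assumptions, not the authors' code: the `(samples, channels, height, width)` layout, the RMSE-based score, the `predict` callable, and the toy channel names are all hypothetical.

```python
import numpy as np

def permutation_importance(predict, inputs, target, channel_names,
                           n_repeats=5, seed=0):
    """Mean RMSE increase when one input channel is shuffled across samples.

    inputs: (samples, channels, H, W); predict maps inputs to an array
    with the same shape as target. Larger values mean greater influence.
    """
    rng = np.random.default_rng(seed)

    def rmse(x):
        return float(np.sqrt(np.mean((predict(x) - target) ** 2)))

    baseline = rmse(inputs)
    scores = {}
    for c, name in enumerate(channel_names):
        increases = []
        for _ in range(n_repeats):
            shuffled = inputs.copy()
            perm = rng.permutation(inputs.shape[0])
            shuffled[:, c] = inputs[perm, c]   # break the link for channel c only
            increases.append(rmse(shuffled) - baseline)
        scores[name] = float(np.mean(increases))
    return scores

# Toy model that uses channel 0 and ignores channel 1 (illustrative only).
rng = np.random.default_rng(1)
x = rng.normal(size=(64, 2, 4, 4))
toy_predict = lambda arr: 3.0 * arr[:, 0]
y = toy_predict(x)
print(permutation_importance(toy_predict, x, y, ["used_field", "ignored_field"]))
```

Shuffling whole fields across samples preserves each field's spatial structure while destroying its pairing with the other inputs, which is what makes the resulting score degradation attributable to that channel.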
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a diffusion model that takes GFS CAPE forecasts as input and generates ensemble forecasts for 6-hour afternoon lead times over the United States. It claims these ensembles significantly outperform both GFS and GEFS baselines on RMSE, CRPS, and Brier score. A two-stage training pipeline is proposed to combine a large historical GFS dataset with a smaller GEFS dataset despite differing initialization and parameterization schemes. Classifier-free guidance is used to control forecast skill and spread. The model is extended to incorporate aerosol optical depth (AOD) inputs for black carbon, organic carbon, dust, sea salt, and sulfates; permutation feature importance is then applied to rank their influence on CAPE predictions.
Significance. If the reported skill improvements and aerosol effects are robust, the work could contribute to operational severe-weather forecasting by improving ensemble uncertainty quantification and by demonstrating a practical route to include aerosol-convection interactions in AI post-processing. The classifier-free guidance mechanism offers a controllable way to trade off skill and spread that may be reusable in other ensemble post-processing settings.
major comments (3)
- Methods (two-stage training pipeline): The pipeline is asserted to leverage both GFS and GEFS data despite time-varying schemes, yet no domain-adaptation layer, scheme-specific covariates, or time-aware conditioning is described. Without these, measured gains on RMSE/CRPS/Brier (and the subsequent AOD permutation ranking) could arise from unmodeled distribution shift rather than genuine aerosol signal or model architecture.
- Results (aerosol incorporation): The claim that “AI models effectively incorporate aerosols to produce improved CAPE forecasts” rests on the addition of five AOD fields and a permutation-importance ranking, but the manuscript does not report an ablation that isolates the contribution of the AOD inputs relative to the base diffusion model trained without aerosols.
- Results (performance metrics): The abstract states significant outperformance on three scores, yet the manuscript must supply the actual numerical deltas, ensemble size, validation-split details, and error bars so that the magnitude and statistical reliability of the claimed improvements can be evaluated.
minor comments (2)
- Abstract: the phrase “significantly outperforms” should be accompanied by the key numerical improvements so readers can judge effect size without consulting the full text.
- Notation: ensure AOD and CAPE are defined at first use in the main text and that all aerosol species are listed consistently between abstract and methods.
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable comments. We address each of the major comments below and will revise the manuscript accordingly to improve clarity and completeness.
- Referee: Methods (two-stage training pipeline): The pipeline is asserted to leverage both GFS and GEFS data despite time-varying schemes, yet no domain-adaptation layer, scheme-specific covariates, or time-aware conditioning is described. Without these, measured gains on RMSE/CRPS/Brier (and the subsequent AOD permutation ranking) could arise from unmodeled distribution shift rather than genuine aerosol signal or model architecture.
  Authors: We agree that additional details on the two-stage training pipeline are needed to clarify how it handles the differences in GFS and GEFS schemes. The pipeline pre-trains the diffusion model on the larger historical GFS dataset and then fine-tunes it on the GEFS dataset to adapt to the specific ensemble characteristics. This sequential approach implicitly addresses distribution shifts through the fine-tuning stage. To address the referee's concern, we will expand the methods section to describe the pipeline in more detail, including data selection criteria and training hyperparameters that help mitigate scheme variations. We maintain that the reported improvements are due to the model architecture rather than unmodeled shifts, but the added description will strengthen this claim.
  Revision: yes
- Referee: Results (aerosol incorporation): The claim that “AI models effectively incorporate aerosols to produce improved CAPE forecasts” rests on the addition of five AOD fields and a permutation-importance ranking, but the manuscript does not report an ablation that isolates the contribution of the AOD inputs relative to the base diffusion model trained without aerosols.
  Authors: We acknowledge the value of an ablation study to isolate the contribution of the AOD inputs. While the manuscript shows that adding the five AOD fields leads to improved forecasts and uses permutation importance to rank their effects, a direct comparison to the base model without aerosols is not explicitly reported. We will include this ablation in the revised manuscript, providing performance metrics for the model with and without AOD inputs to quantify the additional gains from aerosol information.
  Revision: yes
- Referee: Results (performance metrics): The abstract states significant outperformance on three scores, yet the manuscript must supply the actual numerical deltas, ensemble size, validation-split details, and error bars so that the magnitude and statistical reliability of the claimed improvements can be evaluated.
  Authors: We appreciate this suggestion for improving the presentation of results. The manuscript includes comparative figures, but we will add explicit numerical values for the improvements in RMSE, CRPS, and Brier score, specify the ensemble size (e.g., number of members generated), detail the validation split used (such as the time periods or geographic regions), and include error bars or statistical significance measures. These details will be incorporated into the results section and, where feasible, the abstract.
  Revision: yes
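One way the requested error bars could be produced is a paired bootstrap over verification cases. The sketch below is an assumption about how such an analysis might look, not something the manuscript or rebuttal describes: the function name, the per-case score arrays, and the choice of simple case resampling are all hypothetical. It treats cases as independent, so temporally correlated forecast days would call for a block-based resampling scheme instead.

```python
import numpy as np

def paired_bootstrap_ci(score_a, score_b, n_boot=2000, alpha=0.05, seed=0):
    """Mean score difference (a - b) with a percentile bootstrap interval.

    score_a, score_b: per-case scores for two systems on the same cases.
    Resampling whole cases keeps the pairing between the two systems.
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(score_a, dtype=float) - np.asarray(score_b, dtype=float)
    n = diff.shape[0]
    idx = rng.integers(0, n, size=(n_boot, n))   # resample case indices
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), float(lo), float(hi)

# Synthetic per-case scores, purely illustrative.
rng = np.random.default_rng(2)
baseline_scores = rng.gamma(2.0, 100.0, size=200)
candidate_scores = baseline_scores - rng.normal(20.0, 30.0, size=200)
print(paired_bootstrap_ci(candidate_scores, baseline_scores))
```

For negatively oriented scores such as RMSE, CRPS, and Brier score, an interval that lies entirely below zero would support the claim that the candidate system improves on the baseline.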
Circularity Check
No significant circularity detected
The paper describes an empirical machine-learning pipeline (diffusion model trained in two stages on GFS then GEFS data, with optional AOD inputs) whose central claims are performance improvements measured against external, independent baselines (GFS and GEFS forecasts) on standard metrics (RMSE, CRPS, Brier). No equations, fitted parameters, or self-citations are presented that reduce the reported skill gains or aerosol rankings to the inputs by construction; the two-stage training is a methodological choice whose validity can be tested externally rather than a definitional loop. The derivation chain is therefore self-contained against the stated benchmarks.