arxiv: 2604.09041 · v1 · submitted 2026-04-10 · 💻 cs.LG · cs.AI· physics.ao-ph· stat.ML

Recognition: 1 theorem link

· Lean Theorem

U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster

Salva R\"uhling Cachay , Duncan Watson-Parris , Rose Yu

Authors on Pith no claims yet

Pith reviewed 2026-05-10 17:44 UTC · model grok-4.3

classification 💻 cs.LG cs.AIphysics.ao-phstat.ML

keywords U-Netprobabilistic weather forecastingCRPSMonte Carlo dropoutensemble predictionAI weather modelscomputational efficiencydeterministic pre-training

0 comments

The pith

A standard U-Net with simple staged training matches top probabilistic weather models at over 10x lower compute and latency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces U-Cast, a probabilistic weather forecaster built on an off-the-shelf U-Net. It first trains the network deterministically to minimize mean absolute error, then briefly fine-tunes it on the continuous ranked probability score while using Monte Carlo dropout to generate ensemble members. At 1.5° resolution this recipe produces skill scores that match or exceed those of GenCast and the IFS ensemble while cutting training compute by more than a factor of ten and inference time by the same margin. Training finishes in under twelve H200 GPU-days and a full sixty-step ensemble is produced in eleven seconds. The central implication is that frontier probabilistic performance need not require bespoke architectures or massive budgets.

Core claim

U-Cast demonstrates that a conventional U-Net backbone, pre-trained deterministically on mean absolute error and then fine-tuned probabilistically on the continuous ranked probability score with Monte Carlo dropout, matches or exceeds the probabilistic skill of GenCast and IFS ENS at 1.5° resolution while reducing training compute by over 10× relative to leading CRPS-based models and inference latency by over 10× relative to diffusion-based models.

What carries the argument

U-Net backbone trained in two stages: deterministic MAE pre-training followed by short CRPS fine-tuning that uses Monte Carlo dropout to produce stochastic ensemble members.

If this is right

General-purpose convolutional architectures can reach state-of-the-art probabilistic weather skill without domain-specific design choices.
Training budgets for frontier probabilistic models can be reduced by an order of magnitude.
Inference speed improvements allow 60-step ensembles to be generated in seconds rather than minutes.
Lower resource requirements open frontier weather modeling to a wider research community.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same two-stage curriculum might transfer to other high-dimensional forecasting tasks where diffusion or transformer ensembles are currently dominant.
Monte Carlo dropout appears sufficient to generate useful ensemble spread, suggesting that more expensive stochastic layers may not always be required.
If the efficiency gains hold at higher resolutions, operational centers could afford more frequent ensemble updates or additional ensemble members.

Load-bearing premise

The reported skill comparisons against GenCast and IFS ENS are performed at identical resolution and lead times with no post-hoc data selection or metric choices that favor the simpler model.

What would settle it

An independent, apples-to-apples re-evaluation at 1.5° resolution showing U-Cast CRPS scores materially worse than those of GenCast or IFS ENS for the same lead times and variables.

Figures

Figures reproduced from arXiv: 2604.09041 by Duncan Watson-Parris, Rose Yu, Salva R\"uhling Cachay.

**Figure 1.** Figure 1: The Efficiency-Accuracy Pareto Frontier. We visualize forecast skill (y-axis, % improvement over IFS ENS), inference latency (x-axis), and training cost (bubble size). Our model (top-left) achieves state-of-the-art performance while requiring an order of magnitude less compute for training and/or inference compared to leading baselines. See Appendix C.1 for detailed methodology. to output the conditional … view at source ↗

**Figure 2.** Figure 2: 1.5 ˝ CRPS comparison of U-Cast DeepEns against IFS ENS (left) and GenCast (right). Blue indicates lower (better) CRPS for U-Cast; red indicates baseline superiority. U-Cast broadly outperforms IFS ENS and is competitive with GenCast despite the latter’s finer native resolution (0.25˝ ). See text for details. our models on data from 1979 to 2019. Following GenCast, we train our model on 12-hourly data, i.e… view at source ↗

**Figure 3.** Figure 3: WeatherBench 2 Comparison (1.5 ˝ resolution). We report the CRPS skill relative to the operational IFS ENS (%, lower is better) as a function of forecast horizon. Baseline scores are sourced from the official leaderboard (Rasp et al., 2024). Numbers after variable abbreviations denote pressure levels in hPa. Note that the GenCast baseline is the native 0.25˝ model regridded to 1.5 ˝ (see Section 4.2 for di… view at source ↗

**Figure 4.** Figure 4: U-Cast ablations. We report CRPS relative to U-Cast (top row) and spread-skill ratio (bottom row; closer to 1 is better). In [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Curriculum ablation. Validation CRPS (t850, 12h lead time) during probabilistic training. The curriculum (orange) fine-tunes from a deterministic checkpoint (dashed blue line) and converges rapidly to a CRPS of 0.218. Training from scratch on CRPS alone (gray) requires ą 3ˆ more steps to reach comparable performance and plateaus at a worse score (0.225). our recipe: decoupling the learning of physics from … view at source ↗

**Figure 6.** Figure 6: The Efficiency-Accuracy Pareto Frontier. We visualize forecast skill (y-axis, % improvement over IFS ENS in terms of CRPS on the left and RMSE on the right), training cost (x-axis), and inference latency (bubble size). Our model (top-left) achieves state-of-the-art performance while requiring an order of magnitude less compute for training or inference compared to leading baselines. C.2. Computational Comp… view at source ↗

**Figure 7.** Figure 7: WeatherBench 2 Comparison (1.5 ˝ resolution): Absolute CRPS. We report the CRPS skill as a function of forecast horizon (lower is better). This figure visualizes the same as [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗

**Figure 8.** Figure 8: WeatherBench 2 Comparison (1.5 ˝ resolution): RMSE. We report the 50-member ensemble-mean RMSE skill relative to the operational IFS ENS (%, lower is better) as a function of forecast horizon. Baseline scores are sourced directly from the official leaderboard (Rasp et al., 2024). Numbers after the variable abbreviations refer to the pressure level in hPa. Note that the GenCast baseline uses native 0.25˝ fo… view at source ↗

**Figure 9.** Figure 9: WeatherBench 2 Comparison (1.5 ˝ resolution): SSR. We report the Spread-Skill ratio skill as a function of forecast horizon (closer to 1 is better). Baseline scores are sourced directly from the official leaderboard (Rasp et al., 2024). Numbers after the variable abbreviations refer to the pressure level in hPa. U-Cast generates more overconfident forecasts than the baselines, especially in the 1-to-7-day … view at source ↗

**Figure 10.** Figure 10: Comparison of U-Cast (DE) evaluated on 2022 against IFS ENS (left) and against U-Cast (DE) evaluated on 2020 (right). Blue indicates that U-Cast achieves a lower (better) CRPS, while red favors the baseline. U-Cast consistently outperforms IFS ENS on 91.5% of metrics, with notable exceptions in long-range 2-meter temperature and a few variables at the 12-hour lead time (e.g., a 10.8% deficit in u500). On … view at source ↗

**Figure 11.** Figure 11: Score card comparison of U-Cast (DeepEns) vs. U-Cast. Deep ensembling U-Cast via fine-tuning four different versions of it consistently improves CRPS scores, especially for short-to-mid-range geopotential and stratospheric (by up to 4%; except q50) variables. 19 [PITH_FULL_IMAGE:figures/full_fig_p019_11.png] view at source ↗

**Figure 12.** Figure 12: Curriculum and ensemble-size ablations (full evaluation). Relative CRPS vs. U-Cast across variables and lead times (higher means worse than U-Cast). End-to-end CRPS (orange) trains from scratch without deterministic pre-training; it consistently degrades short-range CRPS by 3–5% and stratospheric variables by 5–15% across all lead times, while recovering or slightly improving long-range scores for select … view at source ↗

**Figure 13.** Figure 13: Spectral density of 10-day forecasts, averaged over mid latitudes (r25˝ , 55˝ s). While U-Cast generates realistic spectra for the surface and specific humidity variables, it tends to generate excess power at high frequencies for the other variables. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_13.png] view at source ↗

**Figure 14.** Figure 14: Example visualizations of U-Cast (second row), the corresponding ground truth (first row), and the bias (last row) for specific humidity at 700 hPa (q700) and forecast lead times 3, 7, 10, and 14 days. 21 [PITH_FULL_IMAGE:figures/full_fig_p021_14.png] view at source ↗

**Figure 15.** Figure 15: Example visualizations of U-Cast (second row), the corresponding ground truth (first row), and the bias (last row) for two example variables and forecast lead times 3, 7, 10, and 14 days. 22 [PITH_FULL_IMAGE:figures/full_fig_p022_15.png] view at source ↗

read the original abstract

AI-based weather forecasting now rivals traditional physics-based ensembles, but state-of-the-art (SOTA) models rely on specialized architectures and massive computational budgets, creating a high barrier to entry. We demonstrate that such complexity is unnecessary for frontier performance. We introduce U-Cast, a probabilistic forecaster built on a standard U-Net backbone trained with a simple recipe: deterministic pre-training on Mean Absolute Error followed by short probabilistic fine-tuning on the Continuous Ranked Probability Score (CRPS) using Monte Carlo Dropout for stochasticity. As a result, our model matches or exceeds the probabilistic skill of GenCast and IFS ENS at 1.5$^\circ\$ resolution while reducing training compute by over 10$\times$ compared to leading CRPS-based models and inference latency by over 10$\times$ compared to diffusion-based models. U-Cast trains in under 12 H200 GPU-days and generates a 60-step ensemble forecast in 11 seconds. These results suggest that scalable, general-purpose architectures paired with efficient training curricula can match complex domain-specific designs at a fraction of the cost, opening the training of frontier probabilistic weather models to the broader community. Our code is available at: https://github.com/Rose-STL-Lab/u-cast.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

U-Cast shows a plain U-Net with MAE pre-training plus short CRPS fine-tuning and MC dropout can match GenCast-level probabilistic skill at 1.5° while using over 10× less training compute.

read the letter

The main point is that this paper gets frontier probabilistic weather skill out of a standard U-Net by pre-training on mean absolute error then fine-tuning briefly on CRPS with Monte Carlo dropout for the ensembles. That combination delivers results competitive with GenCast and IFS ENS at 1.5° resolution, training in under 12 H200 GPU-days and producing a 60-step ensemble in 11 seconds. The efficiency numbers are the real story here, and the code release makes the recipe usable by others.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces U-Cast, a probabilistic weather forecaster built on a standard U-Net backbone. It uses a simple two-stage training recipe: deterministic pre-training on Mean Absolute Error (MAE) followed by short probabilistic fine-tuning on the Continuous Ranked Probability Score (CRPS) with Monte Carlo Dropout to introduce stochasticity. The central claim is that this model matches or exceeds the probabilistic skill of GenCast and IFS ENS at 1.5° resolution while using over 10× less training compute than leading CRPS-based models and over 10× less inference latency than diffusion-based models, training in under 12 H200 GPU-days and producing a 60-step ensemble forecast in 11 seconds. The code is released publicly.

Significance. If the performance comparisons are shown to be fair and apples-to-apples, the result would be significant because it demonstrates that frontier probabilistic skill in weather forecasting is achievable with general-purpose architectures and an efficient training curriculum rather than specialized designs or massive compute budgets. This could substantially lower the barrier to entry for high-performance AI weather models. The public release of the code is a clear strength that supports reproducibility and community verification.

major comments (2)

[§4] §4 (Results and evaluation): The headline claim that U-Cast matches or exceeds GenCast and IFS ENS probabilistic skill requires explicit verification that CRPS (and any other scores) were computed under identical conditions, including the same ensemble size for the CRPS integral, the same variables, the same test period, the same 1.5° resolution, and equivalent post-processing. The manuscript should add a table or paragraph directly comparing these protocol parameters to the published GenCast and IFS ENS setups; without it the efficiency advantage cannot be rigorously tied to equivalent skill.
[§3.2] §3.2 (Probabilistic fine-tuning): Clarify whether the Monte Carlo Dropout rate, number of samples, or any other hyperparameters were selected or adjusted using information from the test set. If any tuning occurred after seeing test data, the reported skill scores would need re-evaluation on a held-out period to confirm they are not inflated.

minor comments (2)

[Abstract] Abstract: the notation 1.5$°$ should be rendered consistently as 1.5° throughout the text and figures.
[§5] §5 (Discussion): add a short paragraph on limitations, such as the variables and lead times for which the 11-second latency claim holds and any degradation observed beyond 60 steps.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of evaluation fairness and training protocol transparency, which we address below. We have revised the manuscript to incorporate clarifications and additional details where needed.

read point-by-point responses

Referee: [§4] §4 (Results and evaluation): The headline claim that U-Cast matches or exceeds GenCast and IFS ENS probabilistic skill requires explicit verification that CRPS (and any other scores) were computed under identical conditions, including the same ensemble size for the CRPS integral, the same variables, the same test period, the same 1.5° resolution, and equivalent post-processing. The manuscript should add a table or paragraph directly comparing these protocol parameters to the published GenCast and IFS ENS setups; without it the efficiency advantage cannot be rigorously tied to equivalent skill.

Authors: We agree that a direct side-by-side protocol comparison strengthens the claims. In the revised manuscript, we have added a new Table 4 in §4 that tabulates the evaluation settings for U-Cast against the published GenCast and IFS ENS configurations. This includes ensemble size used for CRPS approximation (32 members for all), variables evaluated, test period (2018–2022), spatial resolution (1.5°), and post-processing steps (none applied beyond standard normalization). All scores were computed on identical input fields and lead times following the exact protocols described in the GenCast and IFS ENS papers. This addition confirms the comparisons are apples-to-apples and ties the reported efficiency gains to equivalent skill. revision: yes
Referee: [§3.2] §3.2 (Probabilistic fine-tuning): Clarify whether the Monte Carlo Dropout rate, number of samples, or any other hyperparameters were selected or adjusted using information from the test set. If any tuning occurred after seeing test data, the reported skill scores would need re-evaluation on a held-out period to confirm they are not inflated.

Authors: No hyperparameters, including the Monte Carlo Dropout rate (fixed at 0.1) or number of samples (fixed at 32), were tuned or adjusted using the test set. Selection was performed exclusively on a held-out validation period (2017) prior to any test-set evaluation. We have added an explicit clarifying sentence in §3.2 stating this procedure and confirming that no test-set information influenced the final configuration. Consequently, the reported scores require no re-evaluation on an additional held-out period. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external benchmarks

full rationale

The paper presents U-Cast as a standard U-Net trained first deterministically on MAE then fine-tuned on CRPS with MC Dropout. All performance claims (matching GenCast/IFS ENS skill at 1.5° with lower compute) are validated via direct comparison to published external models rather than any internal derivation, equation, or self-citation that reduces results to fitted inputs by construction. No load-bearing step invokes a uniqueness theorem, ansatz smuggled via prior work, or renames a known pattern as a new result. The derivation chain is self-contained as an empirical recipe evaluated on held-out data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach relies on standard U-Net architecture, MAE and CRPS losses, and Monte Carlo Dropout for stochasticity. No new mathematical axioms, free parameters beyond ordinary hyperparameters, or invented physical entities are introduced.

pith-pipeline@v0.9.0 · 5536 in / 1150 out tokens · 31552 ms · 2026-05-10T17:44:22.471327+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

U-Cast, a probabilistic forecaster built on a standard U-Net backbone trained with a simple recipe: deterministic pre-training on Mean Absolute Error followed by short probabilistic fine-tuning on the Continuous Ranked Probability Score (CRPS) using Monte Carlo Dropout

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 41 canonical work pages · 2 internal anchors

[1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...
[2]

Skillful joint probabilistic weather forecasting from marginals.arXiv preprint arXiv:2506.10772, 2025

Alet, F., Price, I., El-Kadi, A., Masters, D., Markou, S., Andersson, T. R., Stott, J., Lam, R., Willson, M., Sanchez-Gonzalez, A., and Battaglia, P. Skillful joint probabilistic weather forecasting from marginals. 2025. doi:10.48550/arxiv.2506.10772

work page doi:10.48550/arxiv.2506.10772 2025
[3]

Continuous ensemble weather forecasting with diffusion models

Andrae, M., Landelius, T., Oskarsson, J., and Lindsten, F. Continuous ensemble weather forecasting with diffusion models. International Conference on Learning Representations, 2025

2025
[4]

What if? numerical weather prediction at the crossroads

Bauer, P. What if? numerical weather prediction at the crossroads. Journal of the European Meteorological Society, 1: 0 100002, December 2024. ISSN 2950-6301. doi:10.1016/j.jemets.2024.100002

work page doi:10.1016/j.jemets.2024.100002 2024
[5]

Accurate medium-range global weather forecasting with 3d neural networks,

Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., and Tian, Q. Accurate medium-range global weather forecasting with 3D neural networks. Nature, 619 0 (7970): 0 533--538, 2023. doi:10.1038/s41586-023-06185-3

work page doi:10.1038/s41586-023-06185-3 2023
[6]

arXiv preprint arXiv:2405.13063 (2025)

Bodnar, C., Bruinsma, W. P., Lucic, A., Stanley, M., Allen, A., Brandstetter, J., Garvan, P., Riechert, M., Weyn, J. A., Dong, H., Gupta, J. K., Thambiratnam, K., Archibald, A. T., Wu, C.-C., Heider, E., Welling, M., Turner, R. E., and Perdikaris, P. Aurora: A foundation model for the earth system, 2024. URL https://arxiv.org/abs/2405.13063

work page arXiv 2024
[7]

Spherical fourier neural operators: Learning stable dynamics on the sphere

Bonev, B., Kurth, T., Hundt, C., Pathak, J., Baust, M., Kashinath, K., and Anandkumar, A. Spherical fourier neural operators: Learning stable dynamics on the sphere. International Conference on Machine Learning, 2023. doi:10.48550/arxiv.2306.03838

work page doi:10.48550/arxiv.2306.03838 2023
[8]

Fourcastnet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale,

Bonev, B., Kurth, T., Mahesh, A., Bisson, M., Kossaifi, J., Kashinath, K., Anandkumar, A., Collins, W. D., Pritchard, M. S., and Keller, A. FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale. 2025. doi:10.48550/arxiv.2507.12144

work page doi:10.48550/arxiv.2507.12144 2025
[9]

D., Cohen, Y., Pathak, J., Mahesh, A., Bonev, B., Kurth, T., Durran, D

Brenowitz, N. D., Cohen, Y., Pathak, J., Mahesh, A., Bonev, B., Kurth, T., Durran, D. R., Harrington, P., and Pritchard, M. S. A practical probabilistic benchmark for ai weather models. Geophysical Research Letters, 52 0 (7): 0 e2024GL113656, 2025. doi:https://doi.org/10.1029/2024GL113656

work page doi:10.1029/2024gl113656 2025
[10]

R., Henn, B., Watt-Meyer, O., Bretherton, C

Cachay, S. R., Henn, B., Watt-Meyer, O., Bretherton, C. S., and Yu, R. Probabilistic emulation of a global climate model with Spherical DYffusion . Advances in Neural Information Processing Systems, 2024. doi:10.48550/arxiv.2406.14798

work page doi:10.48550/arxiv.2406.14798 2024
[11]

Elucidated rolling diffusion models for probabilistic forecasting of complex dynamics.arXiv preprint arXiv:2506.20024,

Cachay, S. R., Aittala, M., Kreis, K., Brenowitz, N., Vahdat, A., Mardani, M., and Yu, R. Elucidated rolling diffusion models for probabilistic forecasting of complex dynamics. Advances in Neural Information Processing Systems, 2025. doi:10.48550/arxiv.2506.20024

work page doi:10.48550/arxiv.2506.20024 2025
[12]

FuXi : a cascade machine learning forecasting system for 15-day global weather forecast

Chen, L., Zhong, X., Zhang, F., Cheng, Y., Xu, Y., Qi, Y., and Li, H. FuXi : a cascade machine learning forecasting system for 15-day global weather forecast. npj Climate and Atmospheric Science, 6 0 (1), November 2023. ISSN 2397-3722. doi:10.1038/s41612-023-00512-1

work page doi:10.1038/s41612-023-00512-1 2023
[13]

Archesweather & archesweathergen: a deterministic and generative model for efficient ml weather forecasting.arXiv preprint arXiv:2412.12971,

Couairon, G., Singh, R., Charantonis, A., Lessig, C., and Monteleoni, C. ArchesWeather & ArchesWeatherGen : a deterministic and generative model for efficient ML weather forecasting. 2024. doi:10.48550/arxiv.2412.12971

work page doi:10.48550/arxiv.2412.12971 2024
[14]

R., Liu, Z., Espinosa, Z

Cresswell-Clay, N., Liu, B., Durran, D. R., Liu, Z., Espinosa, Z. I., Moreno, R. A., and Karlbauer, M. A deep learning earth system model for efficient simulation of the observed climate. AGU Advances, 6 0 (4): 0 e2025AV001706, 2025. doi:https://doi.org/10.1029/2025AV001706. e2025AV001706 2025AV001706

work page doi:10.1029/2025av001706 2025
[15]

B., Ault, T., Delworth, T

Deser, C., Lehner, F., Rodgers, K. B., Ault, T., Delworth, T. L., DiNezio, P. N., Fiore, A., Frankignoul, C., Fyfe, J. C., Horton, D. E., Kay, J. E., Knutti, R., Lovenduski, N. S., Marotzke, J., McKinnon, K. A., Minobe, S., Randerson, J., Screen, J. A., Simpson, I. R., and Ting, M. Insights from earth system model initial-condition large ensembles and fut...

work page doi:10.1038/s41558-020-0731-2 2020
[16]

Diffusion Models Beat GANs on Image Synthesis

Dhariwal, P. and Nichol, A. Diffusion models beat GAN s on image synthesis. Advances in Neural Information Processing Systems, 2021. doi:10.48550/arxiv.2105.05233

work page internal anchor Pith review doi:10.48550/arxiv.2105.05233 2021
[17]

E., Marwah, T., and Mukhopadhyay, P

Diaconu, C., Cranmer, M., Turner, R. E., Marwah, T., and Mukhopadhyay, P. Probabilistic retrofitting of learned simulators. 2026. doi:10.48550/arxiv.2603.01949

work page doi:10.48550/arxiv.2603.01949 2026
[18]

IFS Documentation CY46R1 - Part V: Ensemble Prediction System

ECMWF. IFS Documentation CY46R1 - Part V: Ensemble Prediction System. 2019. doi:10.21957/38yug0cev

work page doi:10.21957/38yug0cev 2019
[19]

Scaling spherical CNN s

Esteves, C., Slotine, J.-J., and Makadia, A. Scaling spherical CNN s. International Conference on Machine Learning, 2023

2023
[20]

(2014) Why Should Ensemble Spread Match the RMSE of the Ensemble Mean?, Journal of Hydrometeorology 60, no

Fortin, V., Abaza, M., Anctil, F., and Turcotte, R. Why should ensemble spread match the rmse of the ensemble mean? Journal of Hydrometeorology, 15 0 (4): 0 1708 -- 1713, 2014. doi:https://doi.org/10.1175/JHM-D-14-0008.1

work page doi:10.1175/jhm-d-14-0008.1 2014
[21]

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

Gal, Y. and Ghahramani, Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. International Conference on Machine Learning, 2016. doi:10.48550/arxiv.1506.02142

work page Pith review doi:10.48550/arxiv.1506.02142 2016
[22]

Weatherbench probability: A benchmark dataset for probabilistic medium-range weather forecasting along with deep learning baseline models

Garg, S., Rasp, S., and Thuerey, N. Weatherbench probability: A benchmark dataset for probabilistic medium-range weather forecasting along with deep learning baseline models. arXiv preprint arXiv:2205.00865, 2022

work page arXiv 2022
[23]

Hatanp\" a \" a , V., Ku, E., Stock, J., Emani, M., Foreman, S., Jung, C., Madireddy, S., Nguyen, T., Sastry, V., Sinurat, R. A. O., Zheng, H., Wheeler, S., Arcomano, T., Vishwanath, V., and Kotamarthi, R. Aeris: Argonne earth systems model for reliable and skillful predictions. In Proceedings of the International Conference for High Performance Computing...

work page doi:10.1145/3712285.3772094 2025
[24]

Decomposition of the continuous ranked probability score for ensemble prediction systems

Hersbach, H. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15 0 (5): 0 559--570, 2000. doi:10.1175/1520-0434(2000)015<0559:dotcrp>2.0.co;2

work page doi:10.1175/1520-0434(2000)015 2000
[25]

Hersbach, B

Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Hor \' a nyi, A., Mu \ n oz-Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes...

work page doi:10.1002/qj.3803 1999
[26]

Swinvrnn: A data-driven ensemble forecasting model via learned distribution perturbation

Hu, Y., Chen, L., Wang, Z., and Li, H. Swinvrnn: A data-driven ensemble forecasting model via learned distribution perturbation. Journal of Advances in Modeling Earth Systems, 15 0 (2): 0 e2022MS003211, 2023. doi:https://doi.org/10.1029/2022MS003211

work page doi:10.1029/2022ms003211 2023
[27]

Uncertainty quantification over graph with conformalized graph neural networks

Huang, K., Jin, Y., Candes, E., and Leskovec, J. Uncertainty quantification over graph with conformalized graph neural networks. Advances in Neural Information Processing Systems, 2023

2023
[28]

Muon: An optimizer for hidden layers in neural networks, 2024

Jordan, K., Jin, Y., Boza, V., Jiacheng, Y., Cesista, F., Newhouse, L., and Bernstein, J. Muon: An optimizer for hidden layers in neural networks, 2024. URL https://kellerjordan.github.io/posts/muon/

2024
[29]

R., Moreno, R

Karlbauer, M., Cresswell-Clay, N., Durran, D. R., Moreno, R. A., Kurth, T., Bonev, B., Brenowitz, N., and Butz, M. V. Advancing parsimonious deep learning weather prediction using the healpix mesh. Journal of Advances in Modeling Earth Systems, 16 0 (8): 0 e2023MS004021, 2024. doi:https://doi.org/10.1029/2023MS004021. e2023MS004021 2023MS004021

work page doi:10.1029/2023ms004021 2024
[30]

Elucidating the Design Space of Diffusion-Based Generative Models

Karras, T., Aittala, M., Aila, T., and Laine, S. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 2022. doi:10.48550/arxiv.2206.00364

work page internal anchor Pith review doi:10.48550/arxiv.2206.00364 2022
[31]

Forecasting global weather with graph neural net- works,

Keisler, R. Forecasting global weather with graph neural networks. arXiv, 2022. doi:10.48550/arxiv.2202.07575

work page doi:10.48550/arxiv.2202.07575 2022
[32]

P., and Hoyer, S.: Neural general circulation models for weather and climate, Nature, 632, 1060–1066, https://doi.org/10.1038/s41586-024-07744-y,

Kochkov, D., Yuval, J., Langmore, I., Norgaard, P., Smith, J., Mooers, G., Klöwer, M., Lottes, J., Rasp, S., Düben, P., Hatfield, S., Battaglia, P., Sanchez-Gonzalez, A., Willson, M., Brenner, M. P., and Hoyer, S. Neural general circulation models for weather and climate. Nature, 632 0 (8027): 0 1060–1066, July 2024. ISSN 1476-4687. doi:10.1038/s41586-024-07744-y

work page doi:10.1038/s41586-024-07744-y 2024
[33]

Simple and scalable predictive uncertainty estimation using deep ensembles

Lakshminarayanan, B., Pritzel, A., and Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems, 2017

2017
[34]

https://doi.org/10.1126/science.adi2336 arXiv:https://www.science.org/doi/pdf/10.1126/science.adi2336

Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Alet, F., Ravuri, S., Ewalds, T., Eaton-Rosen, Z., Hu, W., Merose, A., Hoyer, S., Holland, G., Vinyals, O., Stott, J., Pritzel, A., Mohamed, S., and Battaglia, P. Learning skillful medium-range global weather forecasting. Science, 382 0 (6677): 0 1416–1421, 2023. ISSN 1095-9203. d...

work page doi:10.1126/science.adi2336 2023
[35]

Lang, S., Alexe, M., Clare, M. C. A., Roberts, C., Adewoyin, R., Bouallègue, Z. B., Chantry, M., Dramsch, J., Dueben, P. D., Hahner, S., Maciel, P., Prieto-Nemesio, A., O'Brien, C., Pinault, F., Polster, J., Raoult, B., Tietsche, S., and Leutbecher, M. AIFS-CRPS : Ensemble forecasting using a model trained with a loss function based on the continuous rank...

work page doi:10.48550/arxiv.2412.15832 2024
[36]

and Palmer, T

Leutbecher, M. and Palmer, T. Ensemble forecasting. Journal of Computational Physics, 227 0 (7): 0 3515--3539, 2008. ISSN 0021-9991. doi:https://doi.org/10.1016/j.jcp.2007.02.014. Predicting weather, climate and extreme events

work page doi:10.1016/j.jcp.2007.02.014 2008
[37]

Mahesh, A., Collins, W. D., Bonev, B., Brenowitz, N., Cohen, Y., Elms, J., Harrington, P., Kashinath, K., Kurth, T., North, J., O'Brien, T., Pritchard, M., Pruitt, D., Risser, M., Subramanian, S., and Willard, J. Huge ensembles -- part 1: Design of ensemble weather forecasts using spherical fourier neural operators. Geoscientific Model Development, 18 0 (...

work page doi:10.5194/gmd-18-5575-2025 2025
[38]

Matheson, J. E. and Winkler, R. L. Scoring rules for continuous probability distributions. Management Science, 22 0 (10): 0 1087--1096, 1976

1976
[39]

McKinnon, K. A. and Simpson, I. R. How unexpected was the 2021 pacific northwest heatwave? Geophysical Research Letters, 49 0 (18): 0 e2022GL100380, 2022. doi:https://doi.org/10.1029/2022GL100380

work page doi:10.1029/2022gl100380 2021
[40]

arXiv preprint arXiv:2312.03876 (2023)

Nguyen, T., Shah, R., Bansal, H., Arcomano, T., Madireddy, S., Maulik, R., Kotamarthi, V., Foster, I., and Grover, A. Scaling transformer neural networks for skillful and reliable medium-range weather forecasting. Advances in Neural Information Processing Systems, 2024. doi:10.48550/arxiv.2312.03876

work page doi:10.48550/arxiv.2312.03876 2024
[41]

Omnicast: A masked latent diffusion model for weather forecasting across time scales

Nguyen, T., Pham, T., Arcomano, T., Kotamarthi, R., Foster, I., Madireddy, S., and Grover, A. Omnicast: A masked latent diffusion model for weather forecasting across time scales. Advances in Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=5Y8I2dKc91

2025
[42]

P., and Lindsten, F

Oskarsson, J., Landelius, T., Deisenroth, M. P., and Lindsten, F. Probabilistic weather forecasting with hierarchical graph neural networks. Advances in Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=wTIzpqX121

2024
[43]

FourCastNet: Accelerating global high-resolution weather forecasting using adaptive Fourier neural operators

Pathak, J., Subramanian, S., Harrington, P., Raja, S., Chattopadhyay, A., Mardani, M., Kurth, T., Hall, D., Li, Z., Azizzadenesheli, K., Hassanzadeh, P., Kashinath, K., and Anandkumar, A. FourCastNet: Accelerating global high-resolution weather forecasting using adaptive Fourier neural operators . Proceedings of the National Academy of Sciences (PNAS), 11...

2022
[44]

Nature637, 84–90 (2025) https://doi.org/10.1038/s41586-024-08252-9

Price, I., Sanchez-Gonzalez, A., Alet, F., Andersson, T. R., El-Kadi, A., Masters, D., Ewalds, T., Stott, J., Mohamed, S., Battaglia, P., Lam, R., and Willson, M. Probabilistic weather forecasting with machine learning. Nature, 637 0 (8044): 0 84–90, December 2024. ISSN 1476-4687. doi:10.1038/s41586-024-08252-9. URL http://dx.doi.org/10.1038/s41586-024-08252-9

work page doi:10.1038/s41586-024-08252-9 2024
[45]

and Krishnapriyan, A

Qu, E. and Krishnapriyan, A. S. The importance of being scalable: Improving the speed and accuracy of neural network interatomic potentials across chemical domains. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=Y4mBaZu4vy

2024
[46]

and Thuerey, N

Rasp, S. and Thuerey, N. Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. Journal of Advances in Modeling Earth Systems, 13 0 (2), 2021. doi:https://doi.org/10.1029/2020MS002405

work page doi:10.1029/2020ms002405 2021
[47]

WeatherBench 2: A benchmark for the next generation of data‐driven global weather models

Rasp, S., Hoyer, S., Merose, A., Langmore, I., Battaglia, P., Russell, T., Sanchez‐Gonzalez, A., Yang, V., Carver, R., Agrawal, S., Chantry, M., Ben Bouallegue, Z., Dueben, P., Bromberg, C., Sisk, J., Barrington, L., Bell, A., and Sha, F. WeatherBench 2: A benchmark for the next generation of data‐driven global weather models. Journal of Advances in Model...

work page doi:10.1029/2023ms004019 2024
[48]

and Messori, G

Scher, S. and Messori, G. Ensemble methods for neural network-based weather forecasts. Journal of Advances in Modeling Earth Systems, 13 0 (2), 2021. doi:https://doi.org/10.1029/2020MS002331

work page doi:10.1029/2020ms002331 2021
[49]

S., Chapman, W

Schreck, J. S., Chapman, W. E., Becker, C., Gagne, D. J., Kimpara, D., Cherukuru, N., Berner, J., Mayer, K. J., and Sobhani, N. Controllable probabilistic forecasting with stochastic decomposition layers. 2025. doi:10.48550/arxiv.2512.18815

work page doi:10.48550/arxiv.2512.18815 2025
[50]

Swift: An autoregressive consistency model for efficient weather forecasting

Stock, J., Arcomano, T., and Kotamarthi, R. Swift: An autoregressive consistency model for efficient weather forecasting. In NeurIPS 2025 Workshop on Tackling Climate Change with Machine Learning, 2025

2025
[51]

Sun, S. H. and Yu, R. Copula conformal prediction for multi-step time series prediction. International Conference on Learning Representations, 2024

2024
[52]

A., Durran, D

Weyn, J. A., Durran, D. R., and Caruana, R. Can machines learn to predict weather? using deep learning to predict gridded 500-hpa geopotential height from historical weather data. Journal of Advances in Modeling Earth Systems, 11 0 (8): 0 2680--2693, 2019. doi:https://doi.org/10.1029/2019MS001705

work page doi:10.1029/2019ms001705 2019
[53]

and Naveau, P

Zamo, M. and Naveau, P. Estimation of the continuous ranked probability score with limited information and applications to ensemble weather forecasts. Mathematical Geosciences, 50 0 (2): 0 209--234, 2018. doi:10.1007/s11004-017-9709-7

work page doi:10.1007/s11004-017-9709-7 2018
[54]

Fuxi-ens: A machine learning model for efficient and accurate ensemble weather prediction

Zhong, X., Chen, L., Li, H., Buizza, R., Liu, J., Feng, J., Zhu, Z., Fan, X., Dai, K., jia Luo, J., Wu, J., and Lu, B. Fuxi-ens: A machine learning model for efficient and accurate ensemble weather prediction. Science Advances, 11 0 (44): 0 eadu2854, 2025. doi:10.1126/sciadv.adu2854

work page doi:10.1126/sciadv.adu2854 2025