arxiv: 2601.17636 · v2 · submitted 2026-01-25 · ⚛️ physics.ao-ph

HealDA: Highlighting the importance of initial errors in end-to-end AI weather forecasts

Aayush Gupta (1), Akshay Subramaniam (1), Michael S. Pritchard (1), Karthik Kashinath (1), Sergey Frolov (2), Kelsey Lieberman (3), Christopher Miller (3), Nicholas Silverman (3), Noah D. Brenowitz (1)

(1) NVIDIA Corporation, (2) NOAA, (3) MITRE Corporation

Pith reviewed 2026-05-16 11:50 UTC · model grok-4.3

classification ⚛️ physics.ao-ph

keywords: machine learning, data assimilation, weather forecasting, initial conditions, forecast skill, HEALPix, AI weather models

The pith

A simple machine-learning data assimilation system provides initial conditions for off-the-shelf AI weather models, and the resulting forecasts lose less than one day of effective lead time when scored against ERA5.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HealDA, a neural network that maps a short window of satellite and conventional observations directly to a global 1° atmospheric state on the HEALPix grid. When these analyses initialize existing ML forecast models such as FourCastNet3 (FCN3), Aurora, and FengWu without any retraining, the resulting forecasts trail those started from ERA5 by less than one day of effective lead time. Forecast error growth is unchanged relative to ERA5-initialized runs, so the skill difference traces back to larger starting errors in the HealDA analyses. Spectral analysis shows these initial errors concentrate at large scales and in upper-tropospheric fields, which the paper attributes to overfitting. Small changes to the verification setup alone can shift the apparent skill by 12 to 24 hours.

Core claim

HealDA functions strictly as a data assimilation module whose analyses initialize off-the-shelf ML forecast models. For models including FourCastNet3 (FCN3), Aurora, and FengWu, HealDA-initialized forecasts lose less than one day of effective lead time when scored against ERA5, and HealDA-initialized FCN3 ensembles trail the ECMWF IFS ENS system by less than 24 hours. Forecast error growth is unchanged under HealDA initialization; the skill gap arises primarily from larger initial errors, which spectral analysis attributes to overfitting at large scales and in upper-tropospheric fields.
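One way to make "effective lead time lost" concrete: for each lead time of the test forecast, find the lead time at which the reference forecast reaches the same error, then take the difference. The sketch below assumes error curves that increase monotonically on a shared lead-time grid. The function name and the synthetic curves are illustrative and not taken from the paper, whose exact definition may differ.

```python
import numpy as np

def lead_time_loss(lead_hours, err_reference, err_test):
    """Hours of lead time lost by the test forecast relative to the reference.

    For each lead time t of the test forecast, find the lead time t_ref at
    which the reference forecast has the same error, and return t_ref - t.
    Assumes err_reference increases strictly with lead time.
    """
    lead_hours = np.asarray(lead_hours, dtype=float)
    err_reference = np.asarray(err_reference, dtype=float)
    err_test = np.asarray(err_test, dtype=float)
    # Invert the reference curve (error -> lead time) by linear interpolation.
    # Errors outside the range covered by the reference curve map to NaN.
    equivalent_lead = np.interp(err_test, err_reference, lead_hours,
                                left=np.nan, right=np.nan)
    return equivalent_lead - lead_hours

# Synthetic example (not paper data): the test curve is the reference
# curve shifted by 12 hours.
lead = np.arange(0, 241, 6)
ref = 50.0 * np.exp(lead / 96.0)
test = 50.0 * np.exp((lead + 12.0) / 96.0)
loss = lead_time_loss(lead, ref, test)
```

In this toy case `loss` is about 12 hours wherever the test error still lies within the range of the reference curve, and NaN beyond it.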

What carries the argument

HealDA, the direct observation-to-state neural network that converts a short window of observations into a 1° HEALPix atmospheric analysis without iterative steps.
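To illustrate what "a short window of observations" means in practice, here is a sketch that selects observations in the interval [t0 − 21 h, t0 + 3 h] quoted in the Figure 1 caption. The `(time, value)` tuple layout and the helper name are hypothetical; HealDA's actual preprocessing is not described on this page.

```python
from datetime import datetime, timedelta

# Window bounds taken from the Figure 1 caption: [t0 - 21 h, t0 + 3 h].
WINDOW_BEFORE = timedelta(hours=21)
WINDOW_AFTER = timedelta(hours=3)

def select_window(observations, t0):
    """Keep (time, value) pairs whose time lies in [t0 - 21 h, t0 + 3 h]."""
    start, end = t0 - WINDOW_BEFORE, t0 + WINDOW_AFTER
    return [(t, v) for t, v in observations if start <= t <= end]

t0 = datetime(2022, 1, 2, 0)
obs = [
    (datetime(2022, 1, 1, 2), 250.1),  # 22 h before t0: outside the window
    (datetime(2022, 1, 1, 3), 251.3),  # 21 h before t0: on the boundary, kept
    (datetime(2022, 1, 2, 3), 249.8),  # 3 h after t0: on the boundary, kept
    (datetime(2022, 1, 2, 4), 248.0),  # 4 h after t0: outside the window
]
selected = select_window(obs, t0)
```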

If this is right

Error growth rates in the ML forecast models stay the same whether initialized by HealDA or by NWP analyses.
The skill gap originates mainly from higher initial errors concentrated at large scales and in the upper troposphere.
Verification setup variations can alter apparent skill differences by 12-24 hours, requiring consistent scoring.
A direct-mapping ML DA system already supplies initial conditions usable by current state-of-the-art ML forecast models.
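A toy model shows why unchanged growth plus a larger initial error gives a roughly constant lead-time offset. Assume, as a simplification and not as a claim from the paper, that error grows exponentially at a common rate $\lambda$ before saturation, from effective initial errors $\varepsilon_{\mathrm{ERA5}}$ and $\varepsilon_{\mathrm{HealDA}}$:

```latex
\begin{aligned}
e_{\mathrm{ERA5}}(t) &= \varepsilon_{\mathrm{ERA5}}\, e^{\lambda t}, \qquad
e_{\mathrm{HealDA}}(t) = \varepsilon_{\mathrm{HealDA}}\, e^{\lambda t}, \\
e_{\mathrm{HealDA}}(t) &= e_{\mathrm{ERA5}}(t + \Delta t)
\;\Longrightarrow\;
\Delta t = \frac{1}{\lambda} \ln\frac{\varepsilon_{\mathrm{HealDA}}}{\varepsilon_{\mathrm{ERA5}}}.
\end{aligned}
```

Under this assumption $\Delta t$ does not depend on $t$, so the two error curves are horizontal translates of each other. As an illustration only, an error-doubling time of two days ($\lambda = \ln 2 / 2$ per day) and a 30% larger initial error would give $\Delta t = 2 \log_2 1.3 \approx 0.76$ days.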

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Reducing overfitting on large scales inside HealDA would likely close most of the remaining skill gap.
This direct-mapping approach could support faster, lower-cost end-to-end ML weather pipelines by cutting dependence on full NWP assimilation infrastructure.
Future progress in AI weather forecasting may depend more on improving initial-condition quality than on further model architecture changes.

Load-bearing premise

The selected off-the-shelf ML forecast models represent the broader class of AI weather models and the chosen verification metrics and observation window fairly capture operational differences.

What would settle it

A side-by-side plot of error-growth curves for HealDA-initialized versus ERA5-initialized runs in an additional ML model, or the same comparison repeated with verification metrics focused on small-scale fields.

Figures

Figures reproduced from arXiv: 2601.17636 by Aayush Gupta (1), Akshay Subramaniam (1), Michael S. Pritchard (1), Karthik Kashinath (1), Sergey Frolov (2), Kelsey Lieberman (3), Christopher Miller (3), Nicholas Silverman (3), Noah D. Brenowitz (1); (1) NVIDIA Corporation, (2) NOAA, (3) MITRE Corporation.

**Figure 1.** End-to-end HealDA system and forecasting pipeline. Observations from various remote-sensing instruments (ATMS, MHS, etc.) and in-situ sources (radiosondes, buoys, etc.) in the time window [t₀ − 21 h, t₀ + 3 h] are processed by HealDA, which consists of an Observation Encoder (Obs Encoder) followed by an HPX ViT backbone, to produce an analysis state on the HPX grid at the target time t₀. This analysis can …

**Figure 2.** RMSE of HealDA analysis vs IFS. Time series of global RMSE for both HealDA and IFS against ERA5 in the 2022 test period, computed every 6 hours (00/06/12/18 UTC). The original data is shown with reduced opacity to reduce noise, and the solid line represents the 7-day moving average.

**Figure 3.** Probabilistic FCN3 skill with HealDA and ERA5 initial conditions. CRPS of FCN3 forecasts initialized by HealDA and ERA5, both verified against ERA5 on the HPX64 grid and averaged over 128 initial conditions at 06/18 UTC in 2022. The inset panels zoom into the 6-48 h lead time range. [Plot: panel (a) Z500; x-axis: lead time (hours), 0-240; y-axis: CRPS (m² s⁻²).]

**Figure 4.** Probabilistic skill of HealDA-initialized FCN3 vs IFS ENS. CRPS of IFS ENS forecasts and FCN3 forecasts initialized from HealDA, verified against ERA5 on the HPX64 grid and averaged over 128 initial conditions at 00/12 UTC in 2022.

**Figure 5.** Analysis error spectral decomposition. Spherical power spectra of HealDA and IFS HRES analysis errors on the HPX64 grid, scored relative to ERA5. The HealDA error spectra are shown, averaged over the test year (2022), in solid lines, and a year from the training period (2021), in dashed lines. For IFS, the error spectra averaged across 2021-2022 are shown. Spectra are shown as a function of spherical harmo…

**Figure 6.** Error growth. Error power spectra of FCN3 forecasts initialized with HealDA analysis versus ERA5, shown as a function of spherical harmonic degree for (a) Z500 and (b) T850 at multiple forecast lead times. Power is visualized as 10 log₁₀ Cℓ.

**Figure 7.** HealDA can initialize FengWu and Aurora. RMSE of deterministic Aurora and FengWu forecasts initialized from either ERA5 (solid) or HealDA (dashed). Scores are averaged over 128 initial conditions at 06/18 UTC in 2022 and verified against ERA5 on the HPX64 grid.

**Figure 8.** Availability of observations across the test period. Observation counts at each 6-hour window centered at 00/06/12/18 UTC for HealDA’s sensor suite in the test period: (a) AMSU-A, (b) MHS, (c) ATMS microwave sounders, and (d) all conventional observations. Solid lines show the number of observations per window; dashed lines show the annual mean.

**Figure 9.** HealDA network architecture. Observation streams are flattened and then passed through sensor-specific embedders (detailed in …

**Figure 10.** HealDA Sensor Embedder. Each raw observation is described by integer metadata (e.g., HPX pixel, channel, platform), floating-point metadata (e.g., satellite scan angles, local solar time, pressure, height), and the measurement itself. Integer metadata are mapped through embedding tables and combined with featurized floating-point metadata along with the measurement through an Obs tokenizer MLP, yielding …
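Figures 3 and 4 score ensembles with CRPS. For reference, here is a sketch of the standard empirical ensemble estimator, CRPS = E|X − y| − ½ E|X − X′|, at a single grid point. This is the textbook formula and not necessarily the exact variant (for example fair or area-weighted) used in the paper; the function name is illustrative.

```python
import numpy as np

def ensemble_crps(members, observation):
    """Empirical CRPS of an ensemble for one scalar observation.

    CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|, with the second mean
    taken over all ordered member pairs (including i == j).
    """
    members = np.asarray(members, dtype=float)
    skill = np.mean(np.abs(members - observation))
    spread = np.mean(np.abs(members[:, None] - members[None, :]))
    return skill - 0.5 * spread

# A one-member "ensemble" reduces to the absolute error.
single = ensemble_crps([3.0], 1.0)      # 2.0
# Two members straddling the observation.
pair = ensemble_crps([0.0, 2.0], 1.0)   # 1.0 - 0.5 * 1.0 = 0.5
```

Lower is better. The spread term rewards ensembles whose dispersion matches their error, which is why CRPS suits the FCN3 and IFS ENS comparisons.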

Original abstract

AI weather models now rival leading numerical weather prediction (NWP) systems in medium-range skill. However, almost all still rely on NWP data assimilation (DA) to provide initial conditions, tying them to expensive infrastructure and limiting the practical speed and accuracy gains of ML. More recently, ML-based DA systems have been proposed, which are often trained and evaluated end-to-end with a forecast model, making it difficult to assess the quality of their analysis fields. We introduce HealDA, a global ML-based DA system that maps a short window of satellite and conventional observations directly to a 1° atmospheric state on the HEALPix grid, using a smaller sensor suite than operational NWP. We treat HealDA strictly as a DA module: its analyses are used to initialize off-the-shelf ML forecast models without any fine-tuning of either. For a variety of off-the-shelf ML forecast models, including FourCastNet3 (FCN3), Aurora, and FengWu, HealDA-initialized forecasts lose less than one day of effective lead time when scored against ERA5. HealDA-initialized FCN3 ensembles similarly trail those of the ECMWF IFS ENS system by < 24 h. We find that forecast error growth in these models is unchanged from HealDA initialization, and the skill gap primarily arises from the larger initial error of the HealDA analysis. Spectral analysis reveals that this stems from overfitting to the large scales and upper-tropospheric fields. We also demonstrate that small changes in the verification setup can shift apparent skill by 12-24 h, underscoring the need for consistent scoring. Taken together, these results clarify the current performance of ML-based DA systems and show that a relatively simple, direct observation-to-state network can already provide initial conditions that are usable by state-of-the-art ML forecast models with only modest loss in medium-range skill.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HealDA shows a basic ML DA can initialize several off-the-shelf AI forecast models with under a day of skill loss against ERA5, but that headline number moves with the exact scoring choices.

The main takeaway is that HealDA maps a short window of observations straight to a 1-degree HEALPix state and then initializes unmodified models like FourCastNet3, Aurora, and FengWu. Those runs lose less than one day of effective lead time versus ERA5 starts, with unchanged error growth rates, so the gap is mostly from bigger initial errors at large scales and upper levels. They also show HealDA-initialized FCN3 ensembles trail ECMWF by under 24 hours.

The direct test on multiple frozen forecast models without retraining is the cleanest part of the work and gives a clearer read on the DA quality alone. The spectral analysis pins down where the initial errors concentrate, and the explicit warning that small verification changes can shift skill by 12-24 hours is a useful flag for the whole field. The paper keeps the DA module separate from the forecast step, which avoids the circular training issues that plague some end-to-end ML DA efforts.

The soft spot is that the central claim of less than one day loss sits inside the sensitivity range the authors report themselves. The manuscript does not appear to run a full sweep of alternative metrics, levels, or reference thresholds on the HealDA versus control pairs, so it is not obvious how stable the bound stays under other reasonable protocols. Verification against ERA5 is standard but still leaves the usual question of how much the result depends on the target the models were trained on.

This paper is for groups working on practical AI weather pipelines who want to see whether they can drop expensive NWP DA steps. Readers who care about deployment timelines and verification hygiene will find concrete numbers and a fair caveat. The empirical comparisons are solid enough and the thinking is straightforward, so it deserves a serious referee even if a couple of extra sensitivity tables would tighten the quantitative claims.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces HealDA, a global ML-based data assimilation system that maps a short window of satellite and conventional observations directly to a 1° atmospheric state on the HEALPix grid. Treating HealDA strictly as a DA module, the authors initialize off-the-shelf ML forecast models (FourCastNet3, Aurora, FengWu) without fine-tuning and report that the resulting forecasts lose less than one day of effective lead time when scored against ERA5, with unchanged error growth rates relative to ERA5-initialized runs. The skill gap is attributed primarily to larger initial errors arising from overfitting to large scales and upper-tropospheric fields, supported by spectral analysis. HealDA-initialized FCN3 ensembles trail ECMWF IFS ENS by <24 h. The paper also shows that small changes in verification setup can shift apparent skill by 12-24 h.

Significance. If the central quantitative claims prove robust, the work is significant for clarifying the role of initial-condition errors in AI weather models and demonstrating that a relatively simple, direct observation-to-state ML DA system can deliver usable initial conditions for state-of-the-art forecast models with only modest medium-range skill loss. The cross-model empirical tests, ensemble comparisons, and spectral diagnosis of error sources provide concrete evidence that initial-error magnitude, rather than altered error growth, drives the performance gap. This could reduce reliance on expensive NWP DA infrastructure.

major comments (3)

[Results (lead-time and error-growth comparisons)] The central claim that HealDA-initialized forecasts lose <1 day of effective lead time (and exhibit unchanged error growth) is load-bearing for the paper's conclusions, yet the manuscript itself reports that small changes in verification setup shift apparent skill by 12-24 h. Without systematic sensitivity tests across the specific choices of scoring metric, reference threshold, pressure levels/variables, and ERA5 vs. independent observations for the HealDA vs. control comparisons, it is unclear whether the quantitative bound holds under alternative but plausible protocols.
[Error growth analysis subsection] The statement that forecast error growth remains unchanged from HealDA initialization requires explicit quantitative support, such as fitted growth rates with confidence intervals or statistical tests comparing HealDA-initialized vs. ERA5-initialized trajectories, to confirm the difference is not significant given the larger initial errors.
[Spectral analysis section] The spectral analysis attributing initial errors to overfitting on large scales and upper-tropospheric fields is used to explain the skill gap; the manuscript should specify the exact spectral bands, variables, and quantitative metric (e.g., power spectrum ratio or scale-dependent RMSE) used to identify this overfitting and demonstrate it is not an artifact of the chosen verification window.
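The second major comment asks for fitted growth rates with confidence intervals. Below is a minimal sketch of one way to produce them, assuming per-initialization RMSE curves and an exponential-growth regime over the fitted lead times. The array shapes, function names, and synthetic data are illustrative and not taken from the manuscript.

```python
import numpy as np

def growth_rate(lead_hours, rmse):
    """Least-squares slope of log(RMSE) versus lead time (per hour)."""
    slope, _intercept = np.polyfit(lead_hours, np.log(rmse), 1)
    return slope

def bootstrap_growth_rate(lead_hours, rmse_by_case, n_boot=1000, seed=0):
    """Point estimate and 95% percentile interval for the growth rate.

    rmse_by_case has shape (n_cases, n_leads). Cases (initial conditions)
    are resampled with replacement and the case-mean curve is refit.
    """
    rng = np.random.default_rng(seed)
    n_cases = rmse_by_case.shape[0]
    rates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_cases, size=n_cases)
        rates[b] = growth_rate(lead_hours, rmse_by_case[idx].mean(axis=0))
    low, high = np.percentile(rates, [2.5, 97.5])
    return growth_rate(lead_hours, rmse_by_case.mean(axis=0)), (low, high)

# Synthetic data (not from the paper): 128 cases, true rate 0.01 per hour.
rng = np.random.default_rng(1)
lead = np.arange(6.0, 121.0, 6.0)
noise = rng.lognormal(mean=0.0, sigma=0.1, size=(128, lead.size))
rmse_cases = 40.0 * np.exp(0.01 * lead) * noise
rate, (ci_low, ci_high) = bootstrap_growth_rate(lead, rmse_cases)
```

To compare two initializations, the same procedure could be run on each set of curves, or applied to the difference in fitted rates with the same resampled cases, so that the pairing of initial conditions is respected.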

minor comments (3)

Figure captions should explicitly label all curves (model, initialization method, ensemble vs. deterministic) and include the verification metric and reference dataset for immediate readability.
The abstract states results for 'a variety of off-the-shelf ML forecast models' but the main text should list all tested models and any selection criteria if additional models beyond FCN3, Aurora, and FengWu were evaluated.
Consider adding a summary table of effective lead-time losses broken down by model, variable, and pressure level to complement the narrative claims.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed review. The comments highlight important aspects of robustness in our quantitative claims. We have revised the manuscript to incorporate additional analyses addressing each major point, as detailed below.

Referee: [Results (lead-time and error-growth comparisons)] The central claim that HealDA-initialized forecasts lose <1 day of effective lead time (and exhibit unchanged error growth) is load-bearing for the paper's conclusions, yet the manuscript itself reports that small changes in verification setup shift apparent skill by 12-24 h. Without systematic sensitivity tests across the specific choices of scoring metric, reference threshold, pressure levels/variables, and ERA5 vs. independent observations for the HealDA vs. control comparisons, it is unclear whether the quantitative bound holds under alternative but plausible protocols.

Authors: We agree that systematic sensitivity testing strengthens the central claim. In the revised manuscript we have added a dedicated sensitivity analysis subsection. This includes tests varying the scoring metric (RMSE versus anomaly correlation coefficient), reference thresholds for effective lead time, and pressure levels/variables (Z500, T850, U200). The <1-day effective lead-time loss remains consistent across these choices, with variations of 12-24 h as previously noted. All comparisons use ERA5 as the common reference for both HealDA and control runs to maintain fairness. We also discuss the practical limitations of independent global observations and why ERA5 provides the most consistent benchmark.
revision: yes
Referee: [Error growth analysis subsection] The statement that forecast error growth remains unchanged from HealDA initialization requires explicit quantitative support, such as fitted growth rates with confidence intervals or statistical tests comparing HealDA-initialized vs. ERA5-initialized trajectories, to confirm the difference is not significant given the larger initial errors.

Authors: We appreciate this request for quantitative rigor. We have added fitted exponential growth rates (with 95% confidence intervals obtained via bootstrap resampling) to the error-growth subsection. For each model and variable, the growth rates from HealDA and ERA5 initializations are statistically indistinguishable (two-sample t-test on bootstrap replicates, p > 0.05). The confidence intervals overlap substantially, confirming that the larger initial error, rather than altered growth, accounts for the skill gap. These results and the associated statistical tests are now reported explicitly.
revision: yes
Referee: [Spectral analysis section] The spectral analysis attributing initial errors to overfitting on large scales and upper-tropospheric fields is used to explain the skill gap; the manuscript should specify the exact spectral bands, variables, and quantitative metric (e.g., power spectrum ratio or scale-dependent RMSE) used to identify this overfitting and demonstrate it is not an artifact of the chosen verification window.

Authors: We have expanded the spectral analysis section with the requested details. We compute the power-spectrum ratio (HealDA/ERA5) integrated over zonal wavenumber bands 1-10 (large scales) and 11-50 (mesoscales) for variables Z500, T850, and U200. Overfitting is identified by excess power ratios >1.2 in the large-scale band and upper-tropospheric levels. To rule out verification-window artifacts, we repeated the analysis over five independent 10-day windows spanning different seasons; the scale-dependent excess remains consistent. These specifications and robustness checks are now stated explicitly in the text and figure captions.
revision: yes
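The band-integrated power-spectrum ratio described in the last response can be sketched as follows, assuming the two power spectra are already available as arrays indexed by wavenumber (the spectral transform itself is omitted). The band edges and the 1.2 threshold are those quoted in the response. The spectra are synthetic and the function name is hypothetical.

```python
import numpy as np

def band_power_ratio(spec_a, spec_b, k_min, k_max):
    """Ratio of power summed over wavenumbers k_min..k_max (inclusive).

    spec_a and spec_b are 1-D power spectra indexed by wavenumber,
    so spec[k] is the power at wavenumber k.
    """
    band = slice(k_min, k_max + 1)
    return np.sum(spec_a[band]) / np.sum(spec_b[band])

# Synthetic spectra (not paper data): spec_a has 50% extra power at k <= 10.
k = np.arange(0, 51)
spec_b = 1.0 / (1.0 + k) ** 3
spec_a = spec_b * np.where(k <= 10, 1.5, 1.0)

large_scale = band_power_ratio(spec_a, spec_b, 1, 10)   # about 1.5
mesoscale = band_power_ratio(spec_a, spec_b, 11, 50)    # 1.0
flagged = large_scale > 1.2  # threshold quoted in the response
```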

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation

The manuscript presents HealDA as a trained ML mapping from observations to analysis state, then reports direct empirical comparisons of initialized forecasts against ERA5 and ECMWF ensembles using standard skill metrics. No equations, uniqueness theorems, or derivations are invoked; all headline claims (effective lead-time loss <1 day, unchanged error growth) are measured outcomes on held-out data rather than quantities forced by construction from fitted parameters or self-citations. Verification sensitivity is acknowledged but does not alter the non-circular status of the reported measurements.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions of ML weather modeling and ERA5 as ground truth; no new physical axioms or invented entities are introduced.

axioms (1)

Domain assumption: ERA5 reanalysis serves as a reliable verification target for medium-range skill. Used throughout the evaluation without additional justification in the abstract.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "HealDA consists of two main components: an observation encoder followed by an HPX vision transformer (ViT) backbone... trained jointly end-to-end under a single supervised regression objective."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We find that forecast error growth in these models is unchanged from HealDA initialization, and the skill gap primarily arises from the larger initial error of the HealDA analysis."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Towards accurate extreme event likelihoods from diffusion model climate emulators
physics.ao-ph 2026-05 unverdicted novelty 6.0

Diffusion model climate emulators provide probability density estimates that allow likelihood calculations and odds-ratio-based importance sampling for extreme events such as tropical cyclones.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · cited by 1 Pith paper · 6 internal anchors

[1]

Deterministic nonperiodic flow.J

Edward N Lorenz. Deterministic nonperiodic flow.J. Atmos. Sci., 20(2):130–141, March 1963. ISSN 0022-4928. doi: 10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2. 1

work page doi:10.1175/1520-0469(1963)020 1963
[2]

Learning skillful medium-range global weather forecasting.Science, 382(6677):1416–1421, 2023

Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, et al. Learning skillful medium-range global weather forecasting.Science, 382(6677):1416–1421, 2023. 1, 2, 3

work page 2023
[3]

Pangu-weather: A 3D high-resolution model for fast and accurate global weather forecast.arXiv preprint arXiv:2211.02556,

Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian. Pangu-weather: A 3D high-resolution model for fast and accurate global weather forecast.arXiv preprint arXiv:2211.02556,

work page arXiv
[4]

ClimaX: A foundation model for weather and climate.arXiv [cs.LG], January 2023

Tung Nguyen, Johannes Brandstetter, Ashish Kapoor, Jayesh K Gupta, and Aditya Grover. ClimaX: A foundation model for weather and climate.arXiv [cs.LG], January 2023. 1 19 HealDA: Highlighting the importance of initial errors in end-to-end AI weather forecasts

work page 2023
[5]

Prognostic validation of a neural network unified physics parameteriza- tion.Geophysicak Research Letters, 17:2493, June 2018

N D Brenowitz and C S Bretherton. Prognostic validation of a neural network unified physics parameteriza- tion.Geophysicak Research Letters, 17:2493, June 2018. ISSN 0094-8276. doi: 10.1029/2018GL078510. 1

work page doi:10.1029/2018gl078510 2018
[6]

Can machines learn to predict weather? using deep learning to predict gridded 500-hPa geopotential height from historical weather data.J

Jonathan A Weyn, Dale R Durran, and Rich Caruana. Can machines learn to predict weather? using deep learning to predict gridded 500-hPa geopotential height from historical weather data.J. Adv. Model. Earth Syst., 11(8):2680–2693, August 2019. ISSN 1942-2466,1942-2466. doi: 10.1029/2019MS001705. 1

work page doi:10.1029/2019ms001705 2019
[7]

The operational medium-range deterministic weather forecasting can be extended beyond a 10-day lead time.Commun

Kang Chen, Tao Han, Fenghua Ling, Junchao Gong, Lei Bai, Xinyu Wang, Jing-Jia Luo, Ben Fei, Wenlong Zhang, Xi Chen, Leiming Ma, Tianning Zhang, Rui Su, Yuanzheng Ci, Bin Li, Xiaokang Yang, and Wanli Ouyang. The operational medium-range deterministic weather forecasting can be extended beyond a 10-day lead time.Commun. Earth Environ., 6(1):518, July 2025. ...

work page doi:10.1038/s43247-025-02502-y 2025
[8]

Brenowitz, Yair Cohen, Jaideep Pathak, Ankur Mahesh, Boris Bonev, Thorsten Kurth, Dale R

Noah D. Brenowitz, Yair Cohen, Jaideep Pathak, Ankur Mahesh, Boris Bonev, Thorsten Kurth, Dale R. Durran, Peter Harrington, and Michael S. Pritchard. A practical probabilistic benchmark for ai weather models.Geophysical Research Letters, 52(7), April 2025. ISSN 1944-8007. doi: 10.1029/2024gl113656. URLhttp://dx.doi.org/10.1029/2024GL113656. 2, 3

work page doi:10.1029/2024gl113656 2025
[9]

WeatherBench 2: A benchmark for the next generation of data-driven global weather models.J

Stephan Rasp, Stephan Hoyer, Alexander Merose, Ian Langmore, Peter Battaglia, Tyler Russell, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya Agrawal, Matthew Chantry, Zied Ben Bouallegue, Peter Dueben, Carla Bromberg, Jared Sisk, Luke Barrington, Aaron Bell, and Fei Sha. WeatherBench 2: A benchmark for the next generation of data-driven global we...

work page doi:10.1029/2023ms004019 2024
[10]

Gencast: Diffusion-based ensemble forecasting for medium-range weather.arXiv preprint arXiv:2312.15796, 2023

Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R Andersson, Andrew El-Kadi, Dominic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, Remi Lam, and Matthew Willson. Gencast: Diffusion-based ensemble forecasting for medium-range weather.arXiv preprint arXiv:2312.15796, 2023. 2, 3

work page arXiv 2023
[11]

Brenner, and Stephan Hoyer

Dmitrii Kochkov, Janni Yuval, Ian Langmore, Peter Norgaard, Jamie Smith, Griffin Mooers, Milan Klöwer, James Lottes, Stephan Rasp, Peter Düben, Sam Hatfield, Peter Battaglia, Alvaro Sanchez- Gonzalez, Matthew Willson, Michael P. Brenner, and Stephan Hoyer. Neural general circulation models for weather and climate.Nature, 632(8027):1060–1066, July 2024. IS...

work page doi:10.1038/s41586-024-07744-y 2024
[12]

Observations and data assimilation.https: //www.ecmwf.int/en/research/data-assimilation/observations, 2023

European Centre for Medium-Range Weather Forecasts. Observations and data assimilation.https: //www.ecmwf.int/en/research/data-assimilation/observations, 2023. Accessed: 2026-01-08. 2

work page 2023
[13]

Klinker, J.-F

Florence Rabier, H Järvinen, E. Klinker, J.-F. Mahfouf, and Adrian Simmons. The ecmwf operational implementation of four dimensional variational assimilation. part i: Experimental results with simplified physics, 02/1999 1999. URLhttps://www.ecmwf.int/node/11794. 2

work page 1999
[14]

Buizza, Magdalena Alonso Balmaseda, Andrew Brown, S

R. Buizza, Magdalena Alonso Balmaseda, Andrew Brown, S. J. English, Richard Forbes, Alan Geer, T. Haiden, Martin Leutbecher, Linus Magnusson, Mark Rodwell, M. Sleigh, Tim Stockdale, Frédéric Vitart, and N. Wedi. The development and evaluation process followed at ecmwf to upgrade the integrated forecasting system (ifs). ECMWF Techni- cal Memorandum No. 829...

work page 2018
[15]

End-to-enddata-drivenweatherprediction.Nature, 641:1172–1179,

A.Allen, S.Markou, W.Tebbutt, etal. End-to-enddata-drivenweatherprediction.Nature, 641:1172–1179,

work page
[16]

URL https://doi.org/10.1038/s41586-025-08897-0

doi: 10.1038/s41586-025-08897-0. URL https://doi.org/10.1038/s41586-025-08897-0. Published online: 20 March 2025; Version of record: 21 May 2025. 2, 3, 5, 8

work page doi:10.1038/s41586-025-08897-0 2025
[17]

Huracan: A skillful end-to-end data-driven system for ensemble data assimilation and weather prediction, 2025

ZekunNi, JonathanWeyn,HangZhang, YanfeiXiang, JiangBian,WeixinJin, KitThambiratnam, QiZhang, Haiyu Dong, and Hongyu Sun. Huracan: A skillful end-to-end data-driven system for ensemble data assimilation and weather prediction, 2025. URLhttps://arxiv.org/abs/2508.18486. 2, 3, 5, 8, 9 20 HealDA: Highlighting the importance of initial errors in end-to-end AI ...

work page arXiv 2025
[18]

Xichen: An observation-scalable fully ai-driven global weather forecasting system with 4D variational knowledge, 2025

Wuxin Wang, Weicheng Ni, Lilan Huang, Tao Hao, Ben Fei, Shuo Ma, Taikang Yuan, Yanlai Zhao, Kefeng Deng, Xiaoyong Li, Boheng Duan, Lei Bai, and Kaijun Ren. Xichen: An observation-scalable fully ai-driven global weather forecasting system with 4D variational knowledge, 2025. URLhttps: //arxiv.org/abs/2507.09202. 2, 3, 8

work page arXiv 2025
[19]

X. Sun, X. Zhong, X. Xu, et al. A data-to-forecast machine learning system for global weather.Nature Communications, 16:6658, 2025. doi: 10.1038/s41467-025-62024-1. URLhttps://doi.org/10.1038/ s41467-025-62024-1. Published online: 19 July 2025. 2, 3, 5, 8, 12

work page doi:10.1038/s41467-025-62024-1 2025
[20]

Collins, Michael S

Boris Bonev, Thorsten Kurth, Ankur Mahesh, Mauro Bisson, Jean Kossaifi, Karthik Kashinath, Anima Anandkumar, William D. Collins, Michael S. Pritchard, and Alexander Keller. Fourcastnet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale, 2025. URLhttps://arxiv. org/abs/2507.12144. 2, 3

work page arXiv 2025
[21]

Bruinsma, Ana Lucic, Megan Stanley, Anna Allen, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A

Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Anna Allen, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A. Weyn, Haiyu Dong, Jayesh K. Gupta, Kit Thambiratnam, Alexander T. Archibald, Chun-Chieh Wu, Elizabeth Heider, Max Welling, Richard E. Turner, and Paris Perdikaris. A foundation model for the earth system.Nature, May ...

work page doi:10.1038/s41586-025-09005-y 2025
[22]

FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators

Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, Pedram Hassanzadeh, Karthik Kashinath, and Animashree Anandkumar. Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators, 2022. URLhttps...

[23]

Fengwu: Pushing the skillful global medium-range weather forecast beyond 10 days lead, 2023

Kang Chen, Tao Han, Junchao Gong, Lei Bai, Fenghua Ling, Jing-Jia Luo, Xi Chen, Leiming Ma, Tianning Zhang, Rui Su, Yuanzheng Ci, Bin Li, Xiaokang Yang, and Wanli Ouyang. Fengwu: Pushing the skillful global medium-range weather forecast beyond 10 days lead, 2023. URL https://arxiv.org/abs/2304.02948. 3

[24]

Score-based data assimilation, 2023

François Rozet and Gilles Louppe. Score-based data assimilation, 2023. URL https://arxiv.org/abs/2306.10574. 3

[25]

Generative data assimilation of sparse weather station observations at kilometer scales, 2025

Peter Manshausen, Yair Cohen, Peter Harrington, Jaideep Pathak, Mike Pritchard, Piyush Garg, Morteza Mardani, Karthik Kashinath, Simon Byrne, and Noah Brenowitz. Generative data assimilation of sparse weather station observations at kilometer scales, 2025. URL https://arxiv.org/abs/2406.16947. 3

[26]

Diffda: a diffusion model for weather-scale data assimilation, 2024

Langwen Huang, Lukas Gianinazzi, Yuejiang Yu, Peter D. Dueben, and Torsten Hoefler. Diffda: a diffusion model for weather-scale data assimilation, 2024. URL https://arxiv.org/abs/2401.05932. 3

[27]

Appa: Bending weather dynamics with latent diffusion models for global data assimilation, 2025

Gérôme Andry, Sacha Lewin, François Rozet, Omer Rochman, Victor Mangeleer, Matthias Pirlet, Elise Faulx, Marilaure Grégoire, and Gilles Louppe. Appa: Bending weather dynamics with latent diffusion models for global data assimilation, 2025. URL https://arxiv.org/abs/2504.18720. 3

[28]

Lo-sda: Latent optimization for score-based atmospheric data assimilation, 2025

Jing-An Sun, Hang Fan, Junchao Gong, Ben Fei, Kun Chen, Fenghua Ling, Wenlong Zhang, Wanghan Xu, Li Yan, Pierre Gentine, and Lei Bai. Lo-sda: Latent optimization for score-based atmospheric data assimilation, 2025. URL https://arxiv.org/abs/2510.22562. 3

[29]

Data driven weather forecasts trained and initialised directly from observations, 2024

Anthony McNally, Christian Lessig, Peter Lean, Eulalie Boucher, Mihai Alexe, Ewan Pinnington, Matthew Chantry, Simon Lang, Chris Burrows, Marcin Chrust, Florian Pinault, Ethel Villeneuve, Niels Bormann, and Sean Healy. Data driven weather forecasts trained and initialised directly from observations, 2024. URL https://arxiv.org/abs/2407.15586. 4

[30]

An update on ai–dop: skilful weather forecasts produced directly from observations

Tony McNally, Christian Lessig, Peter Lean, Eulalie Boucher, Mihai Alexe, Ewan Pinnington, Patrick Laloyaux, Simon Lang, Florian Pinault, Matt Chantry, Chris Burrows, Ethel Villeneuve, Marcin Chrust, Niels Bormann, and Sean Healy. An update on ai–dop: skilful weather forecasts produced directly from observations. ECMWF Newsletter, (182):15–18, 2025. d...

doi: 10.21957/tmi6y913dc
[31]

Dawp: A framework for global observation forecasting via data assimilation and weather prediction in satellite observation space, 2025

Junchao Gong, Jingyi Xu, Ben Fei, Fenghua Ling, Wenlong Zhang, Kun Chen, Wanghan Xu, Weidong Yang, Xiaokang Yang, and Lei Bai. Dawp: A framework for global observation forecasting via data assimilation and weather prediction in satellite observation space, 2025. URL https://arxiv.org/abs/2510.15978. 4

[32]

Forecast performance of the ecmwf operational forecasting system in 2022

Thomas Haiden, Matthieu Chevallier, and David Richardson. Forecast performance of the ecmwf operational forecasting system in 2022. ECMWF Newsletter, (175):5–12, 2023. 5

[33]

The era5 global reanalysis

Hans Hersbach, Bill Bell, Paul Berrisford, Shoji Hirahara, András Horányi, Joaquín Muñoz-Sabater, Julien Nicolas, Carole Peubey, Raluca Radu, Dinand Schepers, et al. The era5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146(730):1999–2049, 2020. 8, 12, 17

[34]

Stochastic Interpolants: A Unifying Framework for Flows and Diffusions

Michael S Albergo, Nicholas M Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. arXiv [cs.LG], October 2025. doi: 10.48550/arXiv.2303.08797. 8

[35]

Climate in a bottle: Towards a generative foundation model for the kilometer-scale global atmosphere

Noah D Brenowitz, Tao Ge, Akshay Subramaniam, Aayush Gupta, David M Hall, Morteza Mardani, Arash Vahdat, Karthik Kashinath, and Michael S Pritchard. Climate in a bottle: Towards a generative foundation model for the kilometer-scale global atmosphere. arXiv [physics.ao-ph], May 2025. URL https://arxiv.org/abs/2505.06474. 9, 13

[36]

SamudrACE: Fast and accurate coupled climate modeling with 3D ocean and atmosphere emulators

James P C Duncan, Elynn Wu, Surya Dheeshjith, Adam Subel, Troy Arcomano, Spencer K Clark, Brian Henn, Anna Kwa, Jeremy McGibbon, W Andre Perkins, William Gregory, Carlos Fernandez-Granda, Julius Busecke, Oliver Watt-Meyer, William J Hurlin, Alistair Adcroft, Laure Zanna, and Christopher Bretherton. SamudrACE: Fast and accurate coupled climate modeling wit...

doi: 10.48550/arXiv.2509.12490
[37]

ACE2: Accurately learning subseasonal to decadal atmospheric variability and forced responses

Oliver Watt-Meyer, Brian Henn, Jeremy McGibbon, Spencer K Clark, Anna Kwa, W Andre Perkins, Elynn Wu, Lucas Harris, and Christopher S Bretherton. ACE2: Accurately learning subseasonal to decadal atmospheric variability and forced responses. arXiv [physics.ao-ph], November 2024. 9

[38]

Ace: A fast, skillful learned global atmospheric model for climate prediction

Oliver Watt-Meyer, Gideon Dresdner, Jeremy McGibbon, Spencer K. Clark, Brian Henn, James Duncan, Noah D. Brenowitz, Karthik Kashinath, Michael S. Pritchard, Boris Bonev, Matthew E. Peters, and Christopher S. Bretherton. Ace: A fast, skillful learned global atmospheric model for climate prediction. URL https://arxiv.org/abs/2310.02074. 12

[40]

K. M. Gorski, E. Hivon, A. J. Banday, B. D. Wandelt, F. K. Hansen, M. Reinecke, and M. Bartelmann. Healpix: A framework for high-resolution discretization and fast analysis of data distributed on the sphere. The Astrophysical Journal, 622(2):759–771, April 2005. ISSN 1538-4357. doi: 10.1086/427976. URL http://dx.doi.org/10.1086/427976. 13

[41]

Advancing parsimonious deep learning weather prediction using the healpix mesh, 2024

Matthias Karlbauer, Nathaniel Cresswell-Clay, Dale R. Durran, Raul A. Moreno, Thorsten Kurth, Boris Bonev, Noah Brenowitz, and Martin V. Butz. Advancing parsimonious deep learning weather prediction using the healpix mesh, 2024. URL https://arxiv.org/abs/2311.06253. 13

[42]

Diffusers: State-of-the-art diffusion models

Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, Dhruv Nair, Sayak Paul, William Berman, Yiyi Xu, Steven Liu, and Thomas Wolf. Diffusers: State-of-the-art diffusion models. https://github.com/huggingface/diffusers, 2022. 16

[43]

Scalable Diffusion Models with Transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers, 2023. URL https://arxiv.org/abs/2212.09748. 16

[45]

URL https://arxiv.org/abs/1910.07467. 17

[46]

Decoupled weight decay regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7. 17

[47]

Deep Networks with Stochastic Depth

Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. arXiv preprint arXiv:1603.09382, 2016. doi: 10.48550/arXiv.1603.09382. 17

[48]

Weatherbench 2: A benchmark for the next generation of data-driven global weather models, 2024

Stephan Rasp, Stephan Hoyer, Alexander Merose, Ian Langmore, Peter Battaglia, Tyler Russell, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya Agrawal, Matthew Chantry, Zied Ben Bouallegue, Peter Dueben, Carla Bromberg, Jared Sisk, Luke Barrington, Aaron Bell, and Fei Sha. Weatherbench 2: A benchmark for the next generation of data-driven global we...

[49]

The Global Ensemble Forecast System (version 13) Replay dataset

NOAA. The Global Ensemble Forecast System (version 13) Replay dataset. NOAA Open Data Dissemination Program. Available at: https://psl.noaa.gov/data/ufs_replay/, 2024. URL https://psl.noaa.gov/data/ufs_replay/. Subset used: January 2000 – December 2023. Accessed: December 20 2025. 18

[50]

Methods for assessing the impact of current and future components of the global observing system, 04/2024 2024

Sean Healy, Niels Bormann, Alan Geer, Elias Holm, Bruce Ingleby, Katie Lean, Katrin Lonitz, and Cristina Lupu. Methods for assessing the impact of current and future components of the global observing system, 04/2024 2024. URL . 18

[51]

Ascat wind data processing manual

KNMI and OSI SAF and EUMETSAT. Ascat wind data processing manual. Technical report, KNMI, 2009. URL https://scatterometer.knmi.nl/old_manuals/ss3_pm_ascat_1.0.pdf. Accessed: 2025-12-01. 19

[52]

Active techniques in wind observations: Scatterometer

ECMWF. Active techniques in wind observations: Scatterometer. URL https://www.ecmwf.int/sites/default/files/elibrary/2015/8918-active-techniques-wind-observations-scatterometer.pdf. Accessed: 2025-12-01. 19

[54]

Atmospheric motion vectors: Past, present and future

Mary Forsythe. Atmospheric motion vectors: Past, present and future. Technical report, ECMWF / Met Office Seminar on Recent Developments in Use of Satellite Observations in NWP, 2008. URL https://www.ecmwf.int/sites/default/files/elibrary/2008/74512-atmospheric-motion-vectors-past-present-and-future_0.pdf. ECMWF Seminar on Satellite Observations i...

[55]

Gps radio occultation lecture notes, 2015

ECMWF. Gps radio occultation lecture notes, 2015. URL https://www.ecmwf.int/sites/default/files/gpsro_lecture_2015_nwpsaf.pdf. ECMWF / NWPSAF training material. 19

[56]

Earth2studio: Open-source deep-learning framework for ai weather/climate workflows

Nick Geneva and the NVIDIA Earth2Studio Team. Earth2studio: Open-source deep-learning framework for ai weather/climate workflows. URL https://github.com/NVIDIA/earth2studio/releases/tag/0.9.0. 19

[57]

Michaël Zamo and Philippe Naveau. Estimation of the continuous ranked probability score with limited information and applications to ensemble weather forecasts. Mathematical Geosciences, 50(2):209–234, February 2018. doi: 10.1007/s11004-017-9709-7. URL https://doi.org/10.1007/s11004-017-9709-7. 24

[58]

Strictly proper scoring rules, prediction, and estimation

Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc., 102(477):359–378, March 2007. ISSN 0162-1459. doi: 10.1198/016214506000001437. 24

[59]

Wmo integrated processing and prediction system activities – part ii: Specifications of wmo integrated processing and prediction system activities

World Meteorological Organization. Wmo integrated processing and prediction system activities – part ii: Specifications of wmo integrated processing and prediction system activities. Wmo-no. 485, World Meteorological Organization, 2023. URL https://library.wmo.int/idurl/4/35703. Part II: Specifications of WMO Integrated Processing and Prediction System Act...

[60]

IFS Documentation CY48R1 – Part V: Ensemble Prediction System

ECMWF. IFS Documentation CY48R1 – Part V: Ensemble Prediction System. Number 5 in IFS Documentation. European Centre for Medium-Range Weather Forecasts, 2023. doi: 10.21957/e529074162. 26

[61]

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows, 2021. URL https://arxiv.org/abs/2103.14030. 29

Red dotted lines mark reference thresholds (ACC = 0.6; SSR = 1).

[Figure: FCN3 RMSE versus lead time (0 to 240 hours); panels (a) Z500 [m² s⁻²], (b) T850 [K], ...]