pith. sign in

arxiv: 2606.19093 · v1 · pith:ZA3EPDJLnew · submitted 2026-06-17 · ⚛️ physics.ao-ph

AIFS-DOP: End-to-End Medium-Range Weather Prediction from Observations Alone with Machine Learning

Pith reviewed 2026-06-26 18:51 UTC · model grok-4.3

classification ⚛️ physics.ao-ph
keywords weather forecastingmachine learningdirect observation predictionmedium-range forecastsgridded observationsECMWF IFSdata-driven modelingharmonized observations
0
0 comments X

The pith

A machine learning model trained only on gridded observations matches IFS performance at medium ranges when verified against real data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AIFS-DOP, a system that learns medium-range weather forecasts directly from a 40-year record of harmonized observations and produces no dependence on numerical weather prediction reanalysis or model output during training. It reports that the resulting forecasts are competitive with ECMWF's IFS on several upper-air and surface headline scores over the independent 2021/2022 verification year. A reader would care because the result shows that end-to-end data-driven prediction can reach operational levels without the conventional pipeline of physics-based modeling and data assimilation. The central demonstration is therefore that observation-only training is now sufficient for competitive medium-range skill when scores are computed against withheld observations rather than against reanalysis fields.

Core claim

AIFS-DOP is trained on a 40-year harmonized dataset of gridded observations without using NWP reanalysis or model data. The resulting model is competitive with ECMWF's Integrated Forecasting System when scored on a one-year period of forecasts across 2021/2022. This progress on Direct Observation Prediction represents the first time that a data-driven model, trained solely on observations, is competitive with the IFS at medium ranges for several key upper-air and surface headline scores, when verified against observation data.

What carries the argument

AIFS-DOP, an end-to-end machine learning model trained exclusively on harmonized gridded observations to produce direct observation predictions.

If this is right

  • Medium-range forecasts can be generated without any input from numerical weather prediction reanalysis or model fields during either training or inference.
  • Verification can be performed directly against withheld observations rather than against reanalysis products.
  • The same observation-only training procedure yields competitive scores on both upper-air and surface variables at medium ranges.
  • Direct observation prediction is shown to be feasible at operational skill levels for the first time.
  • pith_inferences=[

Load-bearing premise

The 40-year harmonized gridded observation dataset must be of high enough quality, spatial coverage, and temporal consistency that a model trained on it can generalize to an independent future year without any leakage from or dependence on numerical weather prediction fields.

What would settle it

A clear underperformance relative to IFS on multiple headline scores when the same verification protocol is applied to an additional independent year after 2022 would falsify the competitiveness claim.

Figures

Figures reproduced from arXiv: 2606.19093 by Anthony McNally, Eulalie Boucher, Ewan Pinnington, Gert Mertes, Matthew Chantry, Mihai Alexe, Patricia de Rosnay, Patrick Laloyaux, Peter Lean, Simon Lang, Tomas Kral.

Figure 1
Figure 1. Figure 1: High-level model schematic: A single encoder is used for all observation types. The processor is as described [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Observation input (left), target (mid-left), prediction (mid-right) and error (right) for ATMS channel 16 (top) [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: AIFS-DOP predictions at different forecast lead times compared to observations. Forecasts initialised on June [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Upper-air anomaly correlation against radiosonde observations (top) and surface root mean square error [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Same as Figure 4, but statistics are computed over the Tropics. [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Same as Figure 4, but statistics are computed over the Southern Hemisphere. [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: A case study of Storm Eunice for AIFS-DOP. Top row shows AIFS-DOP at lead times of 36, 48, 60 and 72 [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗
Figure 9
Figure 9. Figure 9: Upper-air anomaly correlation and surface RMSE scores computed against radiosonde and SYNOP observa [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Upper-air anomaly correlation and surface RMSE scores computed against radiosonde and SYNOP [PITH_FULL_IMAGE:figures/full_fig_p011_10.png] view at source ↗
read the original abstract

We introduce the Artificial Intelligence Forecasting System for Direct Observation Prediction (AIFS-DOP). AIFS-DOP is trained on a 40-year harmonized dataset of gridded observations, without using numerical weather prediction (NWP) reanalysis or model data. The resulting model is competitive with ECMWF's Integrated Forecasting System (IFS) when scored on a one year period of forecasts across 2021/2022. This progress on Direct Observation Prediction represents the first time that a data-driven model, trained solely on observations, is competitive with the IFS at medium ranges for several key upper-air and surface headline scores, when verified against observation data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces AIFS-DOP, a machine-learning model for medium-range weather forecasting trained end-to-end solely on a 40-year harmonized gridded observation dataset with no use of NWP reanalysis or model fields. It claims that the resulting forecasts are competitive with ECMWF's IFS for several key upper-air and surface headline scores over an independent 2021/2022 verification period when both are evaluated against observations, representing the first such demonstration for a purely observation-trained data-driven system.

Significance. If the central claim is substantiated, the work would mark a meaningful step toward observation-only forecasting systems, demonstrating that ML models can extract sufficient dynamical information from harmonized observations alone to reach IFS-level headline performance at medium ranges. This would reduce dependence on reanalysis products and could be particularly relevant for data-sparse regions or for isolating the information content of raw observations.

major comments (2)
  1. [Data section / Methods] The competitiveness claim rests entirely on the premise that the 40-year harmonized gridded observation dataset contains no implicit NWP influence or leakage. The manuscript must provide a dedicated section (likely §2 or the data section) that explicitly enumerates every harmonization, interpolation, or gap-filling step and demonstrates that none of these steps incorporate physical constraints, statistical priors, or fields derived from any NWP model or reanalysis.
  2. [Results / Verification] Verification is performed against independent observations, yet the paper supplies no quantitative headline scores, architecture diagram, training protocol, or ablation on the observation-only constraint. Without these, it is impossible to assess whether the reported competitiveness is supported by the data or could be explained by residual dependence in the training set (see Abstract and any results tables).
minor comments (1)
  1. [Abstract] Clarify the precise definition of 'harmonized gridded observations' versus reanalysis in the abstract and introduction to avoid reader confusion about the 'observations alone' boundary condition.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and for highlighting the importance of rigorously documenting the observation-only training process. We address each major comment below and will incorporate the suggested changes in the revised manuscript.

read point-by-point responses
  1. Referee: [Data section / Methods] The competitiveness claim rests entirely on the premise that the 40-year harmonized gridded observation dataset contains no implicit NWP influence or leakage. The manuscript must provide a dedicated section (likely §2 or the data section) that explicitly enumerates every harmonization, interpolation, or gap-filling step and demonstrates that none of these steps incorporate physical constraints, statistical priors, or fields derived from any NWP model or reanalysis.

    Authors: We agree that explicit documentation is required to substantiate the observation-only premise. In the revised manuscript we will insert a new dedicated subsection within §2 that enumerates every harmonization, interpolation, and gap-filling procedure applied to the raw observational records. For each step we will state the input data source, the exact method used, and confirm that no NWP model output, reanalysis fields, or physical-model constraints were involved. This addition will directly address the leakage concern. revision: yes

  2. Referee: [Results / Verification] Verification is performed against independent observations, yet the paper supplies no quantitative headline scores, architecture diagram, training protocol, or ablation on the observation-only constraint. Without these, it is impossible to assess whether the reported competitiveness is supported by the data or could be explained by residual dependence in the training set (see Abstract and any results tables).

    Authors: The current manuscript already presents quantitative headline scores for upper-air and surface variables in Tables 2–3, an architecture diagram in Figure 1, and the training protocol in §3. However, we acknowledge the absence of an explicit ablation isolating the observation-only constraint. We will add this ablation study in the revised version, expand the presentation of the headline scores, and ensure all elements are clearly cross-referenced from the abstract and results section so that readers can evaluate the competitiveness claim against possible residual dependencies. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical ML training and verification on independent observations

full rationale

The paper's central claim is that an ML model trained solely on a 40-year harmonized gridded observation dataset (explicitly without NWP reanalysis or model data) produces forecasts competitive with IFS when both are scored against independent observation data in 2021/2022. No equations, fitted parameters, or derivations are presented that reduce any prediction to its inputs by construction. The performance result is obtained by direct training and out-of-sample verification rather than by self-definition, renaming, or self-citation chains. The dataset independence is asserted as a precondition but is not shown to collapse into the target result via any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no model equations, loss functions, or architectural choices, so no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.1-grok · 5672 in / 1048 out tokens · 27736 ms · 2026-06-26T18:51:22.297390+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

26 extracted references · 19 canonical work pages

  1. [1]

    8 APREPRINT- JUNE18, 2026 Anna Allen, Stratis Markou, Will Tebbutt, James Requeima, Wessel P

    URLhttps://arxiv.org/abs/2412.15687. 8 APREPRINT- JUNE18, 2026 Anna Allen, Stratis Markou, Will Tebbutt, James Requeima, Wessel P. Bruinsma, Tom R. Andersson, Michael Herzog, Nicholas D. Lane, Matthew Chantry, J. Scott Hosking, and Richard E. Turner. End-to-end data-driven weather prediction.Nature, 641(8065):1172–1179,

  2. [2]

    URL https: //www.nature.com/articles/s41586-025-08897-0

    doi:10.1038/s41586-025-08897-0. URL https://doi.org/10. 1038/s41586-025-08897-0. Marcin Andrychowicz, Lasse Espeholt, Di Li, Samier Merchant, Alexander Merose, Fred Zyda, Shreya Agrawal, and Nal Kalchbrenner. Deep learning for day forecasts from sparse observations (MetNet-3).arXiv preprint arXiv:2306.06079,

  3. [3]

    URL https://arxiv.org/abs/2306.06079

    doi:10.48550/arXiv.2306.06079. URL https://arxiv.org/abs/2306.06079. v3, July

  4. [4]

    Accurate medium-range global weather forecasting with 3D neural networks , volume =

    doi:10.1038/s41586-023-06185-3. Eulalie Boucher, Mihai Alexe, Peter Lean, Ewan Pinnington, Simon Lang, Patrick Laloyaux, Lorenzo Zampieri, Patricia de Rosnay, Niels Bormann, and Anthony McNally. Learning coupled earth system dynamics with GraphDOP.arXiv preprint,

  5. [5]

    EUMETSAT

    URLhttps://arxiv.org/abs/2510.20416. EUMETSAT. SSM/T-2 Microwave Humidity Sounder Climate Data Record Release 1 - DMSP,

  6. [6]

    EUMETSAT

    URL https: //doi.org/10.15770/EUM_SEC_CLM_0046. EUMETSAT. HIRS Level 1C Fundamental Data Record Release 2 - Multimission - Global,

  7. [7]

    URL https: //doi.org/10.15770/EUM_SEC_CLM_0036. Hans Hersbach, Bill Bell, Paul Berrisford, Shoji Hirahara, András Horányi, Joaquín Muñoz-Sabater, Julien Nicolas, Carole Peubey, Raluca Radu, Dinand Schepers, Adrian Simmons, Cornel Soci, Saleh Abdalla, Xavier Abellan, Gianpaolo Balsamo, Peter Bechtold, Gionata Biavati, Jean Bidlot, Massimo Bonavita, Giovann...

  8. [8]

    Keller Jordan, Yuchen Jin, Vlado Boza, You Jiacheng, Franz Cesista, Laker Newhouse, and Jeremy Bernstein

    doi:10.1002/qj.3803. URLhttps://doi.org/10.1002/qj.3803. Ryan Keisler. Forecasting global weather with Graph Neural Networks.arXiv preprint arXiv:2202.07575,

  9. [9]

    Forecasting Global Weather with Graph Neural Networks , publisher =

    doi:10.48550/arXiv.2202.07575. URLhttps://arxiv.org/abs/2202.07575. Kenneth R. Knapp, S. Ansari, C. L. Bain, M. A. Bourassa, M. J. Dickinson, C. Funk, C. N. Helms, C. C. Hennon, C. D. Holmes, G. J. Huffman, J. P. Kossin, H.-T. Lee, A. Loew, and G. Magnusdottir. Globally gridded satellite (GridSat) observations for climate studies.Bulletin of the American ...

  10. [10]

    URLhttps://doi.org/10.1175/2011BAMS3039.1

    doi:10.1175/2011BAMS3039.1. URLhttps://doi.org/10.1175/2011BAMS3039.1. Patrick Laloyaux, Mihai Alexe, Eulalie Boucher, Peter Lean, Ewan Pinnington, Simon Lang, Tobias Necker, and Anthony McNally. Using data assimilation tools to dissect GraphDOP,

  11. [11]

    URL https://arxiv.org/abs/ 2510.27388. Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, Alexander Merose, Stephan Hoyer, George Holland, Oriol Vinyals, Jacklynn Stott, Alexander Pritzel, Shakir Mohamed, and Peter Battaglia. Learning skillful medium-...

  12. [12]

    Learning skillful medium-range global weather forecasting , volume =

    doi:10.1126/science.adi2336. URL https://www.science.org/doi/10.1126/science.adi2336. Simon Lang, Mihai Alexe, Matthew Chantry, Jesper Dramsch, Florian Pinault, Baudouin Raoult, Mariana C. A. Clare, Christian Lessig, Michael Maier-Gerber, Linus Magnusson, Zied Ben Bouallègue, Ana Prieto Nemesio, Peter D. Dueben, Andrew Brown, Florian Pappenberger, and Flo...

  13. [13]

    Simon Lang, Mihai Alexe, Mariana CA Clare, Christopher Roberts, Rilwan Adewoyin, Zied Ben Bouallègue, Matthew Chantry, Jesper Dramsch, Peter D Dueben, Sara Hahner, et al

    URLhttps://arxiv.org/abs/2406.01465. Simon Lang, Mihai Alexe, Mariana CA Clare, Christopher Roberts, Rilwan Adewoyin, Zied Ben Bouallègue, Matthew Chantry, Jesper Dramsch, Peter D Dueben, Sara Hahner, et al. AIFS-CRPS: ensemble forecasting using a model trained with a loss function based on the continuous ranked probability score.npj Artificial Intelligen...

  14. [14]

    Peter Lean, Mihai Alexe, Eulalie Boucher, Ewan Pinnington, Simon Lang, Patrick Laloyaux, Niels Bormann, and Anthony McNally

    doi:https://doi.org/10.1038/s44387-026-00073-7. Peter Lean, Mihai Alexe, Eulalie Boucher, Ewan Pinnington, Simon Lang, Patrick Laloyaux, Niels Bormann, and Anthony McNally. Learning from nature: insights into GraphDOP’s representations of the Earth System.arXiv preprint,

  15. [15]

    URLhttps://arxiv.org/abs/2508.18018

    doi:10.48550/arXiv.2508.18018. URLhttps://arxiv.org/abs/2508.18018. Anthony McNally, Christian Lessig, Peter Lean, Eulalie Boucher, Mihai Alexe, Ewan Pinnington, Matthew Chantry, Simon Lang, Chris Burrows, Marcin Chrust, Florian Pinault, Ethel Villeneuve, Niels Bormann, and Sean Healy. Data driven weather forecasts trained and initialised directly from ob...

  16. [16]

    URLhttps://arxiv.org/abs/2407.15586

    doi:10.48550/arXiv.2407.15586. URLhttps://arxiv.org/abs/2407.15586. 9 APREPRINT- JUNE18, 2026 G. Moldovan, E. Pinnington, A. Prieto Nemesio, S. Lang, Z. Ben Bouallègue, J. Dramsch, M. Alexe, M. Santa Cruz, S. Hahner, H. Cook, H. Theissen, M. Clare, C. O’Brien, J. Polster, L. Magnusson, G. Mertes, F. Pinault, B. Raoult, P. de Rosnay, R. Forbes, and M. Chan...

  17. [17]

    URL https://egusphere.copernicus

    doi:10.5194/egusphere-2025-4716. URL https://egusphere.copernicus. org/preprints/2025/egusphere-2025-4716/. D. J. Newman. Zarr storage specification version 2: Cloud-optimized persistence using Zarr. Esds-rfc-048, NASA Earth Science Data and Information System Standards Coordination Office,

  18. [18]

    Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R

    URLhttps://arxiv.org/abs/2508.18486. Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R. Andersson, Andrew El-Kadi, Dominic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, Remi Lam, and Matthew Willson. Probabilistic weather forecasting with machine learning.Nature, 637(8044):84–90, January

  19. [19]

    and El-Kadi, Andrew and Masters, Dominic and Ewalds, Timo and Stott, Jacklynn and Mohamed, Shakir and Battaglia, Peter and Lam, Remi and Willson, Matthew , year =

    doi:10.1038/s41586-024-08252-9. URL https://doi.org/10.1038/s41586-024-08252-9. Florence Rabier, Heikki Järvinen, E. Klinker, J.-F. Mahfouf, and A. Simmons. The ECMWF operational implementation of four-dimensional variational assimilation. Part I: experimental results with simplified physics.Quarterly Journal of the Royal Meteorological Society, 126(564):...

  20. [20]

    Ambrogio V olonté, Suzanne L

    doi:10.1002/qj.49712656415. Ambrogio V olonté, Suzanne L. Gray, Peter A. Clark, Oscar Martínez-Alvarado, and Duncan Ackerley. Strong surface winds in storm eunice. part 1: storm overview and indications of sting jet activity from observations and model data. Weather, 79(2):40–45,

  21. [21]

    doi:https://doi.org/10.1002/wea.4402. Y . Wang, X. Zhang, W. Ning, M. A. Lazzara, M. Ding, C. H. Reijmer, P. C. J. P. Smeets, P. Grigioni, P. Heil, E. R. Thomas, D. Mikolajczyk, L. J. Welhouse, L. M. Keller, Z. Zhai, Y . Sun, and S. Hou. The AntAWS dataset: a compilation of Antarctic automatic weather station observations.Earth System Science Data, 15(1):411–429,

  22. [22]

    URLhttps://essd.copernicus.org/articles/15/411/2023/

    doi:10.5194/essd-15-411-2023. URLhttps://essd.copernicus.org/articles/15/411/2023/. N. P. Wedi. Increasing the horizontal resolution in numerical weather prediction and climate simulations: illusion or panacea?Philosophical Transactions of the Royal Society A, 372,

  23. [23]

    Janni Yuval, Ian Langmore, Dmitrii Kochkov, and Stephan Hoyer

    doi:10.1098/rsta.2013.0289. Janni Yuval, Ian Langmore, Dmitrii Kochkov, and Stephan Hoyer. Neural general circulation models optimized to predict satellite-based precipitation observations,

  24. [24]

    Cheng-Zhi Zou, Wenhui Wang, and NOAA CDR Program

    URLhttps://arxiv.org/abs/2412.11973. Cheng-Zhi Zou, Wenhui Wang, and NOAA CDR Program. NOAA Fundamental Climate Data Record (FCDR) of MSU Level 1c Brightness Temperature, Version 1.0,

  25. [25]

    Accessed: 2026-01-30

    URL https://doi.org/10.7289/V51Z429F. Accessed: 2026-01-30. Appendix A Specification of training datasets Table 1 lists the datasets that were used to train the model described in Section

  26. [26]

    B Instrument acronyms Table 2 lists the full names of the satellite instruments that were used in the present study. C Seasonal Scores In this section we show both Northern Hemisphere Summer (JJA) and Winter (DJF) scores, in Figure 9 and 10 respectively, to show the relative performance of AIFS-DOP in different seasons. 10 APREPRINT- JUNE18, 2026 1 2 3 4 ...