arxiv: 2604.18727 · v1 · submitted 2026-04-20 · ⚛️ physics.ao-ph · cs.AI· nlin.CD

Recognition: unknown

Skillful Global Ocean Emulation and the Role of Correlation-Aware Loss

Niraj Agarwal , Timothy A. Smith , Sergey Frolov , Laura C. Slivinski

Authors on Pith no claims yet

Pith reviewed 2026-05-10 02:40 UTC · model grok-4.3

classification ⚛️ physics.ao-ph cs.AInlin.CD

keywords global ocean emulationmachine learning forecastingcorrelation-aware lossMahalanobis distancemedium-range ocean predictionstatistical regularizationocean dynamics

0 comments

The pith

Machine learning can emulate global ocean states with skillful accuracy for up to 15 days when a loss function accounts for correlations in variable tendencies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that an ocean emulator can produce accurate forecasts for medium-range lead times of 10 to 15 days using daily time steps and a single initial condition. A key improvement comes from replacing standard mean squared error with a loss based on Mahalanobis distance that incorporates the observed correlations among changes in ocean variables. This approach not only boosts forecast accuracy but also helps preserve the slow, large-scale correlated motions characteristic of ocean dynamics. If valid, it points to a practical way to generate fast, reliable ocean state predictions that can support applications such as data assimilation without running full physics models at high cost.

Core claim

The emulator achieves skillful forecasts for 10-15 day lead times. The Mahalanobis distance loss improves forecast skill over mean squared error by explicitly accounting for correlations between tendencies of target variables, and spatial correlation analysis shows it acts as a statistical-dynamical regularizer for the slow, correlated dynamics of the global oceans, providing a better background forecast for downstream tasks.

What carries the argument

The Mahalanobis distance loss, defined using the inverse of the covariance matrix of ocean variable tendencies, which penalizes deviations in a manner consistent with observed inter-variable correlations.

If this is right

The emulator produces skillful medium-range forecasts without requiring autoregressive training or multiple initial conditions.
Replacing mean squared error with the correlation-aware loss increases forecast accuracy by respecting inter-variable dependencies.
The proposed loss regularizes the model to better capture the slow, large-scale ocean dynamics through statistical means.
These forecasts can serve as improved initial backgrounds for data assimilation applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar correlation-aware losses could be applied to emulators of other fluid dynamical systems to enforce physical consistency.
This technique may reduce reliance on computationally expensive physics-based ocean models for generating ensemble forecasts.
Investigating how the covariance structure changes under different climate regimes would test the robustness of the approach.

Load-bearing premise

The covariance structure estimated from the training data remains stable and representative during out-of-sample forecast periods, and atmospheric conditions can be prescribed without needing ocean feedback to capture the relevant dynamics.

What would settle it

If forecasts generated with the Mahalanobis loss show equivalent or worse skill metrics and spatial correlation structures compared to those using mean squared error, or if accuracy drops below skillful levels before 10 days, the central claims would be refuted.

Figures

Figures reproduced from arXiv: 2604.18727 by Laura C. Slivinski, Niraj Agarwal, Sergey Frolov, Timothy A. Smith.

**Figure 1.** Figure 1: Global inter-variable correlation matrix (Σ) derived from the temporal tendencies of all prognostic variables. For 3D fields (SO, TEMP, UO, VO), indices progress from surface to deep levels (left-to-right and bottom-to-top), while SSH is represented as a single entry in the bottom left corner. Diagonal elements are all one. The colorbar indicates Pearson correlation coefficients ranging between [−1, 1]. –5… view at source ↗

**Figure 2.** Figure 2: Global RMSE as a function of lead time for SSH (top panel) and 3D prognostic variables: potential temperature, salinity, and zonal/meridional velocities (columns 1–4). The 3D variables are shown at representative depths of approximately 0.5 m (row 2), 94 m (row 3), and 418 m (row 4) corresponding to the surface, the mixed layer, and the deeper water mass. The RMSEs are aggregated over forecasts using 292 i… view at source ↗

**Figure 3.** Figure 3: Comparison of 10-day lead time RMSE maps for SST (a,c,e) and SSS (b,d,f) from the two emulators: (a-b) RMSE maps from the MSE-based emulator; (c-d) RMSE maps from the MLoss-based emulator, and (e-f) relative percentage difference in RMSE (M-Loss minus MSE) with MSE as the baseline. Blue/red percentage differences suggests gain/degradation in RMSE skill using M-Loss. To understand the degradation in RMSE in… view at source ↗

**Figure 4.** Figure 4: Inter-variable spatial correlation analysis for SSH, temperature, salinity, and zonal and meridional current velocities: (a) the Pearson correlation coefficients for the UFS-Replay reference truth, (b) percentage relative error in the correlation coefficients for forecasts from the MSE-based emulator compared to the UFS-Replay reference, (c) same as (b) but for the MLossbased emulator. M-Loss objective fu… view at source ↗

read the original abstract

Machine learning emulators have shown extraordinary skill in forecasting atmospheric states, and their application to global ocean dynamics offers similar promise. Here, we adapt the GraphCast architecture into a dedicated ocean-only emulator, driven by prescribed atmospheric conditions, for medium-range predictions. The emulator is trained on NOAA's UFS-Replay dataset. Using a 24 hour time step, single initial condition, and without using autoregressive training, we produce an emulator that provides skillful forecasts for 10-15 day lead times. We further demonstrate the use of Mahalanobis distance as loss that improves the forecast skill compared to the Mean Squared Error loss by explicitly accounting for the correlations between tendencies of the target variables. Using spatial correlation analysis of the forecasted fields, we also show that the proposed correlation-aware loss acts as a statistical-dynamical regularizer for the slow, correlated dynamics of the global oceans, offering a better background forecast for downstream tasks like data assimilation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

They adapt GraphCast to an ocean-only emulator and show that a Mahalanobis loss beats MSE on 10-15 day forecasts while acting as a regularizer, but the out-of-sample stability of the covariance remains an open question.

read the letter

The main point is that this work takes the GraphCast architecture, drops it into an ocean-only setup driven by prescribed atmosphere, and trains it on UFS-Replay data to produce usable 10-15 day forecasts at 24-hour steps without autoregression. They then swap in a Mahalanobis loss that folds in the correlation structure of the tendencies and report better skill than plain MSE, plus some spatial correlation diagnostics that suggest the loss is damping fast, uncorrelated noise while letting the slower, correlated ocean dynamics through. That combination and the regularizer reading are the concrete new pieces relative to prior GraphCast applications and earlier Mahalanobis uses in other domains. The paper does a clean job of laying out the baseline comparison and the spatial analysis that supports the regularizer claim. The numbers look plausible for this class of emulator, and the ocean-only framing is a reasonable first cut for medium-range work. The soft spot is the covariance matrix itself. It is estimated once from the training data and then held fixed; nothing in the abstract or the stress-test note shows whether that matrix stays representative once the emulator is run forward, especially when atmospheric forcing changes or when internal variability at longer leads starts to matter. The one-way atmosphere prescription also leaves open whether missing ocean feedback affects the skill at the claimed lead times. Those are standard concerns for this kind of loss, not fatal, but they need explicit checks in the full results. This paper is aimed at the ML-for-ocean-modeling crowd and at people who care about loss functions that respect variable correlations. Anyone building emulators for data assimilation or medium-range forecasting will find the method and the diagnostics worth looking at. It is concrete enough and the claims are falsifiable enough that it deserves a serious referee rather than a desk reject; the reviewers can press on the covariance stability and the coupled-dynamics question without the paper falling apart on its own terms. I would send it out.

Referee Report

3 major / 2 minor

Summary. The manuscript adapts the GraphCast architecture to create an ocean-only emulator trained on NOAA's UFS-Replay dataset. Using a 24-hour timestep, single initial condition, and prescribed atmospheric forcing without autoregressive training, it claims skillful forecasts out to 10-15 day lead times. The central innovation is replacing MSE loss with Mahalanobis distance loss that incorporates the covariance structure of tendencies; this is reported to improve skill and to function as a statistical-dynamical regularizer for the slow, correlated dynamics of the global ocean, as shown via spatial correlation analysis of the forecasts.

Significance. If the reported skill improvements and regularizer effect hold under rigorous out-of-sample testing, the work would be significant for medium-range ocean emulation, offering a computationally efficient alternative to traditional models and a loss function that explicitly respects inter-variable correlations. The approach could improve background states for downstream data-assimilation tasks. The adaptation of GraphCast to ocean variables and the explicit demonstration of the correlation-aware loss as a regularizer are concrete strengths.

major comments (3)

[§4] §4 (Loss function and covariance estimation): The Mahalanobis loss uses a covariance matrix estimated from the UFS-Replay training data. The manuscript does not demonstrate that this fixed covariance remains representative of the joint statistics of tendencies at 10-15 day leads, where non-stationarity arising from different atmospheric forcing regimes or internal variability could render the loss mis-specified. Because the claimed improvement over MSE rests on this loss being correctly calibrated out-of-sample, this is a load-bearing concern.
[§3] §3 (Experimental setup): The ocean-only configuration prescribes atmospheric conditions without ocean feedback. The paper should quantify whether this one-way forcing suffices to capture the coupled dynamics relevant to the slow, correlated ocean processes at the claimed lead times; otherwise the reported skill may not generalize to fully coupled systems.
[§5] §5 (Validation and baselines): Skill is asserted for 10-15 day forecasts, yet the manuscript provides limited information on the number of independent test initial conditions, error bars or ensemble spread on the metrics, and comparisons against persistence or other standard baselines beyond MSE. These details are required to substantiate the central claim of skillful emulation.

minor comments (2)

[§4] Clarify in the methods whether the covariance matrix is held fixed after pre-computation or recomputed during training, and report its condition number or regularization strategy.
[§5] Add statistical significance markers or confidence intervals to the spatial correlation plots in the results section.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed review. We address each major comment below with point-by-point responses and indicate where revisions will be made to strengthen the manuscript.

read point-by-point responses

Referee: [§4] §4 (Loss function and covariance estimation): The Mahalanobis loss uses a covariance matrix estimated from the UFS-Replay training data. The manuscript does not demonstrate that this fixed covariance remains representative of the joint statistics of tendencies at 10-15 day leads, where non-stationarity arising from different atmospheric forcing regimes or internal variability could render the loss mis-specified. Because the claimed improvement over MSE rests on this loss being correctly calibrated out-of-sample, this is a load-bearing concern.

Authors: We agree that out-of-sample stability of the covariance is important to substantiate the Mahalanobis loss. The covariance was estimated over the multi-year training period that includes diverse regimes. In the revised manuscript, we will add an analysis in the supplement comparing covariances computed from training versus test periods and across seasonal subsets, using metrics such as the Frobenius norm of the difference. This will quantify any non-stationarity and discuss its impact on the loss calibration. revision: yes
Referee: [§3] §3 (Experimental setup): The ocean-only configuration prescribes atmospheric conditions without ocean feedback. The paper should quantify whether this one-way forcing suffices to capture the coupled dynamics relevant to the slow, correlated ocean processes at the claimed lead times; otherwise the reported skill may not generalize to fully coupled systems.

Authors: The ocean-only setup with prescribed atmospheric forcing isolates the emulator's skill in representing ocean dynamics under observed atmospheric states, which is relevant for many operational contexts. We acknowledge the limitation for fully coupled generalization. In revision, we will expand the discussion to compare our error growth rates against published coupled-model results at 10-15 day leads and explicitly state the one-way forcing as a modeling choice with its associated caveats for two-way coupled applications. revision: partial
Referee: [§5] §5 (Validation and baselines): Skill is asserted for 10-15 day forecasts, yet the manuscript provides limited information on the number of independent test initial conditions, error bars or ensemble spread on the metrics, and comparisons against persistence or other standard baselines beyond MSE. These details are required to substantiate the central claim of skillful emulation.

Authors: We will revise Section 5 to include the exact number of independent test initial conditions, error bars (standard error across test cases) on all skill metrics, and additional baselines including persistence and climatology. These will be added to the text, tables, and figures to provide a more complete statistical assessment of the claimed skill. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the claimed derivation

full rationale

The paper adapts the GraphCast architecture for an ocean emulator trained on UFS-Replay data, using either MSE or Mahalanobis loss (with covariance precomputed from training tendencies) to produce 10-15 day forecasts. No load-bearing step reduces a claimed prediction or result to its own inputs by construction, self-definition, or a self-citation chain. The covariance estimation is a fixed preprocessing choice independent of the final forecast evaluation, and skill claims are presented as outcomes of standard held-out validation rather than tautological fits. The derivation from architecture and loss design to reported forecast skill remains self-contained and externally testable.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are stated. The Mahalanobis loss implicitly requires a covariance matrix whose estimation procedure is not described.

pith-pipeline@v0.9.0 · 5474 in / 1269 out tokens · 35717 ms · 2026-05-10T02:40:08.330994+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 7 canonical work pages · 1 internal anchor

[1]

https://arxiv.org/abs/2310.02074

ACE: A fast, skillful learned global atmospheric model for climate prediction , author=. arXiv preprint arXiv:2310.02074 , year=

work page arXiv
[2]

npj Climate and Atmospheric Science , volume=

ACE2: accurately learning subseasonal to decadal atmospheric variability and forced responses , author=. npj Climate and Atmospheric Science , volume=. 2025 , publisher=

2025
[3]

Science , volume=

Learning skillful medium-range global weather forecasting , author=. Science , volume=. 2023 , publisher=

2023
[4]

Nature , volume=

Accurate medium-range global weather forecasting with 3D neural networks , author=. Nature , volume=. 2023 , publisher=

2023
[5]

FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators

Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators , author=. arXiv preprint arXiv:2202.11214 , year=

work page internal anchor Pith review arXiv
[6]

arXiv preprint arXiv:2406.01465 (2024 )

AIFS--ECMWF's data-driven forecasting system , author=. arXiv preprint arXiv:2406.01465 , year=

work page arXiv
[7]

arXiv (2022)

Forecasting global weather with graph neural networks , author=. arXiv preprint arXiv:2202.07575 , year=

work page arXiv
[8]

Journal of Geophysical Research: Machine Learning and Computation , volume=

GLONET: Mercator's end-to-end neural Global Ocean forecasting system , author=. Journal of Geophysical Research: Machine Learning and Computation , volume=. 2025 , publisher=

2025
[9]

arXiv preprint arXiv:2402.02995 , year=

Xihe: A data-driven model for global ocean eddy-resolving forecasting , author=. arXiv preprint arXiv:2402.02995 , year=

work page arXiv
[10]

Nature Communications , volume=

Forecasting the eddying ocean with a deep neural network , author=. Nature Communications , volume=. 2025 , publisher=

2025
[11]

Journal of Geophysical Research: Machine Learning and Computation , volume=

Simultaneous emulation and downscaling with physically consistent deep learning-based regional ocean emulators , author=. Journal of Geophysical Research: Machine Learning and Computation , volume=. 2025 , publisher=

2025
[12]

Scientific Reports , volume=

OceanNet: a principled neural operator-based digital twin for regional oceans , author=. Scientific Reports , volume=. 2024 , publisher=

2024
[13]

Geophysical Research Letters , volume=

Samudra: An AI global ocean emulator for climate , author=. Geophysical Research Letters , volume=. 2025 , publisher=

2025
[14]

EGU General Assembly Conference Abstracts , pages=

The Copernicus global 1/12° oceanic and sea ice reanalysis , author=. EGU General Assembly Conference Abstracts , pages=
[15]

Geophysical Research Letters , volume=

Assimilating observed surface pressure into ML weather prediction models , author=. Geophysical Research Letters , volume=. 2025 , publisher=

2025
[16]

Bulletin of the American Meteorological Society , volume=

Coupled data assimilation for integrated earth system analysis and prediction , author=. Bulletin of the American Meteorological Society , volume=. 2017 , publisher=

2017
[17]

On the generalized distance in statistics , author=. Sankhy. 2018 , publisher=

2018
[18]

Journal of Advances in Modeling Earth Systems , volume=

A comparison of data-driven approaches to build low-dimensional ocean models , author=. Journal of Advances in Modeling Earth Systems , volume=. 2021 , publisher=

2021
[19]

Communications Earth & Environment , volume=

Ensemble forecasting greatly expands the prediction horizon for ocean mesoscale variability , author=. Communications Earth & Environment , volume=. 2021 , publisher=

2021
[20]

arXiv preprint arXiv:2411.07814 , year=

Community Research Earth Digital Intelligence Twin (CREDIT) , author=. arXiv preprint arXiv:2411.07814 , year=

work page arXiv
[21]

Quarterly Journal of the Royal Meteorological Society , volume=

The twentieth century reanalysis project , author=. Quarterly Journal of the Royal Meteorological Society , volume=. 2011 , publisher=

2011
[22]

arXiv preprint arXiv:2512.24440 , year=

Towards mechanistic understanding in a data-driven weather model: internal activations reveal interpretable physical features , author=. arXiv preprint arXiv:2512.24440 , year=

work page arXiv
[23]

, year =

NOAA , title =. , year =
[24]

Quarterly journal of the royal meteorological society , volume=

The ERA5 global reanalysis , author=. Quarterly journal of the royal meteorological society , volume=. 2020 , publisher=

2020
[25]

Ocean science , volume=

The ECMWF operational ensemble reanalysis--analysis system for ocean and sea ice: a description of the system and assessment , author=. Ocean science , volume=. 2019 , publisher=

2019
[26]

Proc Natl Inst Sci India , volume=

On the generalized distance in statistics , author=. Proc Natl Inst Sci India , volume=
[27]

Tellus A: Dynamic Meteorology and Oceanography , volume=

Significance of changes in medium-range forecast scores , author=. Tellus A: Dynamic Meteorology and Oceanography , volume=. 2016 , publisher=

2016
[28]

2011 , publisher=

Statistical methods in the atmospheric sciences , author=. 2011 , publisher=

2011