
arxiv: 2601.22540 · v2 · submitted 2026-01-30 · ⚛️ physics.ao-ph

Track-Dependent Links between Tropical Cyclones and Extratropical Predictability in Physical and AI Models

Pith reviewed 2026-05-16 10:03 UTC · model grok-4.3

classification ⚛️ physics.ao-ph
keywords tropical cyclones · extratropical predictability · AI weather models · forecast errors · teleconnections · track dependence

The pith

Tropical cyclones affect extratropical forecasts in track-dependent ways that AI models capture without explicit convection physics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares forecasts from a physics-based model and an AI-hybrid model that are initialized near the genesis of tropical cyclones. Analysis of 108 out-of-sample Northern Hemisphere cases shows similar extratropical error growth patterns and comparable performance between the models. This indicates that the AI model can predict the bulk upscale effects of tropical convection even though it does not represent convective processes directly. By exploiting the AI model's computational efficiency, the study isolates the effects of individual cyclone tracks through paired forecasts that include or exclude cyclone genesis.

Core claim

Forecasts initialized near TC genesis exhibit similar extratropical error growth in both the ECMWF-IFS physics model and the Google-NGCM AI-hybrid model across 108 cases. TC impacts on Week-2 extratropical forecasts are highly time-, metric-, and track-dependent: the analysis confirms that some poleward-moving TCs degrade Week-2 US and European forecasts and suggests that westward-moving TCs can also have significant impacts.
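As a rough illustration of how "error growth" could be compared across cases, the sketch below computes latitude-weighted RMSE and an anomaly correlation coefficient per case and lead time, then averages over cases with a standard error. It is not the authors' code. The array layout `(case, lead, lat, lon)`, the cosine-latitude weighting, the uncentered form of the ACC, and all function names are assumptions made for this example; the metric names RMSE and ACC are taken from the simulated rebuttal further down the page.

```python
import numpy as np

def _lat_weights(lat_deg, shape_2d):
    # Assumption: regular lat-lon grid, so cos(latitude) approximates cell area.
    w = np.cos(np.deg2rad(np.asarray(lat_deg)))[:, None]
    return np.broadcast_to(w, shape_2d)

def weighted_rmse(forecast, truth, lat_deg):
    """Latitude-weighted RMSE over the last two axes (lat, lon)."""
    w = _lat_weights(lat_deg, forecast.shape[-2:])
    sq_err = (forecast - truth) ** 2
    return np.sqrt((sq_err * w).sum(axis=(-2, -1)) / w.sum())

def weighted_acc(forecast, truth, climatology, lat_deg):
    """Uncentered anomaly correlation over the last two axes (lat, lon)."""
    w = _lat_weights(lat_deg, forecast.shape[-2:])
    fa = forecast - climatology   # forecast anomaly
    ta = truth - climatology      # verifying anomaly
    num = (w * fa * ta).sum(axis=(-2, -1))
    den = np.sqrt((w * fa**2).sum(axis=(-2, -1)) * (w * ta**2).sum(axis=(-2, -1)))
    return num / den

def mean_and_sem(per_case):
    """Mean and standard error of the mean over the leading (case) axis."""
    per_case = np.asarray(per_case, dtype=float)
    n_cases = per_case.shape[0]
    sem = per_case.std(axis=0, ddof=1) / np.sqrt(n_cases)
    return per_case.mean(axis=0), sem
```

Applying `weighted_rmse` to each model's forecasts and passing the resulting `(case, lead)` arrays to `mean_and_sem` gives one error-growth curve with error bars per model, which is the kind of curve the similarity claim would be read from.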

What carries the argument

Paired forecasts with and without TC genesis in the AI model to isolate track-dependent impacts on extratropical predictability.
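The paired design can be sketched as a per-case difference in forecast error between the run that includes TC genesis and the run that excludes it, averaged over Week 2 and grouped by track type. This is a minimal sketch under stated assumptions, not the paper's procedure: Week 2 is taken here as lead days 8 to 14, the sign convention (positive means the with-TC forecast has the larger error) is arbitrary, and the function names and track labels are hypothetical.

```python
import numpy as np

def week2_impact(err_with_tc, err_without_tc, lead_days):
    """Per-case Week-2 mean of (error with TC) minus (error without TC).

    err_with_tc, err_without_tc: arrays of shape (case, lead).
    lead_days: 1-D array of forecast lead times in days.
    Assumption: Week 2 means lead days 8 through 14 inclusive.
    """
    lead_days = np.asarray(lead_days)
    week2 = (lead_days >= 8) & (lead_days <= 14)
    diff = np.asarray(err_with_tc) - np.asarray(err_without_tc)
    return diff[:, week2].mean(axis=1)

def impact_by_track(impact, track_labels):
    """Mean impact per track category (labels are hypothetical, e.g. 'poleward')."""
    impact = np.asarray(impact)
    track_labels = np.asarray(track_labels)
    return {
        str(label): float(impact[track_labels == label].mean())
        for label in np.unique(track_labels)
    }
```

Because the paper reports that impacts are metric-dependent, the same grouping would need to be repeated for each error metric rather than read from a single one.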

Load-bearing premise

That the 108 selected cases are representative of general TC behavior, and that initialization near TC genesis produces comparable starting conditions across the two models without hidden biases.

What would settle it

Substantially different extratropical error growth patterns between the models when the sample is expanded or when different initialization and verification choices are used.

Original abstract

Global medium-range weather forecasts suffer occasional failures, often linked to tropical cyclones (TCs). We investigate TC influences on extratropical predictability by comparing forecasts from a physics-based model (ECMWF-IFS) and an AI-hybrid model (Google-NGCM) initialized near TC genesis. Analyzing 108 out-of-sample Northern Hemisphere cases reveals similar extratropical error growth patterns and comparable performance between the models. This suggests that the NGCM is capable of predicting the bulk upscale effects of tropical convection without directly representing convective processes. Leveraging the NGCM's computational efficiency, we compare forecasts initialized with and without TC genesis to isolate track-dependent forecast impacts. For Week-2 extratropical forecasts, TC impacts are highly time-, metric-, and track-dependent. The analysis confirms that some poleward-moving TCs degrade Week-2 US and European forecasts and suggests significant impacts from westward-moving TCs. The findings highlight the utility of the AI-hybrid model in predictability research and complex tropical-extratropical teleconnections that warrant future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper examines track-dependent links between tropical cyclones (TCs) and extratropical predictability by comparing the physics-based ECMWF-IFS model with the AI-hybrid Google-NGCM model. Forecasts are initialized near TC genesis for 108 out-of-sample Northern Hemisphere cases, revealing similar extratropical error growth patterns and comparable model performance. The authors take this to suggest that the NGCM can predict the bulk upscale effects of tropical convection without directly representing convective processes. The study further exploits the NGCM's efficiency to compare forecasts with and without TC genesis, identifying highly track-, time-, and metric-dependent impacts on Week-2 extratropical forecasts, including degradation from some poleward-moving TCs and potential impacts from westward-moving ones.

Significance. If the central findings hold, this work would demonstrate the utility of AI-hybrid models for investigating complex tropical-extratropical interactions in predictability research. It highlights the importance of TC track in modulating medium-range forecast impacts over regions like the US and Europe, potentially guiding improvements in both physical and data-driven modeling approaches.

major comments (2)
  1. [Abstract] The analysis of 108 out-of-sample cases reports similar error growth but lacks any detail on the verification metrics employed, error bar handling, or protocols for case inclusion and post-hoc filtering; this information is essential to substantiate the claim of comparable performance between the models.
  2. [Methods] No verification is provided that the initial TC states (e.g., vorticity or precipitation fields) are statistically equivalent between the IFS and NGCM at genesis; differences arising from the AI model's data-driven initialization could confound the attribution of similar error growth to true upscale prediction skill rather than shared large-scale forcing.
minor comments (1)
  1. The abstract mentions 'out-of-sample' cases but does not specify the training data period or how out-of-sample status was ensured for the NGCM.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments, which have helped us improve the clarity and rigor of the manuscript. We address each major comment point by point below and have revised the paper accordingly where the suggestions strengthen the presentation of our results.

Point-by-point responses
  1. Referee: [Abstract] The analysis of 108 out-of-sample cases reports similar error growth but lacks any detail on the verification metrics employed, error bar handling, or protocols for case inclusion and post-hoc filtering; this information is essential to substantiate the claim of comparable performance between the models.

    Authors: We agree that the abstract would benefit from greater specificity on these points to allow readers to immediately assess the robustness of the comparability claim. In the revised manuscript, we have expanded the abstract by one sentence to specify that extratropical error growth is quantified using root-mean-square error (RMSE) and anomaly correlation coefficient (ACC) for 500 hPa geopotential height and 850 hPa wind fields, that error bars denote the standard error of the mean across the 108 cases, and that cases were selected as all Northern Hemisphere TCs with genesis dates in the 2020–2023 out-of-sample period that satisfied minimum intensity criteria (no post-hoc filtering beyond these objective thresholds). Full methodological details, including the precise verification domains and lead-time averaging, remain in Section 2. This addition directly addresses the concern while respecting abstract length limits.

    revision: yes

  2. Referee: [Methods] No verification is provided that the initial TC states (e.g., vorticity or precipitation fields) are statistically equivalent between the IFS and NGCM at genesis; differences arising from the AI model's data-driven initialization could confound the attribution of similar error growth to true upscale prediction skill rather than shared large-scale forcing.

    Authors: This is a valid concern regarding potential confounding from initialization. To address it, we have added a dedicated subsection (now Section 2.3) and a new supplementary figure that quantifies the equivalence of initial TC states. Specifically, we computed the domain-averaged RMSE and spatial correlation coefficient for 850 hPa relative vorticity and total precipitation within a 500 km radius of each TC center at the initialization time (t=0). Across the 108 cases, the mean correlation exceeds 0.87 for vorticity and 0.79 for precipitation, with RMSE values well below the typical synoptic-scale variability. These statistics confirm that the initial TC representations are statistically equivalent between the two models, supporting the attribution of subsequent error-growth behavior to the models' dynamical and physical representations rather than initialization discrepancies. The new analysis and figure have been incorporated into the revised manuscript.

    revision: yes
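The initial-state check described in the second response (spatial correlation and RMSE within 500 km of the TC center) could look roughly like the sketch below. It is an illustration only. It assumes a regular latitude-longitude grid, a haversine great-circle distance with a mean Earth radius of 6371 km, an unweighted average over the masked grid points, and the Pearson form of the spatial correlation; the function and argument names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius

def great_circle_km(lat, lon, lat0, lon0):
    """Haversine distance in km between (lat, lon) and (lat0, lon0), in degrees."""
    lat, lon, lat0, lon0 = map(np.deg2rad, (lat, lon, lat0, lon0))
    a = (np.sin((lat - lat0) / 2) ** 2
         + np.cos(lat) * np.cos(lat0) * np.sin((lon - lon0) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def compare_near_center(field_a, field_b, lat_deg, lon_deg,
                        center_lat, center_lon, radius_km=500.0):
    """Pearson correlation and RMSE of two 2-D fields within radius_km of a point.

    field_a, field_b: arrays of shape (lat, lon), e.g. the same variable
    from two models at initialization time.
    """
    lat2d, lon2d = np.meshgrid(lat_deg, lon_deg, indexing="ij")
    mask = great_circle_km(lat2d, lon2d, center_lat, center_lon) <= radius_km
    a, b = field_a[mask], field_b[mask]
    corr = np.corrcoef(a, b)[0, 1]
    rmse = np.sqrt(np.mean((a - b) ** 2))
    return corr, rmse
```

A high correlation combined with a small RMSE indicates that the two fields agree in both pattern and amplitude near the storm. Correlation alone is insensitive to a constant offset or scaling between the fields, which is why the response reports both numbers.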

Circularity Check

0 steps flagged

No significant circularity; empirical inter-model comparison is self-contained

Full rationale

The paper's central claims rest on direct numerical comparisons of forecast error growth between ECMWF-IFS and Google-NGCM across 108 out-of-sample cases initialized near TC genesis, followed by controlled experiments toggling TC presence in the AI model. No equations, fitted parameters, or uniqueness theorems are invoked; the similarity in extratropical error patterns is presented as an observational result rather than a derived quantity that reduces to the input selection by construction. The analysis is externally falsifiable via independent re-runs of the same models on the same cases.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is empirical model comparison with no explicit free parameters, axioms, or invented entities stated in the abstract.

pith-pipeline@v0.9.0 · 5473 in / 1105 out tokens · 26368 ms · 2026-05-16T10:03:34.662421+00:00 · methodology
