Artificial skill in monsoon onset prediction: two recent examples

Gerd Bürger

arXiv: 1907.08114 · v1 · pith:5YN5MZD2 · submitted 2019-07-18 · 📊 stat.AP

Pith reviewed 2026-05-24 19:20 UTC · model grok-4.3

classification 📊 stat.AP

keywords: monsoon onset prediction · verification bias · artificial skill · Indian Meteorological Department · tipping elements · skill assessment · empirical prediction · independent verification

The pith

Verification overlap between model setup and test data creates artificially high skill scores in two monsoon onset prediction methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines two empirical approaches to predicting Indian monsoon onset. It argues that their reported skill is inflated because the data used to define or fit the models overlap with the data used to verify them. In the operational IMD forecasts the overlap arises from the intricate model setup. In the tipping-element (TE) trend-extrapolation method the verification procedure is more seriously flawed: the actual correlation with observations is only 0.24, which contradicts the claimed superiority. A reader would care because monsoon-onset forecasts guide agricultural and water-management decisions across South Asia. Overstated skill can lead to misplaced reliance on methods that do not perform well. The paper concludes that dynamical models currently supply the more trustworthy skill values, around 0.7.

Core claim

For two cases of empirical monsoon onset prediction it is argued that current verification practice leads to optimistically biased skill, caused by the intricacy of the model setup. For the case of the operational forecasts by the Indian Meteorological Department (IMD) it leads to an overlap of model definition and verification data. A more seriously flawed verification was used in a recent method based on trend extrapolations of 'tipping elements' (TE). Claims of TE of predicting onset 2 weeks earlier than other methods are unjustified. On the contrary, the correlation between TE forecasts and observations is as low as 0.24 and compares poorly to the reported IMD correlation of 0.78. That latter …

What carries the argument

Overlap of model-definition data with verification data, which violates independence and produces optimistically biased skill estimates.
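The mechanism can be illustrated with a toy Monte Carlo experiment. This is a minimal sketch under stated assumptions, not a reconstruction of either the IMD or the TE procedure. The onset anomalies and the 200 candidate predictors are synthetic Gaussian noise, so the true skill of every predictor is zero. The record length and candidate count are arbitrary choices. In one setup the best predictor is chosen using all years and then verified on those same years. In the other it is chosen on the first half and verified on the untouched second half.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def artificial_skill_demo(n_years=40, n_candidates=200, n_trials=50, seed=1):
    """Mean verification correlation for a leaky vs. an independent setup.

    All data are pure noise (an assumption of this toy), so any skill
    reported by the leaky setup is artificial.
    """
    rng = random.Random(seed)
    half = n_years // 2
    leaky, independent = [], []
    for _ in range(n_trials):
        obs = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
        cands = [[rng.gauss(0.0, 1.0) for _ in range(n_years)]
                 for _ in range(n_candidates)]

        # Leaky: the predictor is selected with ALL years, then verified on them.
        best = max(cands, key=lambda c: abs(pearson(c, obs)))
        leaky.append(abs(pearson(best, obs)))

        # Independent: select the predictor (and fix its sign) on the first half
        # only, then verify on the second half, which played no role in the setup.
        best = max(cands, key=lambda c: abs(pearson(c[:half], obs[:half])))
        sign = 1.0 if pearson(best[:half], obs[:half]) > 0 else -1.0
        independent.append(sign * pearson(best[half:], obs[half:]))
    return sum(leaky) / n_trials, sum(independent) / n_trials
```

Calling `leaky, independent = artificial_skill_demo()` gives a leaky score well above zero, while the independent score stays close to zero. That gap is the artificial skill.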

If this is right

The reported IMD correlation of 0.78 is likely inflated by the overlap.
The TE method achieves only a 0.24 correlation and does not predict onset two weeks earlier than alternatives.
Dynamical models currently supply the most reliable monsoon-onset skill, around 0.7.
Any future empirical monsoon-onset method must demonstrate skill on strictly independent verification data.
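A quick worked comparison of the correlations quoted above may help. It assumes the usual reading of a squared correlation as the fraction of interannual onset-date variance captured by a linear fit of observations on forecasts:

```latex
r_{\mathrm{TE}}^2  = 0.24^2 \approx 0.06, \qquad
r_{\mathrm{dyn}}^2 \approx 0.7^2 = 0.49, \qquad
r_{\mathrm{IMD}}^2 = 0.78^2 \approx 0.61
```

On this reading the TE forecasts would account for only about 6% of the year-to-year variance in onset, against roughly half for the dynamical models. The 61% implied by the reported IMD value is the figure the paper considers inflated.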

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar verification-overlap problems could exist in other empirical climate-prediction settings that rely on limited observational records.
Operational agencies may need to adopt rolling independent test periods to keep reported skill scores honest.
Low actual skill in the TE approach suggests that simple trend extrapolation misses important interannual variability in monsoon timing.
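The rolling test periods suggested above could look like the following minimal sketch. It assumes a hypothetical one-predictor linear regression for onset date, not the IMD or TE model. Each year is forecast from a model fitted only to earlier years (an expanding window), so no verification year can leak into the setup. The value of `min_train` is an arbitrary illustrative choice.

```python
def fit_ols(x, y):
    """Least-squares intercept a and slope b for y ≈ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def rolling_origin_forecasts(predictor, onset, min_train=15):
    """Forecast year t using only years 0..t-1 (expanding training window)."""
    forecasts, observed = [], []
    for t in range(min_train, len(onset)):
        a, b = fit_ols(predictor[:t], onset[:t])  # setup sees earlier years only
        forecasts.append(a + b * predictor[t])    # year t is held out from fitting
        observed.append(onset[t])
    return forecasts, observed
```

Any skill measure can then score the paired lists, for example `statistics.correlation(forecasts, observed)` in Python 3.10 and later.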

Load-bearing premise

The verification data sets overlap with or are not independent from the data used to define or fit the prediction models.

What would settle it

Recomputing the skill scores on a verification period or data set that was never used in model definition or fitting; if the bias claim holds, the resulting correlations for both IMD and TE methods will be substantially lower than the published values.
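Such a recomputation requires that every step of the setup be repeated without the verification year, including any screening of candidate predictors. The following is a minimal leave-one-out sketch under that assumption. It uses a hypothetical set of candidate predictor series and a one-predictor linear regression rather than either published method.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def loo_skill(candidates, onset):
    """Leave-one-out correlation in which predictor selection and fitting are
    redone for every held-out year, so no verification year shapes the setup."""
    n = len(onset)
    forecasts = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        y = [onset[j] for j in keep]
        # Screen the candidates on the training years only.
        best = max(candidates, key=lambda c: abs(pearson([c[j] for j in keep], y)))
        x = [best[j] for j in keep]
        # Least-squares fit y ≈ a + b * x on the training years.
        mx, my = sum(x) / len(x), sum(y) / len(y)
        b = (sum((xj - mx) * (yj - my) for xj, yj in zip(x, y))
             / sum((xj - mx) ** 2 for xj in x))
        a = my - b * mx
        forecasts.append(a + b * best[i])  # forecast for the held-out year
    return pearson(forecasts, onset)
```

Keeping the screening step inside the loop is the design point. If the best predictor were chosen once on the full record and only the regression were cross-validated, the held-out years would still have shaped the setup.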

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Desk Editor's Note · private letter to a colleague

Paper shows verification overlap inflates skill scores for IMD and tipping-element monsoon onset methods, with TE correlation recalculated at 0.24.

The main takeaway is that reported skill for two empirical monsoon onset predictors looks artificially high. The cause is that the verification data overlapped with the data used to define or fit the models. The paper recalculates the tipping-element forecasts and obtains a correlation of 0.24 with observations, which undercuts the claim that they beat other methods by two weeks. It also flags the 0.78 correlation of the IMD operational forecasts as likely inflated for the same overlap reason. It notes that dynamical models reach about 0.7 with cleaner verification.

This is useful because it gives a concrete number for the TE case. It also reminds people that operational forecasts for agriculture need strict separation between setup and test periods. The IMD overlap argument is the softer spot. The abstract and stress-test note do not list the exact years or variables that went into model definition versus verification. It therefore remains possible that the original procedure used only pre-period climatology and avoided leakage. Without that mapping, the claim that 0.78 is inflated rests partly on assertion. The TE recalculation is more direct, since it supplies the low correlation value itself.

The paper is aimed at people who build or verify statistical forecasts of monsoon timing. A reader who cares about proper cross-validation in climate applications will find the examples worth checking. It deserves peer review so that the author can add the missing year lists and calculations for referees to inspect.

Referee Report

3 major / 0 minor

Summary. The manuscript claims that verification practices for two empirical methods of predicting Indian monsoon onset—the Indian Meteorological Department (IMD) operational forecasts and a tipping elements (TE) trend extrapolation approach—suffer from optimistic bias due to overlap between model definition and verification data. It reports a re-calculated correlation of 0.24 for the TE method, contrasts it with the reported IMD correlation of 0.78 (deemed artificially inflated), and suggests dynamical models achieve more reliable skill around 0.7.

Significance. If the specific overlaps and re-calculations hold, the paper would usefully caution against inflated skill estimates in monsoon prediction literature and advocate for stricter independence in verification, potentially strengthening the reliability of operational forecasts.

major comments (3)

[Abstract] The assertion of overlap between model definition and verification data for the IMD forecasts is presented without an explicit mapping of the years or variables used in each step, making it difficult to verify the claimed leakage.
[Abstract] The re-calculation of the TE method's correlation to 0.24 is stated without detailing the exact onset-date series, the fitting window, or the cross-validation protocol employed, which is necessary to confirm independence from the original TE study.
[Abstract] The comparison to dynamical model skill of ~0.7 lacks a specific reference or citation to the dynamical model in question and its verification procedure.
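The explicit mapping requested in the first comment can be reduced to a simple check. The sketch below is hypothetical: the function name is invented, and the example periods are placeholders. They are not the actual IMD definition or verification years, which this page does not list.

```python
def overlap_report(definition_years, verification_years):
    """Summarise which verification years were also used to define the model."""
    definition, verification = set(definition_years), set(verification_years)
    shared = sorted(definition & verification)
    return {
        "shared_years": shared,
        "n_shared": len(shared),
        # Share of the verification sample that is not independent of the setup.
        "fraction_of_verification": len(shared) / len(verification) if verification else 0.0,
    }


# Placeholder periods for illustration only.
report = overlap_report(range(1971, 2001), range(1991, 2011))
print(report["n_shared"], report["fraction_of_verification"])  # prints: 10 0.5
```

Any nonzero `n_shared` means part of the reported skill was measured on years that also informed the model definition.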

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed comments on the abstract. We agree that the abstract would benefit from additional specifics to allow readers to more readily verify the claims, and we will revise it accordingly while preserving the manuscript's core arguments.

Referee: [Abstract] The assertion of overlap between model definition and verification data for the IMD forecasts is presented without an explicit mapping of the years or variables used in each step, making it difficult to verify the claimed leakage.

Authors: The full manuscript provides the specific years and variables for the IMD forecast definition versus verification periods that establish the overlap. To address the concern, we will revise the abstract to include a concise mapping of the relevant periods. revision: yes
Referee: [Abstract] The re-calculation of the TE method's correlation to 0.24 is stated without detailing the exact onset-date series, the fitting window, or the cross-validation protocol employed, which is necessary to confirm independence from the original TE study.

Authors: The manuscript's methods section specifies the onset-date series, fitting window for the trend extrapolations, and the independent verification protocol that avoids overlap with the original TE study, yielding the 0.24 correlation. We will revise the abstract to briefly reference these elements for clarity. revision: yes
Referee: [Abstract] The comparison to dynamical model skill of ~0.7 lacks a specific reference or citation to the dynamical model in question and its verification procedure.

Authors: We agree a citation is required. The ~0.7 value is taken from published dynamical model verifications of monsoon onset. We will add the specific reference(s) to the revised abstract. revision: yes

Circularity Check

0 steps flagged

No circularity: critique paper with external re-calculations

The paper is a methodological critique of two external monsoon-onset forecasting approaches (IMD operational forecasts and a tipping-element extrapolation method). It does not advance its own predictive model, does not fit parameters to data, and does not present any derivation chain in which a claimed result is obtained by construction from its own inputs. Both correlation values it discusses are verification statistics tied to external work: 0.78 is the correlation reported for the IMD forecasts, and 0.24 is the author's recomputation for the TE forecasts. No load-bearing self-citation, self-definitional loop, or fitted input presented as a prediction appears. The central claim concerns data leakage in prior studies and is therefore evaluated against external benchmarks rather than internal equations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on standard statistical requirements for independent verification data; no free parameters or new entities are introduced.

axioms (1)

Domain assumption: Verification data must be independent of model-definition data to avoid optimistic bias in skill scores.
This premise is invoked to argue that overlap in the IMD and TE cases produces inflated correlations.
