Artificial skill in monsoon onset prediction: two recent examples
Pith reviewed 2026-05-24 19:20 UTC · model grok-4.3
The pith
Verification overlap between model setup and test data creates artificially high skill scores in two monsoon onset prediction methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For two cases of empirical monsoon onset prediction it is argued that current verification practice leads to optimistically biased skill, caused by the intricacy of the model setup. For the case of the operational forecasts by the Indian Meteorological Department (IMD) it leads to an overlap of model definition and verification data. A more seriously flawed verification was used in a recent method based on trend extrapolations of 'tipping elements' (TE). Claims of TE of predicting onset 2 weeks earlier than other methods are unjustified. On the contrary, the correlation between TE forecasts and observations is as low as 0.24 and compares poorly to the reported IMD correlation of 0.78. That latter value likely being artificially inflated, currently the best and most reliable monsoon onset predictions come from a dynamical model with more reliable skill values of about 0.7.
What carries the argument
Overlap of model-definition data with verification data, which violates independence and produces optimistically biased skill estimates.
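The mechanism can be illustrated with a minimal sketch on purely synthetic data. It is not the IMD or TE procedure. It assumes a hypothetical setup in which one predictor is picked from a pool of pure-noise candidates, so any apparent skill is artificial by construction. All names (`select_predictor`, `overlapping_skill`, `independent_skill`, `mean_skills`) are invented for this illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def select_predictor(predictors, onset, years):
    """Index of the candidate with the largest |correlation| over `years`."""
    y = [onset[t] for t in years]
    return max(
        range(len(predictors)),
        key=lambda j: abs(pearson([predictors[j][t] for t in years], y)),
    )


def overlapping_skill(predictors, onset):
    """Select and fit on ALL years, then verify on the same years."""
    years = list(range(len(onset)))
    j = select_predictor(predictors, onset, years)
    slope, intercept = fit_line(predictors[j], onset)
    forecast = [slope * x + intercept for x in predictors[j]]
    return pearson(forecast, onset)


def independent_skill(predictors, onset):
    """Leave-one-out: selection and fitting never see the verified year."""
    n = len(onset)
    forecast = []
    for held_out in range(n):
        train = [t for t in range(n) if t != held_out]
        j = select_predictor(predictors, onset, train)
        slope, intercept = fit_line(
            [predictors[j][t] for t in train], [onset[t] for t in train]
        )
        forecast.append(slope * predictors[j][held_out] + intercept)
    return pearson(forecast, onset)


def mean_skills(n_trials=10, n_years=25, n_predictors=40, seed=0):
    """Average both skill estimates over several synthetic noise data sets."""
    rng = random.Random(seed)
    over, indep = [], []
    for _ in range(n_trials):
        onset = [rng.gauss(0, 1) for _ in range(n_years)]
        predictors = [
            [rng.gauss(0, 1) for _ in range(n_years)] for _ in range(n_predictors)
        ]
        over.append(overlapping_skill(predictors, onset))
        indep.append(independent_skill(predictors, onset))
    return sum(over) / n_trials, sum(indep) / n_trials


if __name__ == "__main__":
    over, indep = mean_skills()
    print(f"overlapping verification: {over:.2f}")
    print(f"independent verification: {indep:.2f}")
```

Because the candidates are noise, the true skill is zero. The overlapping estimate nonetheless comes out clearly positive, since the predictor was chosen for its fit to the very years used for verification. The key design point is that predictor selection is repeated inside each leave-one-out fold; selecting once on all years and cross-validating only the regression would still leak information.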
If this is right
- The reported IMD correlation of 0.78 is likely inflated by the overlap.
- The TE method achieves a correlation of only 0.24, and its claim of predicting onset two weeks earlier than alternatives is unjustified.
- A dynamical model currently supplies the best and most reliable monsoon-onset predictions, with skill values of about 0.7.
- Any future empirical monsoon-onset method must demonstrate skill on strictly independent verification data.
Where Pith is reading between the lines
- Similar verification-overlap problems could exist in other empirical climate-prediction settings that rely on limited observational records.
- Operational agencies may need to adopt rolling independent test periods to keep reported skill scores honest.
- Low actual skill in the TE approach suggests that simple trend extrapolation misses important interannual variability in monsoon timing.
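The rolling independent test periods mentioned in the list above could take the form of a rolling-origin evaluation, sketched here under simple assumptions: a single predictor, a linear regression, and a hypothetical minimum training length `min_train`. The function names are invented for illustration and do not describe any agency's actual practice.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def rolling_origin_forecasts(predictor, onset, min_train=10):
    """Forecast each year t using only years strictly before t.

    Returns a list of (year_index, forecast) pairs for t >= min_train,
    so no verified year ever contributes to its own model fit.
    """
    forecasts = []
    for t in range(min_train, len(onset)):
        slope, intercept = fit_line(predictor[:t], onset[:t])
        forecasts.append((t, slope * predictor[t] + intercept))
    return forecasts


if __name__ == "__main__":
    # Synthetic, noise-free example: onset depends exactly linearly on x.
    x = [((i * 7) % 11) + 0.5 * i for i in range(15)]
    y = [2.0 * v + 1.0 for v in x]
    for t, f in rolling_origin_forecasts(x, y):
        print(t, round(f, 3), y[t])
```

Any predictor selection or tuning step would have to sit inside the loop as well, restricted to the same `[:t]` slice, for the resulting skill score to count as independent.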
Load-bearing premise
The verification data sets overlap with or are not independent from the data used to define or fit the prediction models.
What would settle it
Recomputing the skill scores on a verification period or data set that was never used in model definition or fitting; if the bias claim holds, the resulting correlations for both IMD and TE methods will be substantially lower than the published values.
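A minimal sketch of such a recomputation follows. It first separates verification years that overlap with the model-definition period from those that do not, then scores forecasts on the independent years only. The year ranges and values in the usage example are placeholders, not the periods used by IMD or the TE study, and the helper names are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation; raises ValueError if it is undefined."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with >= 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def split_verification_years(definition_years, verification_years):
    """Return (overlapping, independent) verification years, each sorted."""
    definition = set(definition_years)
    overlapping = sorted(y for y in verification_years if y in definition)
    independent = sorted(y for y in verification_years if y not in definition)
    return overlapping, independent


def skill_on_years(forecast_by_year, observed_by_year, years):
    """Correlation between forecasts and observations restricted to `years`."""
    forecast = [forecast_by_year[y] for y in years]
    observed = [observed_by_year[y] for y in years]
    return pearson(forecast, observed)


if __name__ == "__main__":
    # Placeholder periods, chosen only to show the bookkeeping.
    definition_years = range(2001, 2011)
    verification_years = range(2006, 2016)
    overlapping, independent = split_verification_years(
        definition_years, verification_years
    )
    print("overlapping:", overlapping)
    print("independent:", independent)
```

Comparing `skill_on_years` over all verification years with the same score over `independent` alone gives a direct, if rough, indication of how much of a published correlation depends on the overlapping years.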
Original abstract
For two cases of empirical monsoon onset prediction it is argued that current verification practice leads to optimistically biased skill, caused by the intricacy of the model setup. For the case of the operational forecasts by the Indian Meteorological Department (IMD) it leads to an overlap of model definition and verification data. A more seriously flawed verification was used in a recent method based on trend extrapolations of 'tipping elements' (TE). Claims of TE of predicting onset 2 weeks earlier than other methods are unjustified. On the contrary, the correlation between TE forecasts and observations is as low as 0.24 and compares poorly to the reported IMD correlation of 0.78. That latter value likely being artificially inflated, currently the best and most reliable monsoon onset predictions come from a dynamical model with more reliable skill values of about 0.7.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that verification practices for two empirical methods of predicting Indian monsoon onset—the Indian Meteorological Department (IMD) operational forecasts and a tipping elements (TE) trend extrapolation approach—suffer from optimistic bias due to overlap between model definition and verification data. It reports a re-calculated correlation of 0.24 for the TE method, contrasts it with the reported IMD correlation of 0.78 (deemed artificially inflated), and suggests dynamical models achieve more reliable skill around 0.7.
Significance. If the specific overlaps and re-calculations hold, the paper would usefully caution against inflated skill estimates in monsoon prediction literature and advocate for stricter independence in verification, potentially strengthening the reliability of operational forecasts.
major comments (3)
- [Abstract] The assertion of overlap between model definition and verification data for the IMD forecasts is presented without an explicit mapping of the years or variables used in each step, making it difficult to verify the claimed leakage.
- [Abstract] The re-calculation of the TE method's correlation to 0.24 is stated without detailing the exact onset-date series, the fitting window, or the cross-validation protocol employed, which is necessary to confirm independence from the original TE study.
- [Abstract] The comparison to dynamical model skill of ~0.7 lacks a specific reference or citation to the dynamical model in question and its verification procedure.
Simulated Author's Rebuttal
We thank the referee for their detailed comments on the abstract. We agree that the abstract would benefit from additional specifics to allow readers to more readily verify the claims, and we will revise it accordingly while preserving the manuscript's core arguments.
Point-by-point responses
- Referee: [Abstract] The assertion of overlap between model definition and verification data for the IMD forecasts is presented without an explicit mapping of the years or variables used in each step, making it difficult to verify the claimed leakage.
  Authors: The full manuscript provides the specific years and variables for the IMD forecast definition versus verification periods that establish the overlap. To address the concern, we will revise the abstract to include a concise mapping of the relevant periods.
  Revision: yes
- Referee: [Abstract] The re-calculation of the TE method's correlation to 0.24 is stated without detailing the exact onset-date series, the fitting window, or the cross-validation protocol employed, which is necessary to confirm independence from the original TE study.
  Authors: The manuscript's methods section specifies the onset-date series, fitting window for the trend extrapolations, and the independent verification protocol that avoids overlap with the original TE study, yielding the 0.24 correlation. We will revise the abstract to briefly reference these elements for clarity.
  Revision: yes
- Referee: [Abstract] The comparison to dynamical model skill of ~0.7 lacks a specific reference or citation to the dynamical model in question and its verification procedure.
  Authors: We agree a citation is required. The ~0.7 value is taken from published dynamical model verifications of monsoon onset. We will add the specific reference(s) to the revised abstract.
  Revision: yes
Circularity Check
No circularity: critique paper with external re-calculations
full rationale
The paper is a methodological critique of two external monsoon-onset forecasting approaches (IMD operational forecasts and a tipping-element extrapolation method). It does not advance its own predictive model, does not fit parameters to data, and does not present any derivation chain in which a claimed result is obtained by construction from its own inputs. The reported correlation values (0.78 for IMD, 0.24 for TE) are presented as re-computed verification statistics drawn from external sources; no self-citation load-bearing step, self-definitional loop, or fitted-input-called-prediction pattern appears. The central claim concerns data leakage in prior studies and is therefore evaluated against external benchmarks rather than internal equations.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Verification data must be independent of model-definition data to avoid optimistic bias in skill scores.