On the robustness of Mann-Kendall tests used to forecast critical transitions

Nils Thibeau--Sutre; Tom J.M. Van Dooren; Tristan Gamot

arxiv: 2604.15230 · v1 · submitted 2026-04-16 · 📊 stat.AP

Pith reviewed 2026-05-10 09:12 UTC · model grok-4.3

keywords: Mann-Kendall test · critical transitions · early warning signals · trend detection · autocorrelation · type I error · robustness

The pith

Mann-Kendall tests inflate false positives when detecting trends in early-warning signals for critical transitions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether Mann-Kendall statistics reliably identify trends in early-warning indicators that precede critical transitions. These tests assume the statistic follows a Gaussian distribution even when the underlying series are autocorrelated, an assumption the authors check against simulated data that mimic real critical-transition behavior. The simulations cover all common transition types studied in early-warning research. Empirical distributions of the statistic deviate from the expected Gaussian form, producing type I error rates higher than the nominal level. As a result the tests would often declare an approaching transition when none is occurring, and the authors recommend against their use for this purpose.

Core claim

Empirical distributions of the Mann-Kendall statistic from classical early-warning indicators before critical transitions do not match the theoretical Gaussian distributions assumed by the tests, producing inflated type I error rates across all commonly investigated transition types.

What carries the argument

The Mann-Kendall statistic, whose distribution is taken to be Gaussian under asymptotic arguments even for autocorrelated series.
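
For orientation, the textbook form of the statistic and its Gaussian normalization can be written as follows. This is a sketch of the standard definitions for a series x_1, ..., x_n without ties, assuming independent observations for the variance; the paper's exact normalization is not stated in this summary and may differ.

```latex
\[
S = \sum_{1 \le i < j \le n} \operatorname{sgn}(x_j - x_i),
\qquad
\tau = \frac{2S}{n(n-1)},
\qquad
\sigma^2 = \operatorname{Var}(S) = \frac{n(n-1)(2n+5)}{18},
\]
\[
Z =
\begin{cases}
(S - 1)/\sigma, & S > 0,\\
0, & S = 0,\\
(S + 1)/\sigma, & S < 0,
\end{cases}
\qquad
Z \;\overset{\text{approx.}}{\sim}\; \mathcal{N}(0, 1) \ \text{under the null of no trend.}
\]
```

The test rejects "no trend" when |Z| exceeds the Gaussian critical value, so any mismatch between the true spread of S and σ translates directly into a wrong rejection rate.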

If this is right

Routine use of the tests will announce critical transitions that are not taking place.
The mismatch holds for every standard type of critical transition examined in early-warning studies.
Alternative trend-detection methods are needed for reliable forecasting in this setting.
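
The first consequence listed above can be illustrated with a small Monte Carlo experiment. The sketch below is not the paper's protocol: it assumes a stationary AR(1) series as a hypothetical stand-in for an autocorrelated indicator, and the helper names `mk_z`, `ar1` and `rejection_rate`, as well as the series length and replication count, are illustrative choices.

```python
import numpy as np
from scipy.stats import norm


def mk_z(x):
    """Normalized Mann-Kendall statistic with continuity correction (no ties assumed)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    i, j = np.triu_indices(n, k=1)            # all index pairs with i < j
    s = np.sign(x[j] - x[i]).sum()            # S = sum of sign(x_j - x_i)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S for i.i.d. data
    return (s - np.sign(s)) / np.sqrt(var_s)


def ar1(n, phi, rng):
    """Stationary AR(1) series with no trend (hypothetical stand-in for an indicator)."""
    x = np.empty(n)
    x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - phi**2))
    eps = rng.normal(size=n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x


def rejection_rate(phi, n=100, reps=2000, alpha=0.05, seed=0):
    """Share of trend-free series for which the two-sided test rejects 'no trend'."""
    rng = np.random.default_rng(seed)
    crit = norm.ppf(1.0 - alpha / 2.0)
    z = np.array([mk_z(ar1(n, phi, rng)) for _ in range(reps)])
    return float(np.mean(np.abs(z) > crit))


if __name__ == "__main__":
    for phi in (0.0, 0.5, 0.8):
        print(f"phi={phi}: rejection rate = {rejection_rate(phi):.3f}")
```

Because every simulated series is trend-free by construction, any rejection is a false positive; comparing the rate at `phi=0.0` with the rates at larger `phi` shows how dependence alone moves the test away from its nominal level.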

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Fields that rely on early-warning signals, such as ecology and climate science, face systematic over-prediction of transitions until methods change.
Other non-parametric trend tests that share the same asymptotic Gaussian assumption may exhibit similar failures under autocorrelation.
Re-analysis of published early-warning claims that used Mann-Kendall tests could revise the frequency of reported transitions.

Load-bearing premise

The simulated time series and early-warning indicators capture the autocorrelation structures and lengths that occur in real critical-transition applications.

What would settle it

A real-world dataset of early-warning indicators leading to a documented critical transition in which the Mann-Kendall statistic distribution matches the theoretical Gaussian form and type I error rates stay at the nominal level.

Figures

Figures reproduced from arXiv: 2604.15230 by Nils Thibeau--Sutre, Tom J.M. Van Dooren, Tristan Gamot.

**Figure 2.** Empirical distributions of normalized Mann-Kendall’s tau calculated from lag-1 auto...

**Figure 3.** Comparison of simulated empirical distributions of test statistics

**Figure 4.** Empirical rejection rates of tests of the null hypothesis of no trend when the nominal...

**Figure 5.** Empirical rejection rates of the null hypothesis of no trend for the Hamed and Rao...

Original abstract

Non-parametric approaches to test for trends in time series make use of the Mann-Kendall statistic. Based on asymptotic arguments, these tests assume that its distribution follows a Gaussian distribution, even for autocorrelated time series. Recent results on the lack of validity of this assumption urge a robustness analysis of these approaches. While the issue is relevant across a wide range of applications, we illustrate it here in the context of detecting early warning signals (EWS) of critical transitions, which are used across a variety of research domains, and where commonly applied methods generate autocorrelation. We present a broad analysis, covering all types of critical transitions commonly investigated in EWS studies. We compare empirical distributions of the Mann-Kendall statistic computed from classical EWS indicators preceding critical transitions to the theoretical distributions hypothesized by Mann-Kendall tests. We detect mismatches leading to inflated type I error rates, which would routinely lead to announcing a critical transition while it is not occurring. In contrast to a recent recommendation, we conclude that the use of Mann-Kendall tests for trend detection in the context of forecasting critical transitions should be avoided. We point out several alternative methods available instead.
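
The abstract's remark that commonly applied methods generate autocorrelation can be made concrete with rolling-window indicators. The sketch below assumes the usual sliding-window construction with a step of one point; the window length, series length and function names are hypothetical choices, not values taken from the paper.

```python
import numpy as np


def rolling_variance(x, window):
    """Variance of x in each sliding window of the given length (step of one point)."""
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    return windows.var(axis=1, ddof=1)


def rolling_lag1_autocorr(x, window):
    """Lag-1 autocorrelation of x within each sliding window."""
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    a = windows[:, :-1] - windows[:, :-1].mean(axis=1, keepdims=True)
    b = windows[:, 1:] - windows[:, 1:].mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))


def lag1_autocorr(y):
    """Lag-1 autocorrelation of a whole series."""
    return float(np.corrcoef(y[:-1], y[1:])[0, 1])


rng = np.random.default_rng(1)
noise = rng.normal(size=1000)                       # white noise: no memory, no trend
var_ind = rolling_variance(noise, window=100)       # indicator 1: rolling variance
ac_ind = rolling_lag1_autocorr(noise, window=100)   # indicator 2: rolling lag-1 autocorrelation

print("lag-1 autocorrelation of the raw series:      ", round(lag1_autocorr(noise), 3))
print("lag-1 autocorrelation of the rolling variance:", round(lag1_autocorr(var_ind), 3))
print("lag-1 autocorrelation of the rolling AC(1):   ", round(lag1_autocorr(ac_ind), 3))
```

Consecutive windows share all but one observation, so the indicator series is strongly dependent even when the underlying data are independent. A trend test applied to the indicator therefore never sees independent observations.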

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Mann-Kendall tests show inflated type I errors on simulated EWS indicators before transitions, so the paper advises against using them, though real-data autocorrelation needs checking.

The authors run Monte Carlo simulations of classical early-warning indicators (variance, lag-1 autocorrelation, etc.) ahead of all standard bifurcation types and compare the resulting Mann-Kendall statistic distributions to the asymptotic Gaussian that the test assumes. They find clear mismatches that push the false-positive rate well above nominal levels. This is the main new piece: a targeted, systematic check inside the EWS literature rather than a general statistical result. The analysis stays simple and direct (no fitted parameters, just empirical versus theoretical distributions), so the mismatch evidence is easy to reproduce and evaluate. Credit is due for covering the full range of transition classes and for pointing to existing alternatives instead of stopping at the warning.

The soft spot is simulation realism. The protocols use specific models, window lengths, noise levels, and approach rates to the tipping point. Without a side-by-side look at autocorrelation structure or effective sample size in actual paleo-climate, ecological, or financial records, it is hard to know how much the reported error inflation carries over to field data. If real series are less dependent than the simulations, the practical problem shrinks. The central claim still holds for the cases they simulated, and the low circularity (straight Monte Carlo, no self-referential fitting) keeps the evidence clean.

This paper is for applied researchers who rely on trend tests inside early-warning workflows. Anyone who has used Mann-Kendall on autocorrelated indicators will get a concrete caution and a short list of substitutes. It deserves peer review because the empirical comparison is straightforward to verify and the stakes for practice are real, even if the authors should add real-data benchmarks in revision.

Referee Report

2 major / 1 minor

Summary. The manuscript performs Monte Carlo simulations of classical early-warning indicators (variance, lag-1 autocorrelation, etc.) generated from models of all commonly studied critical transitions. It computes the Mann-Kendall statistic on these indicator series and compares the resulting empirical distributions to the asymptotic N(0, σ²) distribution assumed by standard Mann-Kendall tests, reporting systematic mismatches that produce inflated type I error rates. The authors conclude that Mann-Kendall tests should be avoided for trend detection when forecasting critical transitions and point to alternative methods.

Significance. If the reported distributional mismatches persist under autocorrelation structures typical of real-world EWS series, the work would identify a practically important source of false positives in a method widely applied across ecology, climate, and finance. The direct Monte Carlo comparison of empirical versus theoretical MK distributions supplies a transparent, parameter-free empirical test of the type I error claim, and the coverage of multiple bifurcation types is a strength.

major comments (2)

[Simulation protocols and Results] The central recommendation to avoid Mann-Kendall tests rests on mismatches observed only under the paper's chosen simulation protocols (specific models, window lengths, noise levels, and approach rates). No quantitative comparison is made between the autocorrelation structure or effective sample sizes of the simulated EWS indicators and those found in real-world records (e.g., paleo-climate, ecological, or financial time series). This gap directly affects whether the reported type I error inflation applies to the practical settings in which the tests are used.
[Methods and Results] The manuscript states that mismatches lead to inflated type I error rates but does not report the number of Monte Carlo replications, exact sample sizes, window lengths, or quantitative measures of distributional discrepancy (e.g., Kolmogorov-Smirnov distances or empirical 95 % quantiles). Without these details it is impossible to judge the magnitude and robustness of the claimed inflation.
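
The quantities requested in the second major comment are inexpensive to compute once an ensemble of normalized statistics exists. The sketch below assumes a hypothetical null ensemble (trend-free white noise smoothed by a 10-point moving average) in place of the paper's simulations; `mk_z` and `discrepancy_report` are illustrative names, and `scipy.stats.kstest` supplies the Kolmogorov-Smirnov distance to the standard normal.

```python
import numpy as np
from scipy.stats import kstest, norm


def mk_z(x):
    """Normalized Mann-Kendall statistic (i.i.d. variance, continuity correction)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    i, j = np.triu_indices(n, k=1)
    s = np.sign(x[j] - x[i]).sum()
    return (s - np.sign(s)) / np.sqrt(n * (n - 1) * (2 * n + 5) / 18.0)


def discrepancy_report(z_values, alpha=0.05):
    """KS distance to N(0, 1), empirical two-sided quantile, and rejection rate."""
    z_values = np.asarray(z_values, dtype=float)
    ks = kstest(z_values, "norm")           # compares against the standard normal CDF
    crit = norm.ppf(1.0 - alpha / 2.0)
    return {
        "replications": int(z_values.size),
        "ks_distance": float(ks.statistic),
        "empirical_quantile": float(np.quantile(np.abs(z_values), 1.0 - alpha)),
        "nominal_quantile": float(crit),
        "rejection_rate": float(np.mean(np.abs(z_values) > crit)),
    }


rng = np.random.default_rng(2)
kernel = np.ones(10) / 10.0
# Hypothetical null ensemble: trend-free white noise smoothed by a 10-point moving
# average, which makes neighbouring values dependent (not the paper's protocol).
z_null = np.array([
    mk_z(np.convolve(rng.normal(size=109), kernel, mode="valid"))  # 100 points each
    for _ in range(1000)
])

report = discrepancy_report(z_null)
for name, value in report.items():
    print(f"{name}: {value}")
```

Reporting the replication count next to the KS distance and the empirical quantile lets a reader judge both the size of the mismatch and the Monte Carlo precision behind it.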

minor comments (1)

[Introduction] The abstract and introduction refer to 'all types of critical transitions commonly investigated' but the precise set of models and parameter ranges used is not enumerated in a table or appendix, hindering reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential importance of our findings. We respond to each major comment in turn and have updated the manuscript to address the issues raised.

Referee: [Simulation protocols and Results] The central recommendation to avoid Mann-Kendall tests rests on mismatches observed only under the paper's chosen simulation protocols (specific models, window lengths, noise levels, and approach rates). No quantitative comparison is made between the autocorrelation structure or effective sample sizes of the simulated EWS indicators and those found in real-world records (e.g., paleo-climate, ecological, or financial time series). This gap directly affects whether the reported type I error inflation applies to the practical settings in which the tests are used.

Authors: The simulations were chosen to represent the standard models and parameter regimes used in the EWS literature for all major types of critical transitions. While we did not perform a systematic quantitative comparison to specific real-world datasets, the autocorrelation induced in the indicators is a direct result of the slowing-down phenomenon near the transition, which is expected to be present in real applications. We have added to the revised manuscript a paragraph discussing the typical autocorrelation lengths observed in our simulations and how they align with those in published EWS analyses of real data from ecology and climate science. This supports the relevance of the type I error inflation to practical use cases. revision: partial
Referee: [Methods and Results] The manuscript states that mismatches lead to inflated type I error rates but does not report the number of Monte Carlo replications, exact sample sizes, window lengths, or quantitative measures of distributional discrepancy (e.g., Kolmogorov-Smirnov distances or empirical 95 % quantiles). Without these details it is impossible to judge the magnitude and robustness of the claimed inflation.

Authors: We have revised the manuscript to include all requested details. The Methods section now explicitly reports the number of Monte Carlo replications, the exact sample sizes, window lengths, and noise levels used in the simulations. Additionally, we have incorporated quantitative measures of the distributional discrepancies, including Kolmogorov-Smirnov distances between the empirical and theoretical distributions as well as the empirical 95% quantiles of the Mann-Kendall statistic, to allow readers to assess the magnitude of the type I error inflation. revision: yes
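
The effective sample size raised in the first referee comment has a classical closed form for the sample mean of a stationary series with autocorrelations ρ_k. It is given here as a sketch for orientation, on the assumption that this mean-based definition is the intended one; the paper may use a different variant.

```latex
\[
n_{\mathrm{eff}} = \frac{n}{1 + 2 \sum_{k=1}^{n-1} \left(1 - \frac{k}{n}\right) \rho_k},
\qquad
\text{and for an AR(1) process with } \rho_k = \phi^k \text{ and large } n:
\quad
n_{\mathrm{eff}} \approx n \, \frac{1 - \phi}{1 + \phi}.
\]
\[
\text{Example: } n = 100,\ \phi = 0.8
\;\Longrightarrow\;
n_{\mathrm{eff}} \approx 100 \times \frac{0.2}{1.8} \approx 11.1 .
\]
```

A series of 100 strongly dependent points can thus carry roughly as much information as about 11 independent ones, which is why comparing this quantity between simulated and real records bears on the referee's concern.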

Circularity Check

0 steps flagged

No significant circularity; direct Monte Carlo comparison of empirical vs. theoretical MK distributions

The paper's central analysis generates simulated EWS indicator time series from standard bifurcation models, computes the Mann-Kendall statistic on those series, and directly compares the resulting empirical distribution against the asymptotic N(0, σ²) assumed by MK tests. This is an independent Monte Carlo validation step with no fitted parameters, no self-referential equations, and no load-bearing self-citations. The mismatch and resulting type-I error inflation are outputs of the simulation protocol rather than inputs by construction. The conclusion to avoid MK tests follows from this empirical discrepancy without reducing to any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The analysis relies on the standard null-hypothesis assumption of the Mann-Kendall test and on the fidelity of the chosen simulation models; no new free parameters, ad-hoc axioms, or invented entities are introduced.

axioms (1)

Domain assumption: Under the null of no trend, the Mann-Kendall statistic is asymptotically Gaussian even in the presence of autocorrelation.
This is the assumption whose validity is being empirically tested by comparing simulated distributions to the theoretical Gaussian.

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

[1]
[Abu22] Matthew J. Aburn. sdeint. Version GPL-3.0+. 2022. url: https://github.com/mattja/sdeint.
[Ash+12] Peter Ashwin et al. ‘Tipping points in open systems: bifurcation, noise-induced and rate-dependent examples in the climate system’. In: Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 370.1962 (2012), pp....

[2]
[BH46] GV Bayley & JM Hammersley. ‘The "effective" number of independent observations in an autocorrelated time series’. In: Supplement to the Journal of the Royal Statistical Society 8.2 (1946), pp. 184–197.
[BL15] Chris A Boulton & Timothy M Lenton. ‘Slowing down of North Pacific climate variability and its implications for abrupt ecosystem change’. In: Proceed...

[3]
[Bur+21] Thomas M Bury et al. ‘Deep learning for early warning signals of tipping points’. In: Proceedings of the National Academy of Sciences 118.39 (2021), e2106140118.
[CGE22] Shiyang Chen, Amin Ghadami & Bogdan I Epureanu. ‘Practical guide to using Kendall’s τ in the context of forecasting critical transitions’. In: Royal Society Open Science 9.7 (2022). Artic...

[4]
[DG10] John M Drake & Blaine D Griffen. ‘Early warning signals of extinction in deteriorating environments’. In: Nature 467.7314 (2010), pp. 456–459.
[DHW19] Cees Diks, Cars Hommes & Juanxi Wang. ‘Critical slowing down as an early warning signal for financial crises?’ In: Empirical Economics 57 (2019), pp. 1201–1228.
[Don24] Graham M Donovan. ‘Characterisin...

[5]
[HR98] Khaled H Hamed & A Ramachandra Rao. ‘A modified Mann-Kendall trend test for autocorrelated data’. In: Journal of Hydrology 204.1-4 (1998), pp. 182–196.
[Hu+20] Zichen Hu et al. ‘Modified Mann-Kendall trend test for hydrological time series under the scaling hypothesis and its application’. In: Hydrological Sciences Journal 65.14 (2020), pp. 2419–2438. ...

[6]
[Sen68] Pranab Kumar Sen. ‘Estimates of the regression coefficient based on Kendall’s tau’. In: Journal of the American Statistical Association 63.324 (1968), pp. 1379–1389.
[Sou+21] Emma Southall et al. ‘Early warning signals of infectious disease transitions: a review’. In: Journal of the Royal Society Interface 18.182 (2021). Article 20210555.
[Str18] St...

work page doi:10.5281/zenodo.19613302 1968
