
arxiv: 2604.08676 · v1 · submitted 2026-04-09 · 📊 stat.ME

StationarityToolkit: Comprehensive Time Series Stationarity Analysis in Python

Pith reviewed 2026-05-10 16:46 UTC · model grok-4.3

classification 📊 stat.ME
keywords: time series, stationarity, statistical tests, Python library, trend, variance, seasonality, diagnostics

The pith

A Python library runs ten statistical tests across trend, variance, and seasonality to diagnose non-stationarity in time series data with detailed reports instead of binary verdicts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

StationarityToolkit is a Python library that performs ten statistical tests divided into trend, variance, and seasonality categories to analyze time series data. This approach matters because different forms of non-stationarity require distinct tests and transformations, and a single test often fails to distinguish them. The library automatically infers frequency from a datetime index, supplies test statistics, p-values, and actionable notes for each result, and supports an iterative test-transform-retest process. Users can therefore identify the specific type of non-stationarity present rather than receiving only a yes-or-no answer. By grouping tests into three categories the toolkit aims to cover the main sources of non-stationarity encountered in practical forecasting and analysis tasks.

Core claim

The paper presents StationarityToolkit as a comprehensive Python library that executes 10 statistical tests—four for trends, four for variance changes, and two for seasonality—on time series that include a datetime index. It infers the series frequency automatically, reports test statistics and p-values with clear interpretations, and adds actionable notes on what each detection implies, enabling users to apply targeted transformations and retest until the series satisfies stationarity assumptions for downstream modeling.

What carries the argument

The StationarityToolkit library, which categorizes and orchestrates ten stationarity tests to generate diagnostic outputs with statistics, p-values, and transformation recommendations.

Load-bearing premise

That the ten selected tests together capture the main types of non-stationarity found in real data, and that automatic frequency inference from datetime indices works reliably across different formats and sampling patterns.

What would settle it

A time series known to contain a structural break or variance shift that none of the ten tests flags, yet subsequent forecasting models trained on the data show clear degradation attributable to undetected non-stationarity.
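
A benchmark for this kind of falsification attempt starts from a series with a planted, known defect. The sketch below builds a synthetic series whose standard deviation triples at the midpoint and checks it with Levene's test (Levene, 1960) from SciPy. The function name `variance_shift_check`, the split point, and the noise scales are assumptions for illustration; the source does not say which variance tests the toolkit runs or how it segments a series.

```python
import numpy as np
from scipy.stats import levene


def variance_shift_check(values: np.ndarray, split: int, alpha: float = 0.05) -> dict:
    """Compare the variance before and after `split` with Levene's test."""
    before, after = values[:split], values[split:]
    # center="median" is the Brown-Forsythe variant, which is robust to skew.
    statistic, p_value = levene(before, after, center="median")
    return {
        "statistic": float(statistic),
        "p_value": float(p_value),
        "flagged": bool(p_value < alpha),
    }


# Planted defect: the standard deviation jumps from 1 to 3 at observation 200.
rng = np.random.default_rng(42)
calm = rng.normal(0.0, 1.0, size=200)
volatile = rng.normal(0.0, 3.0, size=200)
series = np.concatenate([calm, volatile])

report = variance_shift_check(series, split=200)
print(report)
```

This shows only the easy case, where the shift is large and the split point is known. A counterexample in the sense above would be a planted shift that every test in the battery leaves unflagged.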

Figures

Figures reproduced from arXiv: 2604.08676 by Bhanu Suraj Malla, Yuqing Hu.

Figure 1: Example input: synthetic time series with noise, baseline, and season
Figure 2: Example output: StationarityToolkit summary and detailed test results
Original abstract

Time-series stationarity is a property that statistical characteristics such as trend, variance, seasonality remain constant over time. It is considered fundamental to many forecasting and analysis methods. Different tests detect different types of non-stationarity: structural breaks or deterministic trends, clustered or time-dependent variance, stochastic or deterministic seasonality. A series might pass one test while failing another; single-test approaches seldom distinguish between conceptually different types of non-stationarity that require different types of tests and transformations. `StationarityToolkit` addresses this by providing a comprehensive Python library that runs 10 statistical tests across three categories: trend (4 tests), variance (4 tests), and seasonality (2 tests). Rather than a binary stationary/non-stationary verdict, users receive detailed diagnostics with actionable notes for each detection. The toolkit automatically infers the frequency of the data provided (requires datetime index), provides clear interpretations with test statistics and p-values, and supports an iterative test-transform-retest workflow essential for real-world data sets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces StationarityToolkit, a Python library that applies 10 established statistical tests for time-series stationarity, grouped into trend (4 tests), variance (4 tests), and seasonality (2 tests). It returns detailed per-test diagnostics with statistics, p-values, and actionable notes rather than binary verdicts, includes automatic frequency inference from datetime indices, and supports iterative test-transform-retest workflows.

Significance. If the implementation is correct and handles edge cases reliably, the toolkit could offer practical value to applied statisticians and forecasters by enabling more nuanced, multi-faceted stationarity diagnostics than single-test approaches commonly used in preprocessing pipelines.

major comments (1)
  1. [Abstract] The central claim that the toolkit 'automatically infers the frequency of the data provided (requires datetime index)' and supports real-world iterative workflows is load-bearing, yet no description is given of the inference algorithm, its handling of missing values, irregular sampling, or non-standard datetime formats; without this, the reliability of the advertised functionality cannot be assessed.
minor comments (2)
  1. The first sentence of the abstract is grammatically incomplete ('a property that statistical characteristics such as trend, variance, seasonality remain constant'); rephrasing for clarity would improve readability.
  2. [Abstract] Explicitly naming the 10 tests and citing their original references (e.g., ADF, KPSS, etc.) would allow readers to evaluate coverage without inspecting the source code.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript describing StationarityToolkit. The point raised about insufficient documentation of the frequency inference mechanism is well-taken, and we address it directly below.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the toolkit 'automatically infers the frequency of the data provided (requires datetime index)' and supports real-world iterative workflows is load-bearing, yet no description is given of the inference algorithm, its handling of missing values, irregular sampling, or non-standard datetime formats; without this, the reliability of the advertised functionality cannot be assessed.

    Authors: We agree that the manuscript does not provide sufficient detail on the frequency inference procedure, which limits the ability to evaluate its robustness. The current implementation uses pandas.infer_freq as the core method, supplemented by custom logic to compute intervals from the datetime index after converting via pandas.to_datetime (with infer_datetime_format=True for format flexibility).

    For missing values, NaT entries are dropped before inference, and a warning is issued if they exceed 5% of observations; no imputation is performed automatically. Irregular sampling is detected by checking the standard deviation of consecutive time deltas against a tolerance threshold (default 1e-6 relative to the median delta), triggering a warning and fallback to a user-provided freq parameter if inconsistency is found. Non-standard formats are handled through pandas parsing, supporting ISO 8601, common regional variants, and explicit format strings.

    In the revised manuscript we will add a new subsection in the Implementation section with pseudocode, edge-case examples, and explicit discussion of these behaviors. This documentation will also clarify how the inferred frequency enables the iterative test-transform-retest workflow by ensuring consistent re-indexing after transformations such as differencing or deseasonalization. The package code already contains these safeguards, so the revision will consist of expanded description rather than new functionality.

    revision: yes
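
The steps listed in this response can be sketched as a single function. This is a hedged reading of the prose, not the package's code: the name `infer_frequency_sketch` and its parameter names are hypothetical, the warning texts are invented, and the `infer_datetime_format` argument is left out because pandas deprecated it in version 2.0.

```python
import warnings
from typing import Optional

import numpy as np
import pandas as pd


def infer_frequency_sketch(
    index,
    fallback_freq: Optional[str] = None,
    max_missing_share: float = 0.05,
    rel_tol: float = 1e-6,
) -> Optional[str]:
    """Hypothetical helper mirroring the steps described in the rebuttal."""
    # Parse to datetimes; entries that cannot be parsed become NaT.
    parsed = pd.DatetimeIndex(pd.to_datetime(index, errors="coerce"))

    # Drop NaT entries and warn when they exceed the stated 5% share.
    missing = parsed.isna()
    if len(parsed) and missing.mean() > max_missing_share:
        warnings.warn(f"{missing.mean():.1%} of timestamps are missing (NaT).")
    parsed = parsed[~missing]

    # pandas.infer_freq needs at least three timestamps.
    if len(parsed) < 3:
        return fallback_freq

    # Regularity check: spread of consecutive deltas relative to the median delta.
    deltas = parsed.to_series().diff().dropna().dt.total_seconds().to_numpy()
    median_delta = float(np.median(deltas))
    if median_delta <= 0 or np.std(deltas) > rel_tol * median_delta:
        warnings.warn("Irregular sampling detected; using the fallback frequency.")
        return fallback_freq

    # Core method named in the rebuttal; returns None when no frequency fits.
    return pd.infer_freq(parsed) or fallback_freq


# Usage on a regular daily index and on the same index with one day removed.
daily = pd.date_range("2024-01-01", periods=10, freq="D")
print(infer_frequency_sketch(daily))
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    print(infer_frequency_sketch(daily.delete(3), fallback_freq="D"))
```

Under this literal reading, calendar frequencies such as month-start have unequal deltas (28 to 31 days) and would be flagged as irregular by a 1e-6 relative tolerance, which is one of the edge cases the promised documentation would need to address.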

Circularity Check

0 steps flagged

No significant circularity; tool-description paper with no derivations

Full rationale

The paper presents StationarityToolkit as a Python library that aggregates 10 established statistical tests (4 trend, 4 variance, 2 seasonality) plus automatic frequency inference from datetime indices, returning diagnostics rather than new theoretical results. No equations, fitted parameters, or derivation chain appear in the provided text; the central claim is simply that the library implements and organizes these pre-existing procedures with actionable output. This matches the reader's assessment of zero circularity and satisfies the hard rule that circularity is only flagged when a specific reduction to inputs can be quoted. The work is self-contained as a software contribution without self-citation load-bearing or self-definitional steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The contribution is a software implementation of standard tests. No free parameters are fitted, no new axioms are introduced, and no invented entities are postulated.

pith-pipeline@v0.9.0 · 5467 in / 1076 out tokens · 55478 ms · 2026-05-10T16:46:12.926295+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Do Stationarity Transformations Actually Improve Time Series Forecasts? A Controlled Experimental Evaluation

    stat.ME · 2026-05 · unverdicted · novelty 7.0

    Large-scale experiments on synthetic data find stationarity transformations improve forecasts in only 18% of matched cases, with variance stabilization as the main exception and signal attenuation as the mechanism.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · cited by 1 Pith paper

  1. [1]

    Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society A, 160(901), 268–282.

  2. [2]

    Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society: Series B, 26(2), 211–252.

  3. [3]

    Cleveland, R. B., Cleveland, W. S., McRae, J. E., & Terpenning, I. (1990). STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6(1), 3–33.

  4. [4]

    Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74(366a), 427–431.

  5. [5]

    Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50(4), 987–1007.

  6. [6]

    Harris, C. R., Millman, K. J., Walt, S. J. van der, & others. (2020). Array programming with NumPy. Nature, 585(7825), 357–362. https://doi.org/10.1038/s41586-020-2649-2

  7. [7]

    Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics, 54(1-3), 159–178.

  8. [8]

    Levene, H. (1960). Robust tests for equality of variances. In I. Olkin (Ed.), Contributions to probability and statistics (pp. 278–292). Stanford University Press.

  9. [9]

    McKinney, W. (2010). Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference, 445, 56–61. https://doi.org/10.25080/Majora-92bf1922-00a

  10. [10]

    Phillips, P. C. B., & Perron, P. (1988). Testing for a unit root in time series regression. Biometrika, 75(2), 335–346.

  11. [11]

    Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. 9th Python in Science Conference.

  12. [12]

    Sheppard, K. (2017). Arch: ARCH models in Python. Zenodo. https://doi.org/10.5281/zenodo.593254

  13. [13]

    Smith, T. G. (2015). Pmdarima (Version 2.1.1). https://github.com/alkaline-ml/pmdarima

  14. [14]

    Virtanen, P., Gommers, R., Oliphant, T. E., & others. (2020). SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17(3), 261–272. https://doi.org/10.1038/s41592-019-0686-2

  15. [15]

    Zivot, E., & Andrews, D. W. K. (1992). Further evidence on the great crash, the oil-price shock, and the unit-root hypothesis. Journal of Business & Economic Statistics, 10(3), 251–270.