
arxiv: 2107.01742 · v2 · submitted 2021-07-04 · 📊 stat.ME · stat.CO

Nonparametric Detection of Multiple Location-Scale Change Points via Wild Binary Segmentation

Pith reviewed 2026-05-24 12:38 UTC · model grok-4.3

classification 📊 stat.ME stat.CO
keywords nonparametric change point detection · wild binary segmentation · Lepage statistic · location-scale changes · distribution-free test · rank-based methods · Monte Carlo threshold calibration

The pith

WBS-Lepage detects multiple location and scale changes in unknown distributions by combining wild binary segmentation with a rank statistic.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the WBS-Lepage procedure, which applies wild binary segmentation to a Lepage statistic built from Mann-Whitney and Mood rank tests. Because the statistic uses only ranks, its null distribution does not depend on the unknown data distribution, so finite-sample thresholds can be set by Monte Carlo simulation to control the probability of false detections. The method targets sequences where the changes of interest are shifts in location, scale, or both, and it is compared against penalised likelihood and other binary segmentation approaches.
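To make the statistic concrete, here is a minimal Python sketch of a two-sample Lepage-type statistic. It adds the squared standardized Wilcoxon rank-sum term (equivalent to Mann-Whitney) to the squared standardized Mood term, using the standard null mean and variance of each. It assumes continuous data with no ties. The function name `lepage` is illustrative and is not the npwbs API. Lepage's original statistic paired Wilcoxon with Ansari-Bradley; this sketch uses the Mood scale component described here.

```python
import numpy as np


def lepage(x, y):
    """Lepage-type two-sample statistic computed from ranks only.

    Sum of the squared standardized Wilcoxon rank-sum statistic (location)
    and the squared standardized Mood statistic (scale). Assumes continuous
    data with no ties, so ranks come from a double argsort.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = np.concatenate([x, y])
    ranks = np.argsort(np.argsort(pooled)) + 1.0  # ranks 1..n of the pooled sample
    r1 = ranks[:n1]                               # ranks of the first sample

    # Location: rank sum of sample 1 (Mann-Whitney U differs only by a constant).
    w = r1.sum()
    w_mean = n1 * (n + 1) / 2
    w_var = n1 * n2 * (n + 1) / 12

    # Scale: Mood's statistic, squared distances of ranks from the mid-rank.
    m = ((r1 - (n + 1) / 2) ** 2).sum()
    m_mean = n1 * (n ** 2 - 1) / 12
    m_var = n1 * n2 * (n + 1) * (n ** 2 - 4) / 180

    return float((w - w_mean) ** 2 / w_var + (m - m_mean) ** 2 / m_var)
```

For `x = np.arange(10)` and `y = np.arange(10) + 100`, the samples are completely separated. The Mood term is then exactly zero, and the statistic is 100/7 (about 14.29). Only ranks enter the calculation, so `lepage(x, y)` returns the same value as `lepage(np.exp(x), np.exp(y))`. That is the distribution-free property in miniature.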

Core claim

The WBS-Lepage procedure combines wild binary segmentation with a rank-based Lepage statistic formed from Mann-Whitney and Mood components to detect multiple change points without specifying a parametric model for the data; the resulting statistic depends on the observations only through their ranks, so its null distribution is distribution-free and finite-sample thresholds can be calibrated by Monte Carlo simulation to control the probability of falsely detecting change points when none exist.
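The Monte Carlo calibration can be sketched as follows, under simplifying assumptions:

- The scanned statistic is the maximum Lepage value over all split points of the whole sequence. This is a single-interval scan, not the full WBS maximum over random intervals.
- Segments shorter than a hypothetical `min_seg` are excluded.
- The threshold is the empirical (1 - alpha) quantile of the simulated null maxima.

Because the ranks of i.i.d. continuous data form a uniformly random permutation, Uniform(0, 1) draws stand in for any continuous distribution. The function names are illustrative.

```python
import numpy as np


def max_lepage_scan(z, min_seg=10):
    """Largest Lepage statistic over splits z[:k] | z[k:] with both parts >= min_seg."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    if n < 2 * min_seg:
        raise ValueError("sequence too short for min_seg")
    ranks = np.argsort(np.argsort(z)) + 1.0      # assumes continuous data, no ties
    k = np.arange(min_seg, n - min_seg + 1)      # left-segment lengths (used as indices)
    kf = k.astype(float)                         # float copy avoids integer overflow for large n
    n2 = n - kf
    w = np.cumsum(ranks)[k - 1]                  # rank sums of every left segment
    m = np.cumsum((ranks - (n + 1) / 2) ** 2)[k - 1]  # Mood sums of every left segment
    w_term = (w - kf * (n + 1) / 2) ** 2 / (kf * n2 * (n + 1) / 12)
    m_term = (m - kf * (n ** 2 - 1) / 12) ** 2 / (kf * n2 * (n + 1) * (n ** 2 - 4) / 180)
    return float(np.max(w_term + m_term))


def mc_threshold(n, alpha=0.05, reps=1000, min_seg=10, seed=0):
    """(1 - alpha) quantile of the null maximum, estimated by simulation.

    Under no change, the ranks of i.i.d. continuous data are a uniformly random
    permutation whatever the distribution, so Uniform(0, 1) draws give the same
    null distribution of any rank statistic.
    """
    rng = np.random.default_rng(seed)
    null_max = np.array([max_lepage_scan(rng.random(n), min_seg) for _ in range(reps)])
    return float(np.quantile(null_max, 1 - alpha))
```

A call such as `mc_threshold(200, alpha=0.05)` returns a threshold for sequences of length 200. A change would then be flagged when `max_lepage_scan(data)` exceeds it. Cumulative sums make each scan linear in the sequence length, which matters when thousands of null replicates are needed.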

What carries the argument

The Lepage statistic (Mann-Whitney plus Mood rank components) inside wild binary segmentation, which produces a distribution-free test whose thresholds are obtained by simulation.
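The combination with wild binary segmentation can be sketched in the same spirit. This follows the general WBS recipe of Fryzlewicz (2014):

1. Draw random intervals once.
2. Take the largest statistic over the current segment and every drawn interval inside it.
3. Accept the best split if it exceeds the threshold.
4. Recurse on both sides of the split.

The interval count, `min_seg`, and re-ranking within each interval are assumptions of this sketch, not details taken from the paper or from npwbs. The function names are hypothetical.

```python
import numpy as np


def best_split(z, s, e, min_seg):
    """Max Lepage statistic over splits of z[s:e], re-ranking within the interval.

    Returns (statistic, b) where b splits z[s:e] into z[s:b] and z[b:e].
    """
    seg = z[s:e]
    n = len(seg)
    if n < 2 * min_seg:
        return -np.inf, None
    ranks = np.argsort(np.argsort(seg)) + 1.0   # assumes no ties
    k = np.arange(min_seg, n - min_seg + 1)     # candidate left-segment lengths
    kf = k.astype(float)
    n2 = n - kf
    w = np.cumsum(ranks)[k - 1]
    m = np.cumsum((ranks - (n + 1) / 2) ** 2)[k - 1]
    stat = ((w - kf * (n + 1) / 2) ** 2 / (kf * n2 * (n + 1) / 12)
            + (m - kf * (n ** 2 - 1) / 12) ** 2 / (kf * n2 * (n + 1) * (n ** 2 - 4) / 180))
    i = int(np.argmax(stat))
    return float(stat[i]), s + int(k[i])


def wbs_lepage(z, threshold, n_intervals=200, min_seg=10, seed=0):
    """Wild binary segmentation driven by the Lepage scan; returns sorted change points."""
    z = np.asarray(z, dtype=float)
    T = len(z)
    rng = np.random.default_rng(seed)
    # Draw the random intervals [a, b) once, up front, as in WBS.
    intervals = np.sort(rng.integers(0, T + 1, size=(n_intervals, 2)), axis=1)
    found = []

    def recurse(s, e):
        # Candidates: the current segment itself plus every drawn interval inside it.
        cands = [(s, e)] + [(int(a), int(b)) for a, b in intervals if s <= a and b <= e]
        best_stat, best_b = -np.inf, None
        for a, b in cands:
            stat, b_split = best_split(z, a, b, min_seg)
            if stat > best_stat:
                best_stat, best_b = stat, b_split
        if best_b is not None and best_stat > threshold:
            found.append(best_b)
            recurse(s, best_b)   # search left of the accepted change point
            recurse(best_b, e)   # search right of it

    recurse(0, T)
    return sorted(found)
```

Consider a sequence whose standard deviation jumps from 1 to 5 halfway through. On such data, `wbs_lepage(z, threshold=30.0)` should report a change point near the midpoint. The value 30 is an arbitrary illustrative threshold. In practice it would come from Monte Carlo calibration of the WBS maximum at the relevant sequence length.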

If this is right

  • The procedure performs competitively with existing nonparametric methods when changes are only in location.
  • It is particularly effective at detecting changes that affect scale.
  • It can be applied directly to stylometric problems such as detecting shifts in an author's writing style.
  • An R package npwbs implements the full procedure for practical use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The distribution-free property could support exact finite-sample control in very small data sets where asymptotic approximations fail.
  • Replacing the Lepage statistic with other rank-based tests might extend the method to detect changes in shape or other features without losing the simulation-based calibration.
  • The same segmentation-plus-simulation structure could be tested on multivariate sequences or on data with dependence that violates the implicit independence assumption.
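For very small samples, the first extension above can be made concrete. Instead of simulating, the exact null distribution of the two-sample statistic at a fixed split can be obtained by enumerating every equally likely assignment of ranks to the first sample. The sketch below does this for the Mood-based Lepage statistic. It covers one fixed split only, not the maximum over splits or intervals, and is feasible only while C(n1 + n2, n1) stays small. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def lepage_from_ranks(r1, n):
    """Lepage statistic computed directly from the ranks r1 of the first sample."""
    r1 = np.asarray(r1, dtype=float)
    n1 = len(r1)
    n2 = n - n1
    w = r1.sum()                          # Wilcoxon rank sum
    m = ((r1 - (n + 1) / 2) ** 2).sum()   # Mood statistic
    return float((w - n1 * (n + 1) / 2) ** 2 / (n1 * n2 * (n + 1) / 12)
                 + (m - n1 * (n ** 2 - 1) / 12) ** 2 / (n1 * n2 * (n + 1) * (n ** 2 - 4) / 180))


def exact_null(n1, n2):
    """All C(n1 + n2, n1) equally likely values of the statistic under no change."""
    n = n1 + n2
    return np.array([lepage_from_ranks(c, n) for c in combinations(range(1, n + 1), n1)])


def exact_pvalue(observed, n1, n2):
    """Exact P(statistic >= observed) under the null, by full enumeration."""
    null = exact_null(n1, n2)
    return float(np.mean(null >= observed - 1e-12))  # small tolerance for float ties
```

Here `exact_null(5, 5)` enumerates all 252 rank configurations. The configuration with ranks 1 to 5 in the first sample gives a statistic of 75/11 (about 6.82), with a Mood component of exactly zero.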

Load-bearing premise

The changes of interest are limited to shifts in location, scale, or both, and the Lepage rank statistic remains sensitive enough to detect them under the unknown distribution.

What would settle it

A simulation in which data contain known location or scale shifts drawn from a non-normal distribution, yet the procedure either exceeds its nominal false-positive rate or misses a substantial fraction of the planted changes.
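Such a check could be set up along the following lines, assuming Student-t noise as the non-normal distribution. The helpers generate piecewise i.i.d. data with planted location and scale shifts. They then score how many planted change points a detector recovers within a tolerance. The function names, the t distribution with 3 degrees of freedom, and the tolerance of 10 observations are all illustrative choices. The nominal false-positive rate would be checked separately, by running the detector on sequences with a single segment.

```python
import numpy as np


def simulate_t_sequence(seg_lengths, locs, scales, df=3, seed=0):
    """Piecewise i.i.d. Student-t data; each segment has its own location and scale.

    Returns (data, true change point indices), where index i means the new
    segment starts at data[i].
    """
    rng = np.random.default_rng(seed)
    pieces = [loc + scale * rng.standard_t(df, size=m)
              for m, loc, scale in zip(seg_lengths, locs, scales)]
    cps = np.cumsum(seg_lengths)[:-1]
    return np.concatenate(pieces), cps.tolist()


def detection_rate(true_cps, found_cps, tol=10):
    """Fraction of planted change points matched by some detection within tol."""
    if not true_cps:
        return 1.0
    hits = sum(any(abs(t - f) <= tol for f in found_cps) for t in true_cps)
    return hits / len(true_cps)
```

For example, `simulate_t_sequence([100, 100, 100], [0, 2, 2], [1, 1, 3])` plants a location shift at index 100 and a scale shift at index 200. With those true change points, `detection_rate([100, 200], [98, 250])` is 0.5.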

Figures

Figures reproduced from arXiv: 2107.01742 by Gordon J. Ross.

Figure 1. Example of a sequence with 3 change points (dotted lines) at locations …
Figure 2. Values of the threshold γ(n) required to give a false positive probability of 0.05 (red line) and 0.01 (black line) for WBS using the Lepage statistic. The same procedure was then carried out for each n ∈ {105, 110, …, 995, 1000}, for each n ∈ {1100, 1200, …, 5000}, and then for each n ∈ {6000, 7000, …, 10000}. Linear interpolation based on these values was then used to produce thresholds for ev…
Figure 3. Plots showing a typical sequence simulated from each of the 15 data models. Each row …
Figure 4. List of the 200 most common words in the Discworld corpus of 41 novels.
Figure 5. The left-hand plot shows the 41 Discworld books projected onto the second principal component of the corpus. The right-hand plot shows the same data with the two detected change points superimposed as vertical lines. … year 2007, this provides some evidence that the change in writing style may indeed be connected to the diagnosis. These results are consistent with the findings of [Ross, 2020] who detected a …
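The Figure 2 caption describes calibrating γ(n) on a grid of sequence lengths and interpolating between grid points. Below is a minimal sketch of that interpolation step, assuming the calibrated thresholds are supplied as an array aligned with the grid. The grid is copied from the caption. The function name is hypothetical, and no calibrated threshold values are reproduced here.

```python
import numpy as np

# Sequence lengths n at which thresholds were calibrated, as listed in the Figure 2 caption.
N_GRID = np.concatenate([
    np.arange(105, 1001, 5),       # 105, 110, ..., 1000
    np.arange(1100, 5001, 100),    # 1100, 1200, ..., 5000
    np.arange(6000, 10001, 1000),  # 6000, 7000, ..., 10000
])


def interpolate_threshold(n, grid_thresholds):
    """Linearly interpolate a threshold for length n from values calibrated on N_GRID.

    grid_thresholds must hold one calibrated value per entry of N_GRID.
    """
    if not N_GRID[0] <= n <= N_GRID[-1]:
        raise ValueError("n outside the calibrated grid")
    return float(np.interp(n, N_GRID, grid_thresholds))
```

Take a placeholder array such as `0.01 * N_GRID`. With it, `interpolate_threshold(107, ...)` returns 1.07, the linearly interpolated value between the grid points 105 and 110.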
Original abstract

Change point methods are used to divide a sequence of observations into segments with different behaviour. Often, the distributional form of the observations is unknown, but the changes of interest are likely to involve shifts in location, scale, or both. We consider the problem of detecting multiple change points in a sequence without specifying a parametric model for the data. We propose the WBS-Lepage procedure, a nonparametric method which combines wild binary segmentation with a rank-based Lepage statistic. The statistic is formed from Mann-Whitney and Mood components, which are respectively sensitive to changes in location and scale. Since it depends on the observations only through their ranks, its null distribution is distribution-free. This allows finite-sample thresholds to be calibrated by Monte Carlo simulation, providing direct control over the probability of falsely detecting change points when none exist. We compare WBS-Lepage with existing nonparametric change point methods, including penalised likelihood and binary-segmentation-based competitors. The proposed method performs competitively for location changes and is particularly effective for detecting changes in scale. We illustrate the procedure on a stylometric analysis of changes in an author's writing style and provide an implementation of our method in the accompanying R package npwbs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes the WBS-Lepage procedure for nonparametric detection of multiple change points in location and/or scale. It integrates wild binary segmentation with a rank-based Lepage statistic (combining Mann-Whitney and Mood components). Because the procedure depends on the data only through ranks, the null distribution under i.i.d. continuous observations is distribution-free, permitting Monte Carlo calibration of thresholds for finite-sample control of the probability of false detections. The paper reports simulation comparisons with penalised-likelihood and binary-segmentation competitors, an application to stylometric data, and supplies an R package npwbs.

Significance. If the reported performance holds, the work supplies a practical nonparametric tool with finite-sample type-I error control under the global null, exact up to Monte Carlo error in the simulated thresholds. The distribution-free property, the accompanying reproducible R package, and the competitive results for scale changes are explicit strengths that advance the use of rank statistics in multiple-change-point settings.

minor comments (3)
  1. [Abstract] The statement that the method 'is particularly effective for detecting changes in scale' should include a parenthetical reference to the specific simulation configuration (e.g., the scale-change rows of Table 2) that supports the claim.
  2. [§3.3] The description of the Monte Carlo threshold calibration does not state the number of Monte Carlo replicates used; this value should be reported explicitly so that the reported thresholds can be reproduced.
  3. [Figure 5] The vertical axis label for the stylometric example is missing the quantity plotted (per the caption, the projection onto the second principal component of the corpus).
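Comment 2 matters because a Monte Carlo estimate carries sampling error. Fix a threshold γ(n). The simulated false-positive probability from B independent null replicates is then a binomial proportion, so its standard error shrinks only as the square root of B. Here T_b (notation introduced for this illustration) is the maximum statistic of the b-th simulated null sequence, and α is the true exceedance probability:

```latex
\hat{\alpha} = \frac{1}{B}\sum_{b=1}^{B} \mathbf{1}\{T_b > \gamma(n)\},
\qquad
\operatorname{SE}(\hat{\alpha}) = \sqrt{\frac{\alpha(1-\alpha)}{B}}
```

With α = 0.05, this gives about 0.0069 for B = 1000 and about 0.0022 for B = 10000. The replicate count therefore determines how precisely the thresholds achieve their nominal level.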

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments appear in the provided report, so we have no individual points requiring rebuttal or revision at this stage.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained


The paper defines WBS-Lepage by combining the pre-existing wild binary segmentation procedure with a rank-based Lepage statistic (Mann-Whitney + Mood components, a variant of Lepage's original combination of Wilcoxon and Ansari-Bradley statistics). The distribution-free null property is a direct, well-known consequence of depending only on ranks under i.i.d. continuous observations, so Monte Carlo threshold calibration under the global null is valid without any data-dependent fitting or self-referential equations. No load-bearing self-citations, no fitted parameters renamed as predictions, and no ansatz or uniqueness claims imported from the author's prior work appear in the derivation chain. The central claim therefore stands on independent statistical facts rather than reducing to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard property that rank statistics have distribution-free null distributions and on the algorithmic correctness of wild binary segmentation; no new free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption The Lepage statistic formed from Mann-Whitney and Mood components has a distribution-free null distribution under the hypothesis of no change points.
    Explicitly invoked in the abstract to justify Monte Carlo calibration of thresholds.
  • domain assumption Wild binary segmentation correctly identifies multiple change points when supplied with a suitable test statistic.
    The procedure inherits this from the cited WBS literature.

pith-pipeline@v0.9.0 · 5734 in / 1461 out tokens · 36166 ms · 2026-05-24T12:38:37.196164+00:00 · methodology


Reference graph

Works this paper leans on

44 extracted references

  [1] S. Argamon and S. Levitan. Measuring the Usefulness of Function Words for Authorship Attribution. Proceedings of the 2005 ACH/ALLC Conference, 2005.
  [2] J. Bai. Estimating Multiple Breaks One at a Time. Econometric Theory, 13(3): 315-352, 1997.
  [3] D. Barry and J. Hartigan. A Bayesian Analysis for Change Point Problems. Journal of the American Statistical Association, 88: 309-319, 1993.
  [4] M. Basseville and I. V. Nikiforov. Detection of Abrupt Changes: Theory and Application. Prentice Hall, 1993.
  [5] BBC News. Obituary: Sir Terry Pratchett [Online]. Available from http://www.bbc.co.uk/news/entertainment-arts-25401679. 2015.
  [6] B. E. Brodsky and B. S. Darkhovsky. Nonparametric Methods in Change-Point Problems. Springer Netherlands, 1993.
  [7] E. Carlstein. Nonparametric Change-Point Estimation. Annals of Statistics, 16(1): 188-197, 1988.
  [8] H. Chen. Sequential change-point detection based on nearest neighbors. The Annals of Statistics, 47(3): 1381-1407, 2019.
  [9] S. Chib. Estimation and comparison of multiple change-point models. Journal of Econometrics, 86(2): 221-241, 1998. ISSN 0304-4076.
  [10] H. Cho and P. Fryzlewicz. Multiple-change-point detection for high dimensional time series via sparsified binary segmentation. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 77(2): 475-507, 2015.
  [11] L. Davies, C. Höhenrieder, and W. Krämer. Recursive computation of piecewise constant volatilities. Computational Statistics & Data Analysis, 56(11): 3623-3631, 2012.
  [12] B. Duran. Survey of Nonparametric Tests for Scale. Communications in Statistics - Theory and Methods, 5(14): 1287-1312, 1976.
  [13] P. Fearnhead. Exact and efficient Bayesian inference for multiple changepoint problems. Statistics and Computing, 16: 203-213, 2006.
  [14] K. Frick, A. Munk, and H. Sieling. Multiscale change point inference. Journal of the Royal Statistical Society, Series B, 76(3): 495-580, 2014.
  [15] P. Fryzlewicz. Wild binary segmentation for multiple change-point detection. The Annals of Statistics, 42(6): 2243-2281, Dec. 2014.
  [16] P. Green. Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination. Biometrika, 82(4): 711-732, 1995.
  [17] D. M. Hawkins. Fitting multiple change-point models to data. Computational Statistics & Data Analysis, 37: 323-341, 2001.
  [18] D. M. Hawkins and Q. Deng. A Nonparametric Change-Point Control Chart. Journal of Quality Technology, 42(2): 165-173, 2010.
  [19] D. M. Hawkins, P. H. Qiu, and C. W. Kang. The Changepoint Model for Statistical Process Control. Journal of Quality Technology, 35(4): 355-366, 2003.
  [20] K. Haynes, P. Fearnhead, and I. A. Eckley. A computationally efficient nonparametric approach for changepoint detection. Statistics and Computing, 27(5): 1293-1305, 2017.
  [21] K. Haynes, R. Killick, P. Fearnhead, I. Eckley, and D. Grose. changepoint.np: Methods for Nonparametric Changepoint Detection, 2021. URL https://CRAN.R-project.org/package=changepoint.np
  [22] C. Inclan and G. C. Tiao. Use of Cumulative Sums of Squares for Retrospective Detection of Changes of Variance. Journal of the American Statistical Association, 89(427): 913-923, 1994.
  [23] B. Jackson, J. Scargle, D. Barnes, S. Arabhi, A. Alt, P. Gioumousis, E. Gwin, P. Sangtrakulcharoen, L. Tan, and T. T. Tsai. An algorithm for optimal partitioning of data on an interval. IEEE Signal Processing Letters, 12(2): 105-108, 2005.
  [24] N. A. James and D. S. Matteson. ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data. Journal of Statistical Software, 62(1): 1-25, 2015.
  [25] R. Killick, P. Fearnhead, and I. A. Eckley. Optimal Detection of Changepoints With a Linear Computational Cost. Journal of the American Statistical Association, 107(500): 1590-1598, 2012.
  [26] T. L. Lai. Sequential Changepoint Detection in Quality Control and Dynamical Systems. Journal of the Royal Statistical Society, Series B (Methodological), 57(4): 613-658, 1995.
  [27] M. Lavielle. Using penalized contrasts for the change-point problem. Signal Processing, 85(8): 1501-1510, 2005.
  [28] Y. Lepage. Combination of Wilcoxon's and Ansari-Bradley's Statistics. Biometrika, 58(1): 213-217, 1971.
  [29] H. B. Mann and D. R. Whitney. On a Test of Whether One of Two Random Variables is Stochastically Larger than the Other. Annals of Mathematical Statistics, 18(1): 50-60, 1947.
  [30] D. S. Matteson and N. A. James. A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data. Journal of the American Statistical Association, 109(505): 334-345, 2014.
  [31] A. Mood. On the Asymptotic Efficiency of Certain Nonparametric Two-Sample Tests. Annals of Mathematical Statistics, 25: 514-533, 1954.
  [32] G. V. Moustakides. Optimal Stopping Times for Detecting Changes in Distributions. Annals of Statistics, 14(4): 1379-1387, 1986.
  [33] Y. S. Niu, N. Hao, and H. Zhang. Multiple Change-Point Detection: A Selective Overview. Statistical Science, 31(4): 611-623, 2016.
  [34] A. B. Olshen, E. S. Venkatraman, R. Lucito, and M. Wigler. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5(4): 557-572, 2004.
  [35] O. H. M. Padilla, A. Athey, A. Reinhart, and J. G. Scott. Sequential Nonparametric Tests for a Change in Distribution: An Application to Detecting Radiological Anomalies. Journal of the American Statistical Association, 114(526): 514-528, 2019.
  [36] O. H. M. Padilla, Y. Yu, D. Wang, and A. Rinaldo. Optimal nonparametric change point analysis. Electronic Journal of Statistics, 15(1): 1154-1201, 2021.
  [37] E. S. Page. Continuous Inspection Schemes. Biometrika, 41(1/2): 100-115, 1954.
  [38] A. N. Pettitt. A Non-Parametric Approach to the Change-Point Problem. Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(2): 126-135, 1979.
  [39] G. J. Ross. Tracking the evolution of literary style via Dirichlet-multinomial change point regression. Journal of the Royal Statistical Society: Series A (Statistics in Society), 183(1): 149-167, 2020.
  [40] G. J. Ross. npwbs: Nonparametric Multiple Change Point Detection Using Wild Binary Segmentation, 2021. URL https://cran.r-project.org/web/packages/npwbs/index.html
  [41] G. J. Ross, D. K. Tasoulis, and N. M. Adams. Nonparametric Monitoring of Data Streams for Changes in Location and Scale. Technometrics, 53(4): 379-389, 2011.
  [42] C. Truong, L. Oudre, and N. Vayatis. Selective review of offline change point detection methods. Signal Processing, 167: 107299, 2020.
  [43] L. Vostrikov. Detecting 'Disorder' in Multidimensional Random Processes. Soviet Mathematics Doklady, 24: 55-59, 1981.
  [44] C. Zou, G. Yin, L. Feng, and Z. Wang. Nonparametric maximum likelihood approach to multiple change-point problems. The Annals of Statistics, 42(3): 970-1002, 2014.