arxiv: 2411.07874 · v2 · submitted 2024-11-12 · 📊 stat.ME · math.ST · stat.TH

Changepoint Detection in Complex Models: Cross-Fitting Is Needed

Pith reviewed 2026-05-23 17:48 UTC · model grok-4.3

classification 📊 stat.ME · math.ST · stat.TH
keywords changepoint detection · cross-fitting · out-of-sample loss · complex models · consistency · over-adaptivity bias

The pith

Changepoint detection in complex models requires cross-fitting with out-of-sample losses to avoid over-adaptivity bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that minimizing in-sample losses for changepoint detection leads to inaccurate estimates when models are flexible or high-dimensional, because the fitting process over-adapts to the data. It introduces a cross-fitting approach that uses out-of-sample loss evaluations to separate model fitting from the search for changepoints. A general theory shows this yields consistent estimation under mild conditions, and the key result is that success hinges mainly on the models' ability to predict well on nearly homogeneous segments rather than on the search procedure itself. The method also extends to time-dependent data, and experiments confirm better performance in complex settings.
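To make the over-adaptivity problem concrete, here is a minimal sketch under stated assumptions. It uses a 1-nearest-neighbour regressor as a stand-in for an interpolating learner (the paper's own examples are cross-validated lasso and ridgeless regression; the learner, the toy data, and all function names below are illustrative choices, not the authors' setup). Because such a learner reproduces its training responses exactly, the total in-sample loss is zero for every candidate split and cannot point to the changepoint.

```python
import random

def one_nn_predict(train, x):
    """Copy the response of the training point whose covariate is closest to x."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def in_sample_loss(segment):
    """Squared error of a 1-NN fit evaluated on the very data it was fitted to."""
    return sum((y - one_nn_predict(segment, x)) ** 2 for x, y in segment)

random.seed(0)
n, true_cp = 60, 30  # toy sizes, chosen only for illustration
data = []
for i in range(n):
    x = random.random()                  # covariate (distinct with probability one)
    mean = 0.0 if i < true_cp else 3.0   # response mean shifts at the changepoint
    data.append((x, mean + random.gauss(0.0, 1.0)))

# Total in-sample loss for every candidate split t (left = data[:t], right = data[t:]).
curve = [in_sample_loss(data[:t]) + in_sample_loss(data[t:])
         for t in range(5, n - 4)]
print(set(curve))  # a single value: the loss curve is flat across all candidates
```

The flat curve is the extreme case; less aggressive learners produce in-sample curves that are distorted rather than constant, which is the situation the paper's Figure 1 depicts.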

Core claim

Standard in-sample loss minimization for changepoint detection produces inconsistent estimates in complex models due to over-adaptivity biases; a cross-fitting procedure based on out-of-sample losses decouples fitting from search and delivers consistent changepoint estimates whenever the underlying models achieve sufficient predictive accuracy on nearly homogeneous segments.

What carries the argument

Cross-fitting methodology that evaluates out-of-sample losses to decouple model fitting from changepoint search.
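The idea can be sketched for a single changepoint as follows. This is an illustrative reading, not the authors' algorithm: the interleaved two-fold split, the squared-error loss, the placeholder learner `fit_mean`, and the names `cross_fit_loss` and `detect_changepoint` are all assumptions made for the example. Any learner that maps training pairs to a predictor could be passed in place of `fit_mean`.

```python
import random

def fit_mean(train):
    """Placeholder learner: predict the training-sample mean, ignoring x."""
    mu = sum(y for _, y in train) / len(train)
    return lambda x: mu

def cross_fit_loss(data, start, end, fit, n_folds=2):
    """Out-of-sample squared error on data[start:end] with interleaved folds."""
    total = 0.0
    idx = range(start, end)
    for k in range(n_folds):
        train = [data[i] for i in idx if i % n_folds != k]
        predict = fit(train)  # model is fitted without fold k
        total += sum((data[i][1] - predict(data[i][0])) ** 2
                     for i in idx if i % n_folds == k)  # and scored on fold k only
    return total

def detect_changepoint(data, fit, min_len=4):
    """Return the split t minimising the summed out-of-sample losses."""
    n = len(data)
    return min(range(min_len, n - min_len + 1),
               key=lambda t: cross_fit_loss(data, 0, t, fit)
                             + cross_fit_loss(data, t, n, fit))

random.seed(1)
n, true_cp = 60, 30  # toy sizes; the mean jumps from 0 to 10 at index 30
data = [(random.random(),
         (0.0 if i < true_cp else 10.0) + random.gauss(0.0, 0.1))
        for i in range(n)]
est = detect_changepoint(data, fit_mean)
print(est)
```

Each observation is scored only by a model that never saw it, so a learner cannot lower the criterion simply by adapting to the points it is evaluated on.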

If this is right

  • Consistent changepoint estimation holds under mild conditions on the models' predictive accuracy.
  • The approach extends directly to temporally dependent data.
  • Numerical experiments show substantially improved reliability in high-dimensional and hyperparameter-tuned settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The result suggests that many existing changepoint methods could be repaired by swapping their loss evaluation step for an out-of-sample version without altering the search algorithm.
  • It raises the possibility that predictive accuracy on homogeneous segments is a more useful diagnostic than traditional model complexity penalties when designing new detection procedures.
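The first extension can be illustrated with a search routine that takes the segment loss as a plug-in argument. The sketch below is hypothetical: it uses a greedy binary segmentation with a known number of changepoints, a plain mean model, and two-fold interleaved splitting, none of which is taken from the paper. With such a simple model both losses recover the same changepoints; the point is only that the search code is identical in both calls.

```python
import random

def in_sample_loss(y, start, end):
    """Squared error around the segment's own mean."""
    seg = y[start:end]
    mu = sum(seg) / len(seg)
    return sum((v - mu) ** 2 for v in seg)

def out_of_sample_loss(y, start, end):
    """Two-fold cross-fitted squared error of a mean model on y[start:end]."""
    total = 0.0
    for k in (0, 1):
        train = [y[i] for i in range(start, end) if i % 2 != k]
        mu = sum(train) / len(train)
        total += sum((y[i] - mu) ** 2 for i in range(start, end) if i % 2 == k)
    return total

def binary_segmentation(y, segment_loss, n_changepoints, min_len=4):
    """Greedy search; it never inspects how segment_loss is computed."""
    boundaries = [0, len(y)]
    for _ in range(n_changepoints):
        best = None  # (loss reduction, split point)
        for s, e in zip(boundaries, boundaries[1:]):
            whole = segment_loss(y, s, e)
            for t in range(s + min_len, e - min_len + 1):
                gain = whole - segment_loss(y, s, t) - segment_loss(y, t, e)
                if best is None or gain > best[0]:
                    best = (gain, t)
        if best is None:
            break
        boundaries = sorted(boundaries + [best[1]])
    return boundaries[1:-1]

random.seed(2)
# Toy signal: the mean steps up by 10 at indices 20, 40 and 60.
y = [10.0 * (i // 20) + random.gauss(0.0, 0.1) for i in range(80)]
found_ins = binary_segmentation(y, in_sample_loss, 3)
found_oos = binary_segmentation(y, out_of_sample_loss, 3)
print(found_ins, found_oos)
```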

Load-bearing premise

The models achieve sufficient predictive accuracy on nearly homogeneous segments.

What would settle it

Demonstrate a complex model where out-of-sample cross-fitting still yields inconsistent changepoint locations while in-sample fitting succeeds.

Figures

Figures reproduced from arXiv: 2411.07874 by Changliang Zou, Chengde Qian, Guanghui Wang, Zhaojun Wang.

Figure 1
Figure 1. Curve of the total in-sample and out-of-sample losses plotted against the changepoint location, for cross-validated lasso and ridgeless regression in high-dimensional linear models (see Section 3). Specific model settings are provided in Section 4.1.1 and Section S.5 in Supplementary Material, with the true changepoint located at 150. However, we observe that the utilization of flexible statistical and mac…
Figure 2
Figure 2. Influence of the tuning parameter on changepoint detection accuracy in high-dimensional linear models. …riority is more pronounced in scenarios with larger changes, where heavily overfitting lasso estimators, despite producing many false positives, can still predict well. This observation aligns with the insights from Proposition 1. Dotted curves in …
Figure 3
Figure 3. Boxplot of empirical Hausdorff distances for various in-sample and cross-fitting methods in high-dimensional linear models with a single changepoint. … a universal regularizer across all segments (cf-ho). This comparison is conducted under two distinct data generating processes (DGPs). Both setups involve a configuration of $(n, p, K^*) = (1000, 1000, 3)$, with $\{\tau_k^*\}_{k=1}^{3} = \{350, 500, 880\}$. The covariates …
Figure 4
Figure 4. Boxplot of empirical Hausdorff distances for various in-sample and cross-fitting methods in high-dimensional linear models with multiple changepoints.
Figure 5
Figure 5. Boxplot of empirical Hausdorff distances for various in-sample and cross-fitting methods in nonparametric changepoint models. We generate independent samples $z_i \in \mathbb{R}^p$, $i \in [n]$, with $(n, p) = (1000, 20)$. We set $K^* = 3$ changepoints at locations $\{\tau_k\}_{k=1}^{3} = \{350, 500, 880\}$. For $i \in (350, 500] \cup (880, 1000]$, $z_i \sim N(0, I_p)$, with $I_p$ being the identity matrix. For $i \in (0, 350]$, $z_i \sim N(0, \Sigma)$, where $\Sigma$ has entri…
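Figures 3 to 5 report accuracy as an empirical Hausdorff distance between estimated and true changepoint sets. A minimal sketch of that metric, assuming the standard symmetric definition on changepoint locations, is given below; the paper may apply a normalisation (for example dividing by n) that is not visible in the captions, and the function name is illustrative.

```python
def hausdorff_distance(estimated, truth):
    """Symmetric Hausdorff distance between two non-empty sets of locations."""
    if not estimated or not truth:
        raise ValueError("both changepoint sets must be non-empty")
    # Worst-case distance from an estimate to its nearest true changepoint.
    est_to_truth = max(min(abs(e - t) for t in truth) for e in estimated)
    # Worst-case distance from a true changepoint to its nearest estimate.
    truth_to_est = max(min(abs(e - t) for e in estimated) for t in truth)
    return max(est_to_truth, truth_to_est)

true_cps = [350, 500, 880]          # locations quoted in the figure captions
exact = hausdorff_distance([350, 500, 880], true_cps)
missed = hausdorff_distance([340, 505], true_cps)   # third changepoint undetected
print(exact, missed)
```

The second call shows why the metric penalises missed changepoints heavily: the undetected change at 880 lies 375 positions from the nearest estimate, and that gap dominates the small localisation errors at 350 and 500.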
Original abstract

Changepoint detection is commonly formulated by minimizing the sum of in-sample losses to quantify the model's overall fit. However, for flexible modeling procedures -- especially those involving high-dimensional parameter spaces or hyperparameter tuning -- this strategy can lead to inaccurate changepoint estimation due to over-adaptivity biases. To mitigate this issue, we propose a novel cross-fitting methodology based on out-of-sample loss evaluations, which decouples model fitting from changepoint search. We establish a general theoretical framework for consistent changepoint estimation under mild conditions, and further extend it to temporally dependent data. A key implication of the theory is that consistency depends primarily on the models' predictive accuracy over nearly homogeneous segments. Numerical experiments show that the proposed method substantially improves the reliability and adaptability of changepoint detection in complex scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that minimizing in-sample losses for changepoint detection leads to inaccurate estimates in complex models due to over-adaptivity biases from high-dimensional parameters or hyperparameter tuning. It proposes a cross-fitting methodology based on out-of-sample loss evaluations to decouple model fitting from changepoint search. A general theoretical framework is established for consistent changepoint estimation under mild conditions, extended to temporally dependent data, with the key implication that consistency depends primarily on the models' predictive accuracy over nearly homogeneous segments. Numerical experiments demonstrate improved reliability and adaptability.

Significance. If the theoretical framework holds with the stated generality, the work would advance changepoint detection methodology for flexible and high-dimensional models by addressing a recognized source of bias. The focus on out-of-sample evaluation connects to established practices in statistical learning and could support more reliable inference in applications with complex or dependent data. The experiments provide supporting evidence of practical gains, though the strength depends on the rigor of the consistency results.

major comments (2)
  1. [Theoretical framework] Theoretical framework section: the manuscript asserts a general framework for consistent changepoint estimation under mild conditions, but does not enumerate the conditions or state the precise form of the consistency result (e.g., convergence in probability at a specific rate). This prevents verification that cross-fitting via out-of-sample losses indeed decouples fitting from search as claimed.
  2. [Extension to dependent data] Extension to dependent data: the claim that the framework extends to temporally dependent data is made, but without specifying adaptations to the out-of-sample loss or additional assumptions required for the consistency result, the broad applicability cannot be assessed.
minor comments (2)
  1. [Abstract] Abstract: the description of numerical experiments does not specify the complex models tested or the quantitative metrics for reliability, which would clarify the adaptability claim.
  2. Notation for the loss function and cross-fitting splits is introduced late; defining these earlier would improve readability of the methodology.
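For orientation on major comment 1, a consistency statement in the changepoint literature is commonly written in the following generic form. This is not the paper's theorem: the estimators $\hat K$ and $\hat\tau_k$ are notation assumed here, while $K^*$ and $\tau_k^*$ follow the figure captions, and the paper's actual rate and conditions may differ.

```latex
\Pr\bigl(\hat K = K^*\bigr) \to 1
\quad\text{and}\quad
\max_{1 \le k \le K^*} \frac{\lvert \hat\tau_k - \tau_k^* \rvert}{n}
\xrightarrow{\;p\;} 0
\qquad \text{as } n \to \infty .
```

A rate-specific version would replace the second statement with a bound on $\max_k \lvert \hat\tau_k - \tau_k^* \rvert$ that holds with probability tending to one, which is the kind of detail the referee asks the authors to state explicitly.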

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major point below and will revise the manuscript to improve clarity on the theoretical results.

Point-by-point responses
  1. Referee: [Theoretical framework] Theoretical framework section: the manuscript asserts a general framework for consistent changepoint estimation under mild conditions, but does not enumerate the conditions or state the precise form of the consistency result (e.g., convergence in probability at a specific rate). This prevents verification that cross-fitting via out-of-sample losses indeed decouples fitting from search as claimed.

    Authors: We agree that the current version does not provide a fully enumerated statement of the conditions or the precise consistency result. In the revision we will add an explicit theorem that lists the mild conditions (including requirements on model predictive accuracy over nearly homogeneous segments and properties of the loss) and states the form of consistency (e.g., convergence in probability of the estimated changepoints). This will make the decoupling property of cross-fitting verifiable.
    revision: yes

  2. Referee: [Extension to dependent data] Extension to dependent data: the claim that the framework extends to temporally dependent data is made, but without specifying adaptations to the out-of-sample loss or additional assumptions required for the consistency result, the broad applicability cannot be assessed.

    Authors: We acknowledge the need for greater detail on the dependent-data extension. The revised manuscript will specify the required adaptations to the out-of-sample loss (such as block-based cross-fitting) together with the additional assumptions (e.g., appropriate mixing or weak-dependence conditions) under which the consistency result continues to hold.
    revision: yes
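The block-based cross-fitting mentioned in the second response could look roughly like the sketch below. It is an assumption-laden illustration, not the paper's construction: contiguous blocks are assigned to folds in rotation, and an optional buffer (`gap`) drops training points adjacent to held-out blocks so that short-range dependence leaks less into the evaluation. The function names and the buffer idea are hypothetical.

```python
def block_fold_ids(n, block_len, n_folds=2):
    """Assign observation i to fold (i // block_len) % n_folds."""
    return [(i // block_len) % n_folds for i in range(n)]

def train_indices(fold_ids, held_out_fold, gap=0):
    """Indices usable for fitting when held_out_fold is reserved for scoring.

    Points within `gap` positions of any held-out point are also excluded.
    """
    n = len(fold_ids)
    held = [i for i in range(n) if fold_ids[i] == held_out_fold]
    too_close = {j for i in held for j in range(i - gap, i + gap + 1)}
    return [i for i in range(n) if i not in too_close]

folds = block_fold_ids(10, block_len=3)
train = train_indices(folds, held_out_fold=1, gap=1)
print(folds)   # blocks of three consecutive points alternate between folds
print(train)   # training points after removing fold 1 and its neighbours
```

Longer blocks and wider gaps reduce dependence between training and evaluation data at the cost of fewer usable training points; how to balance the two would depend on the mixing assumptions the authors adopt.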

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper proposes cross-fitting via out-of-sample losses to decouple fitting from changepoint search and derives consistency under mild conditions, with the key implication (consistency depends on predictive accuracy over nearly homogeneous segments) explicitly framed as a consequence of the framework rather than an input or fitted parameter. No equations or steps reduce by construction to self-defined quantities, fitted inputs renamed as predictions, or load-bearing self-citations; the derivation chain remains independent of the target result and aligns with standard bias-variance considerations without internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that out-of-sample predictive accuracy over homogeneous segments suffices for consistency; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Models possess sufficient predictive accuracy over nearly homogeneous segments to support consistent changepoint estimation.
    Explicitly identified in the abstract as the key implication of the theory.

pith-pipeline@v0.9.0 · 5670 in / 1137 out tokens · 23128 ms · 2026-05-23T17:48:51.710671+00:00 · methodology

