Changepoint Detection in Complex Models: Cross-Fitting Is Needed
Pith reviewed 2026-05-23 17:48 UTC · model grok-4.3
The pith
Changepoint detection in complex models requires cross-fitting with out-of-sample losses to avoid over-adaptivity bias.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Standard in-sample loss minimization for changepoint detection produces inconsistent estimates in complex models due to over-adaptivity biases; a cross-fitting procedure based on out-of-sample losses decouples fitting from search and delivers consistent changepoint estimates whenever the underlying models achieve sufficient predictive accuracy on nearly homogeneous segments.
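One generic way to write the two objectives being contrasted is shown below. The notation is introduced here for illustration and is not taken from the paper. Take observations $z_1,\dots,z_n$, a loss $\ell$, a penalty $\beta \ge 0$ per changepoint, and a split of each candidate segment $(s,e]$ into two folds $I_1(s,e], I_2(s,e]$ (for example, odd and even indices). Here $\hat\theta_{(s,e]}$ is fit on all of $z_{s+1},\dots,z_e$, and $\hat\theta^{(-j)}_{(s,e]}$ is fit only on the fold other than $I_j$.

```latex
\begin{aligned}
\text{in-sample:}\quad & \min_{K,\;0=\tau_0<\tau_1<\cdots<\tau_K<\tau_{K+1}=n}\;
  \sum_{k=0}^{K} \sum_{i=\tau_k+1}^{\tau_{k+1}} \ell\bigl(z_i;\,\hat\theta_{(\tau_k,\tau_{k+1}]}\bigr) + \beta K,\\
\text{cross-fitted:}\quad & \min_{K,\;0=\tau_0<\tau_1<\cdots<\tau_K<\tau_{K+1}=n}\;
  \sum_{k=0}^{K} \sum_{j=1}^{2} \sum_{i \in I_j(\tau_k,\tau_{k+1}]} \ell\bigl(z_i;\,\hat\theta^{(-j)}_{(\tau_k,\tau_{k+1}]}\bigr) + \beta K.
\end{aligned}
```

In the in-sample version, the same points both fit and score $\hat\theta$. A flexible model can therefore make short, spurious segments look cheap. In the cross-fitted version, every loss term is evaluated on points the model did not see.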
What carries the argument
Cross-fitting methodology that evaluates out-of-sample losses to decouple model fitting from changepoint search.
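The decoupling can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm. The model is ordinary least squares, and the hypothetical helper `fit_predict` stands in for any fitting procedure, including one with internal hyperparameter tuning. Each candidate segment is split into two folds by the parity of its positions, and each fold is scored by a model fit on the other. The search is plain optimal partitioning (dynamic programming) with an assumed per-changepoint penalty `penalty` and minimum segment length `min_len`.

```python
import numpy as np


def fit_predict(X_train, y_train, X_test):
    """Hypothetical stand-in for any fitting procedure (here: least squares)."""
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_test @ beta


def crossfit_cost(X, y, s, e):
    """Out-of-sample squared-error cost of segment [s, e) via 2-fold cross-fitting.

    Assumption: rows of the segment are split by parity of position (even/odd);
    each half is predicted by a model fit only on the other half.
    """
    idx = np.arange(s, e)
    folds = (idx[0::2], idx[1::2])
    cost = 0.0
    for k in range(2):
        train, test = folds[1 - k], folds[k]
        pred = fit_predict(X[train], y[train], X[test])
        cost += float(np.sum((y[test] - pred) ** 2))
    return cost


def optimal_partition(X, y, penalty, min_len=10):
    """Optimal partitioning over cross-fitted segment costs.

    F[t] = min over s of F[s] + crossfit_cost(s, t) + penalty, with F[0] = -penalty
    so that a single segment carries no penalty. Returns the estimated changepoints
    (start index of every segment after the first).
    """
    n = len(y)
    F = np.full(n + 1, np.inf)
    F[0] = -penalty
    last = np.zeros(n + 1, dtype=int)
    for t in range(min_len, n + 1):
        for s in range(0, t - min_len + 1):
            if not np.isfinite(F[s]):      # s is not a feasible segment end
                continue
            val = F[s] + crossfit_cost(X, y, s, t) + penalty
            if val < F[t]:
                F[t], last[t] = val, s
    # Backtrack from the end of the series to recover segment boundaries.
    cps, t = [], n
    while t > 0:
        t = int(last[t])
        if t > 0:
            cps.append(t)
    return sorted(cps)
```

Compared with an in-sample search, only the cost function changes. The dynamic program itself is untouched, which is the sense in which fitting is decoupled from search. The quadratic number of model fits is affordable only for short series, and faster search strategies are outside this sketch.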
If this is right
- Consistent changepoint estimation holds under mild conditions on the models' predictive accuracy.
- The approach extends to temporally dependent data.
- Numerical experiments show substantially improved reliability in high-dimensional and hyperparameter-tuned settings.
Where Pith is reading between the lines
- The result suggests that many existing changepoint methods could be repaired by swapping their loss evaluation step for an out-of-sample version without altering the search algorithm.
- It raises the possibility that predictive accuracy on homogeneous segments is a more useful diagnostic than traditional model complexity penalties when designing new detection procedures.
Load-bearing premise
The models achieve sufficient predictive accuracy on nearly homogeneous segments.
What would settle it
Demonstrate a complex model where out-of-sample cross-fitting still yields inconsistent changepoint locations while in-sample fitting succeeds.
Original abstract
Changepoint detection is commonly formulated by minimizing the sum of in-sample losses to quantify the model's overall fit. However, for flexible modeling procedures -- especially those involving high-dimensional parameter spaces or hyperparameter tuning -- this strategy can lead to inaccurate changepoint estimation due to over-adaptivity biases. To mitigate this issue, we propose a novel cross-fitting methodology based on out-of-sample loss evaluations, which decouples model fitting from changepoint search. We establish a general theoretical framework for consistent changepoint estimation under mild conditions, and further extend it to temporally dependent data. A key implication of the theory is that consistency depends primarily on the models' predictive accuracy over nearly homogeneous segments. Numerical experiments show that the proposed method substantially improves the reliability and adaptability of changepoint detection in complex scenarios.
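The following is a small numerical sketch of the over-adaptivity described above. It assumes a least-squares model on a single homogeneous segment (no true changepoint) with many covariates.

Splitting a segment can never increase the in-sample residual sum of squares, because the unsplit fit remains available on each half. For correctly specified least squares with noise variance σ², the expected drop is about p·σ². A spurious split therefore looks attractive unless the penalty scales with the number of parameters p. The cross-fitted cost carries no such guarantee: after the split each fold has fewer training points, and the cost typically goes up. The sizes (n = 200, p = 30) and the even/odd fold rule are illustrative choices.

```python
import numpy as np


def insample_rss(X, y):
    """In-sample cost: fit least squares and score it on the same rows."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))


def crossfit_rss(X, y):
    """Two-fold cross-fitted cost: fit on even rows, score odd rows, and vice versa."""
    even, odd = np.arange(0, len(y), 2), np.arange(1, len(y), 2)
    cost = 0.0
    for train, test in ((even, odd), (odd, even)):
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        cost += float(np.sum((y[test] - X[test] @ beta) ** 2))
    return cost


rng = np.random.default_rng(1)
n, p = 200, 30                       # one homogeneous segment, many covariates
X = rng.normal(size=(n, p))
y = X[:, 0] + rng.normal(size=n)     # noise sd 1; no changepoint anywhere

half = n // 2                        # a spurious split in the middle
whole = insample_rss(X, y)
split = insample_rss(X[:half], y[:half]) + insample_rss(X[half:], y[half:])
whole_cf = crossfit_rss(X, y)
split_cf = crossfit_rss(X[:half], y[:half]) + crossfit_rss(X[half:], y[half:])

print(f"in-sample : unsplit {whole:8.1f}   split {split:8.1f}")
print(f"cross-fit : unsplit {whole_cf:8.1f}   split {split_cf:8.1f}")
```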
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that minimizing in-sample losses for changepoint detection leads to inaccurate estimates in complex models due to over-adaptivity biases from high-dimensional parameters or hyperparameter tuning. It proposes a cross-fitting methodology based on out-of-sample loss evaluations to decouple model fitting from changepoint search. A general theoretical framework is established for consistent changepoint estimation under mild conditions, extended to temporally dependent data, with the key implication that consistency depends primarily on the models' predictive accuracy over nearly homogeneous segments. Numerical experiments demonstrate improved reliability and adaptability.
Significance. If the theoretical framework holds with the stated generality, the work would advance changepoint detection methodology for flexible and high-dimensional models by addressing a recognized source of bias. The focus on out-of-sample evaluation connects to established practices in statistical learning and could support more reliable inference in applications with complex or dependent data. The experiments provide supporting evidence of practical gains, though the overall strength of the contribution depends on the rigor of the consistency results.
major comments (2)
- [Theoretical framework] Theoretical framework section: the manuscript asserts a general framework for consistent changepoint estimation under mild conditions, but does not enumerate the conditions or state the precise form of the consistency result (e.g., convergence in probability at a specific rate). This prevents verification that cross-fitting via out-of-sample losses indeed decouples fitting from search as claimed.
- [Extension to dependent data] Extension to dependent data: the claim that the framework extends to temporally dependent data is made, but without specifying adaptations to the out-of-sample loss or additional assumptions required for the consistency result, the broad applicability cannot be assessed.
minor comments (2)
- [Abstract] Abstract: the description of numerical experiments does not specify the complex models tested or the quantitative metrics for reliability, which would clarify the adaptability claim.
- Notation for the loss function and cross-fitting splits is introduced late; defining these earlier would improve readability of the methodology.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each major point below and will revise the manuscript to improve clarity on the theoretical results.
Point-by-point responses
- Referee: [Theoretical framework] Theoretical framework section: the manuscript asserts a general framework for consistent changepoint estimation under mild conditions, but does not enumerate the conditions or state the precise form of the consistency result (e.g., convergence in probability at a specific rate). This prevents verification that cross-fitting via out-of-sample losses indeed decouples fitting from search as claimed.
  Authors: We agree that the current version does not provide a fully enumerated statement of the conditions or the precise consistency result. In the revision we will add an explicit theorem that lists the mild conditions (including requirements on model predictive accuracy over nearly homogeneous segments and properties of the loss) and states the form of consistency (e.g., convergence in probability of the estimated changepoints). This will make the decoupling property of cross-fitting verifiable.
  Revision: yes.
- Referee: [Extension to dependent data] Extension to dependent data: the claim that the framework extends to temporally dependent data is made, but without specifying adaptations to the out-of-sample loss or additional assumptions required for the consistency result, the broad applicability cannot be assessed.
  Authors: We acknowledge the need for greater detail on the dependent-data extension. The revised manuscript will specify the required adaptations to the out-of-sample loss (such as block-based cross-fitting) together with the additional assumptions (e.g., appropriate mixing or weak-dependence conditions) under which the consistency result continues to hold.
  Revision: yes.
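For temporally dependent data, a parity split places each evaluation point right next to training points, so serial correlation can leak information across folds. One possible block-based alternative is sketched below. The block length `block_len` and the optional buffer `gap` are hypothetical tuning parameters, and the manuscript's actual fold construction may differ.

```python
import numpy as np


def parity_folds(s, e):
    """Independent data: alternate observations of segment [s, e) between two folds."""
    idx = np.arange(s, e)
    return idx[0::2], idx[1::2]


def block_folds(s, e, block_len, gap=0):
    """Dependent data (assumption): assign contiguous blocks of length `block_len`
    alternately to two folds. The first `gap` points of every block (including the
    first block, for simplicity) are dropped, so that training and evaluation
    points are never immediate neighbours in time."""
    if gap >= block_len:
        raise ValueError("gap must be smaller than block_len")
    idx = np.arange(s, e)
    offset = idx - s
    block_id = offset // block_len
    keep = (offset % block_len) >= gap
    fold_a = idx[keep & (block_id % 2 == 0)]
    fold_b = idx[keep & (block_id % 2 == 1)]
    return fold_a, fold_b
```

As a rule of thumb, `block_len` should grow with the strength of dependence, so that points in different blocks are nearly independent. This is presumably the kind of requirement that mixing or weak-dependence conditions would formalize.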
Circularity Check
No significant circularity
Rationale
The paper proposes cross-fitting via out-of-sample losses to decouple fitting from changepoint search and derives consistency under mild conditions, with the key implication (consistency depends on predictive accuracy over nearly homogeneous segments) explicitly framed as a consequence of the framework rather than an input or fitted parameter. No equations or steps reduce by construction to self-defined quantities, fitted inputs renamed as predictions, or load-bearing self-citations; the derivation chain remains independent of the target result and aligns with standard bias-variance considerations without internal reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: models possess sufficient predictive accuracy over nearly homogeneous segments to support consistent changepoint estimation.