pith. sign in

arxiv: 2606.18949 · v1 · pith:UB6TULYZnew · submitted 2026-06-17 · 📊 stat.ME

Feature Screening for High-Dimensional Structural Break Predictive Regression

Pith reviewed 2026-06-26 20:09 UTC · model grok-4.3

classification 📊 stat.ME
keywords feature screeningstructural breakspredictive regressionhigh-dimensional datachange point detectioncointegrationsure screening
0
0 comments X

The pith

A screening procedure selects sparse active predictors and change points in high-dimensional structural break predictive regressions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a three-step method to select and estimate active predictors and structural breaks when the number of predictors is large and the number of breaks can grow with sample size. It first applies Sure Independence Canonical Screening to isolate active predictors that may be stationary or cointegrated, then uses Ratio-Controlled Regression Screening to locate the breaks, and finally prunes extras with information criteria. The result is consistent selection and estimation of the true predictors and break locations. This matters for return predictability work, where many candidate variables and possible shifts in relationships are common.

Core claim

The procedure begins by identifying the active predictors using a Sure Independence Canonical Screening procedure, estimates the change points through a Ratio-Controlled Regression Screening method that allows their number to increase with the sample size, and reduces redundancy by eliminating unnecessary breakpoints and predictors using information criteria, allowing consistent estimation and selection of true breakpoints and active predictors that may be stationary or cointegrated.

What carries the argument

Sure Independence Canonical Screening (SICS) followed by Ratio-Controlled Regression Screening (RCRS) to identify sparse active predictors and change points.

If this is right

  • Consistent selection holds even when the number of change points grows with sample size.
  • Active predictors that are cointegrated can still be recovered correctly.
  • The information-criteria step removes redundant breakpoints and predictors after initial screening.
  • Simulations and empirical applications on return data show the steps work in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same screening steps could be tested in other high-dimensional time-series settings with multiple regimes.
  • Improved selection might translate to better forecasts when relationships shift over time.
  • Extensions could explore relaxing exact sparsity or adding nonlinear break forms.

Load-bearing premise

The Sure Independence Canonical Screening and Ratio-Controlled Regression Screening steps correctly identify the sparse active predictors and change points under the high-dimensional regime with possible cointegration.

What would settle it

A high-dimensional dataset with known true sparse predictors and known change points where the procedure fails to recover them consistently would falsify the consistency claim.

Figures

Figures reproduced from arXiv: 2606.18949 by Rongmao Zhang, Wenyang Zhang, Yang Zu, Zhenjie Qin.

Figure 1
Figure 1. Figure 1: Boxplots of Correlations under DGP 2 in Section 4, with [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Boxplot of RSS ratios under DGP3 in Section [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Flowchart of our procedure. and then refines the predictor set through elimination. This procedure can overcome the weakness of LASSO methods by Lee et al. (2022) and Mei and Shi (2024), which fails to select effective cointegrated predictors. We will also demonstrate this in the simulation experiment presented in Section 4.4. 3 ASYMPTOTIC THEORY In this section, we investigate the asymptotic properties of… view at source ↗
Figure 4
Figure 4. Figure 4: Left: Locations Frequency of estimated change points. Right: Frequency of [PITH_FULL_IMAGE:figures/full_fig_p023_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Percentages of Correct Estimation (PCE) of [PITH_FULL_IMAGE:figures/full_fig_p025_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: SICS–RCRS–IC procedure diagnostic for CPI inflation. [PITH_FULL_IMAGE:figures/full_fig_p028_6.png] view at source ↗
read the original abstract

Predictive regression is a crucial tool for exploring return predictability. In this study, we introduce an efficient procedure for selecting and estimating active predictors and change points in structural break predictive regression. Our approach allows the number of change points to increase with the sample size and accommodates sparse active predictors that may be stationary or cointegrated. We begin by identifying the active predictors using a Sure Independence Canonical Screening (SICS) procedure. Next, we estimate the change points through a Ratio-Controlled Regression Screening (RCRS) method. Finally, we reduce redundancy by eliminating unnecessary breakpoints and predictors using information criteria (IC). This approach allows for consistent estimation and selection of true breakpoints and active predictors. Our simulations and empirical studies demonstrate that the proposed procedure performs effectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a three-step procedure for high-dimensional structural break predictive regression: Sure Independence Canonical Screening (SICS) to select sparse active predictors that may be stationary or cointegrated, Ratio-Controlled Regression Screening (RCRS) to estimate a possibly diverging number of change points, and information criteria (IC) to prune redundant breakpoints and predictors. The central claim is that the procedure achieves consistent estimation and selection of the true breakpoints and active predictors, with supporting evidence from simulations and empirical studies.

Significance. If the consistency results hold, particularly the sure-screening guarantee under cointegration and exponential growth of p, the method would offer a practical tool for econometric applications involving return predictability with structural breaks and high-dimensional regressors. The explicit accommodation of cointegrated I(1) predictors and diverging breaks distinguishes it from existing screening methods that assume stationarity.

major comments (2)
  1. [Abstract] Abstract and theoretical development: The claim that SICS achieves sure screening (with probability approaching 1) for active predictors including cointegrated ones, while p grows exponentially in n, is asserted without any derivation, assumption set, or high-dimensional rate that extends the canonical correlation ranking to I(1) processes. Standard sure-independence screening proofs rely on weak dependence for uniform convergence of marginal utilities; cointegration can induce persistent cross-sectional dependence that violates this, rendering the subsequent RCRS and IC steps moot. This is load-bearing for the consistency result stated in the abstract.
  2. [Theoretical results (wherever stated)] No section supplies the required non-stationary extension or explicit growth condition on p that would keep the SICS property intact under cointegration, as required by the weakest assumption identified in the stress test. Without this, the simulation performance cannot be taken as evidence that the procedure works in the regime claimed.
minor comments (1)
  1. [Abstract] The abstract states that simulations 'demonstrate that the proposed procedure performs effectively' but provides no details on design (e.g., dimension p, break magnitudes, cointegration strength) or metrics; these should be summarized in the main text or a table for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and for highlighting the critical gap in the theoretical support for the SICS procedure. We address the two major comments point by point below. Revisions will be made to align the claims with the available derivations.

read point-by-point responses
  1. Referee: [Abstract] Abstract and theoretical development: The claim that SICS achieves sure screening (with probability approaching 1) for active predictors including cointegrated ones, while p grows exponentially in n, is asserted without any derivation, assumption set, or high-dimensional rate that extends the canonical correlation ranking to I(1) processes. Standard sure-independence screening proofs rely on weak dependence for uniform convergence of marginal utilities; cointegration can induce persistent cross-sectional dependence that violates this, rendering the subsequent RCRS and IC steps moot. This is load-bearing for the consistency result stated in the abstract.

    Authors: We agree that the manuscript asserts the sure-screening property for cointegrated predictors under exponential growth of p without supplying the required derivation or rate conditions. The existing proofs rely on weak dependence assumptions that do not automatically extend to the persistent dependence induced by cointegration. We will revise the abstract to state that the sure-screening guarantee is established only under stationarity, while performance for cointegrated predictors is illustrated via simulations. This removes the unsupported claim from the abstract. revision: yes

  2. Referee: [Theoretical results (wherever stated)] No section supplies the required non-stationary extension or explicit growth condition on p that would keep the SICS property intact under cointegration, as required by the weakest assumption identified in the stress test. Without this, the simulation performance cannot be taken as evidence that the procedure works in the regime claimed.

    Authors: We concur that no section provides the non-stationary extension or the explicit growth condition on p for the cointegrated case. Simulations alone cannot substitute for the missing high-dimensional theory. We will add a clarifying paragraph in the theoretical results section that explicitly restricts the consistency claims to the stationary setting and notes the absence of a full extension to I(1) processes under exponential p. revision: yes

Circularity Check

0 steps flagged

No circularity; procedures are independently proposed

full rationale

The manuscript proposes two new screening procedures (SICS for active predictors and RCRS for breakpoints) followed by IC-based refinement. These steps are defined directly from the data and model assumptions without reducing to prior fitted quantities, self-citations, or ansatzes imported from the authors' earlier work. The consistency claims rest on the explicit algorithmic definitions and the stated high-dimensional regime rather than on any definitional equivalence or load-bearing self-reference. No equation or section equates a derived quantity to its own input by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are stated. The method implicitly relies on standard high-dimensional sparsity and time-series assumptions, but these are not detailed.

pith-pipeline@v0.9.1-grok · 5652 in / 1053 out tokens · 25918 ms · 2026-06-26T20:09:23.402683+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

129 extracted references · 45 canonical work pages

  1. [1]

    Journal of Financial and Quantitative Analysis , volume=

    Predictive regressions: A reduced-bias estimation method , author=. Journal of Financial and Quantitative Analysis , volume=

  2. [2]

    NBER Macroeconomics Annual , volume=

    Has the business cycle changed and why? , author=. NBER Macroeconomics Annual , volume=. 2002 , publisher=

  3. [3]

    Journal of the American statistical association , volume=

    Forecasting using principal components from a large number of predictors , author=. Journal of the American statistical association , volume=. 2002 , publisher=

  4. [4]

    Bai, Jushan and Ng, Serena , year =. A. Econometrica , volume =. doi:10.1111/j.1468-0262.2004.00528.x , urldate =

  5. [5]

    Journal of Business & Economic Statistics , volume=

    Testing for common trends in nonstationary large datasets , author=. Journal of Business & Economic Statistics , volume=. 2022 , publisher=

  6. [6]

    Testing for Structural Change of

    Berkes, Istv. Testing for Structural Change of. 2011 , journal =

  7. [7]

    2015 , journal =

    Instrumental Variable and Variable Addition Based Inference in Predictive Regressions , author =. 2015 , journal =

  8. [8]

    Variable Selection in High-Dimensional Linear Models: Partially Faithful Distributions and the Pc-Simple Algorithm , shorttitle =

    B. Variable Selection in High-Dimensional Linear Models: Partially Faithful Distributions and the Pc-Simple Algorithm , shorttitle =. Biometrika , volume =. doi:10.1093/biomet/asq008 , urldate =

  9. [9]

    2022 , journal =

    A New Robust Inference for Predictive Quantile Regression , author =. 2022 , journal =

  10. [10]

    2014 , journal =

    Testing Predictive Regression Models with Nonstationary Regressors , author =. 2014 , journal =

  11. [11]

    and Shiller, Robert J

    Campbell, John Y. and Shiller, Robert J. , year =. The. The Review of Financial Studies , volume =

  12. [12]

    2006 , journal =

    Efficient Tests of Stock Return Predictability , author =. 2006 , journal =

  13. [13]

    Implementing the

    Campbell, John Y and Yogo, Motohiro , langid =. Implementing the

  14. [14]

    , year =

    Caner, Mehmet and Hansen, Bruce E. , year =. Threshold. Econometrica , volume =

  15. [15]

    Chan, Ngaihang and Yau, Chunyip and Zhang, Rongmao , year =. Group. Journal of the American Statistical Association , volume =. doi:10.1080/01621459.2013.866566 , urldate =

  16. [16]

    Journal of Econometrics , series =

    Chan, Ngai Hang and Yau, Chun Yip and Zhang, Rong-Mao , year =. Journal of Econometrics , series =. doi:10.1016/j.jeconom.2015.03.023 , urldate =

  17. [17]

    Annals of Statistics , volume=

    Limiting distributions of least squares estimates of unstable autoregressive processes , author=. Annals of Statistics , volume=. 1988 , publisher=

  18. [18]

    Biometrika , volume=

    Extended Bayesian information criteria for model selection with large model spaces , author=. Biometrika , volume=. 2008 , publisher=

  19. [19]

    doi: 10.1080/01621459.2016.1211016

    Error Variance Estimation in Ultrahigh-Dimensional Additive Models , author =. 2018 , journal =. doi:10.1080/01621459.2016.1251440 , urldate =

  20. [20]

    and Deo, Rohit S

    Chen, Willa W. and Deo, Rohit S. and Yi, Yanping , year =. Uniform. Journal of Business & Economic Statistics , volume =. doi:10.1080/07350015.2013.818008 , urldate =

  21. [21]

    Demetrescu, Matei and Georgiev, Iliyan and Rodrigues, Paulo M. M. and Taylor, A. M. Robert , year =. Extensions to. Journal of Econometrics , issn =

  22. [22]

    and Fuller, Wayne A

    Dickey, David A. and Fuller, Wayne A. , year =. Distribution of the. Journal of the American Statistical Association , volume =

  23. [23]

    1920 , publisher=

    Scientific stock speculation , author=. 1920 , publisher=

  24. [24]

    2011 , journal =

    A Control Function Approach for Testing the Usefulness of Trending Variables in Forecast Models and Linear Regression , author =. 2011 , journal =

  25. [25]

    , year =

    Elliott, Graham and Stock, James H. , year =. Inference in. Econometric Theory , volume =

  26. [26]

    1993 , journal =

    Common Risk Factors in the Returns on Stocks and Bonds , author =. 1993 , journal =

  27. [27]

    Journal of the American Statistical Association , year =

    Are Latent Factor Regression and Sparse Regression Adequate? , author =. 2024 , journal =. doi:10.1080/01621459.2023.2169700 , urldate =

  28. [28]

    Journal of the Royal Statistical Society: Series B (Statistical Methodology) , volume =

    Sure Independence Screening for Ultrahigh Dimensional Feature Space , author =. 2008 , journal =. doi:10.1111/j.1467-9868.2008.00674.x , urldate =

  29. [29]

    Journal of Business & Economic Statistics , year =

    Determination of the Effective Cointegration Rank in High-Dimensional Time-Series Predictive Regressions , author =. Journal of Business & Economic Statistics , year =. doi:10.1080/07350015.2025.2550473 , urldate =

  30. [30]

    Predictive Quantile Regression with Mixed Roots and Increasing Dimensions:

    Fan, Rui and Lee, Ji Hyung and Shin, Youngki , year =. Predictive Quantile Regression with Mixed Roots and Increasing Dimensions:. Journal of Econometrics , volume =. doi:10.1016/j.jeconom.2022.11.006 , urldate =

  31. [31]

    2019 , journal =

    Predictive Quantile Regressions under Persistence and Conditional Heteroskedasticity , author =. 2019 , journal =. doi:10.1016/j.jeconom.2019.04.014 , urldate =

  32. [32]

    Journal of the American Statistical Association , volume =

    Gao, Zhaoxing and Tsay, Ruey S. , year =. Modeling. Journal of the American Statistical Association , volume =. doi:10.1080/01621459.2020.1862668 , urldate =

  33. [33]

    2026 , publisher=

    Gao, Zhan and Lee, Ji Hyung and Mei, Ziwei and Shi, Zhentao , journal=. 2026 , publisher=

  34. [34]

    2018 , journal =

    Testing for Parameter Instability in Predictive Regression Models , author =. 2018 , journal =

  35. [35]

    2013 , series =

    Matrix Computations , author =. 2013 , series =

  36. [36]

    Gonzalo, Jes. Regime-. 2012 , journal =

  37. [37]

    Inferring the

    Gonzalo, Jes. Inferring the. 2017 , journal =

  38. [38]

    , year =

    Hansen, Bruce E. , year =. Convergence to. Econometric Theory , volume =. doi:10.1017/S0266466600013189 , urldate =

  39. [39]

    Ing, Ching-Kang , year =. Model. The Annals of Statistics , volume =. 26931545 , eprinttype =

  40. [40]

    2015 , journal =

    Nonparametric Predictive Regression , author =. 2015 , journal =. doi:10.1016/j.jeconom.2014.05.015 , urldate =

  41. [41]

    Journal of Econometrics , volume=

    The limit distribution of the estimates in cointegrated regression models with multiple structural changes , author=. Journal of Econometrics , volume=. 2008 , publisher=

  42. [42]

    2017 , journal =

    Sure Screening by Ranking the Canonical Correlations , author =. 2017 , journal =. doi:10.1007/s11749-016-0497-z , urldate =

  43. [43]

    2020 , journal =

    High-Dimensional Predictive Regression in the Presence of Cointegration , author =. 2020 , journal =

  44. [44]

    , year =

    Kostakis, Alexandros and Magdalinos, Tassos and Stamatogiannis, Michalis P. , year =. Robust. The Review of Financial Studies , volume =

  45. [45]

    The Annals of Statistics , pages=

    Factor modeling for high-dimensional time series: inference for the number of factors , author=. The Annals of Statistics , pages=. 2012 , publisher=

  46. [46]

    Lee, JiHyung and Shi, Zhentao and Gao, Zhan , year =. On. Journal of Econometrics , volume =. doi:10.1016/j.jeconom.2021.02.002 , urldate =

  47. [47]

    Electronic Journal of Statistics , number =

    Yingbo Li and Robert Lund and Anuradha Hewaarachchi , title =. Electronic Journal of Statistics , number =

  48. [48]

    Meta-Analysis of Rare Binary Adverse Event Data

    Feature screening via distance correlation learning , author =. 2012 , journal =. doi:10.1080/01621459.2012.695654 , urldate =

  49. [49]

    Statistica Sinica , volume =

    Variable Selection via Partial Correlation , author =. Statistica Sinica , volume =. doi:10.5705/ss.202015.0473 , urldate =

  50. [51]

    arXiv preprint arXiv:2409.10860 , year=

    Cointegrated matrix autoregression models , author=. arXiv preprint arXiv:2409.10860 , year=

  51. [52]

    2018 , journal =

    A Perspective on Recent Methods on Testing Predictability of Asset Returns , author =. 2018 , journal =

  52. [53]

    Journal of Econometrics , volume=

    Estimation for double-nonlinear cointegration , author=. Journal of Econometrics , volume=. 2020 , publisher=

  53. [54]

    Li, Dong and Ling, Shiqing and Zhang, Rongmao , year =. On a. Journal of Business & Economic Statistics , volume =

  54. [55]

    Li, Chenxue and Li, Deyuan and Peng, Liang , year =. Uniform. Journal of Business & Economic Statistics , volume =. doi:10.1080/07350015.2015.1052460 , urldate =

  55. [56]

    Liu, Xiaohui and Long, Wei and Peng, Liang and Yang, Bingduo , year =. A. Journal of the American Statistical Association , volume =. doi:10.1080/01621459.2023.2203354 , urldate =

  56. [57]

    2023 , journal =

    Robust Inference with Stochastic Local Unit Root Regressors in Predictive Regressions , author =. 2023 , journal =. doi:10.1016/j.jeconom.2022.06.002 , urldate =

  57. [58]

    Group Fused

    Ma, Chenchen and Tu, Yundong , year =. Group Fused. Journal of Econometrics , volume =. doi:10.1016/j.jeconom.2022.02.003 , urldate =

  58. [59]

    Journal of Business & Economic Statistics , volume=

    McCracken, Michael W. and Ng, Serena , year =. Journal of Business & Economic Statistics , volume =. doi:10.1080/07350015.2015.1086655 , urldate =

  59. [60]

    Mei, Ziwei and Shi, Zhentao , journal=. On. 2024 , publisher=

  60. [61]

    Spurious

    Onatski, Alexei and Wang, Chen , year =. Spurious. Econometrica , volume =. doi:10.3982/ECTA16703 , urldate =

  61. [62]

    Fractal and Fractional , volume=

    A Novel Approach for Testing Fractional Cointegration in Panel Data Models with Fixed Effects , author=. Fractal and Fractional , volume=. 2024 , publisher=

  62. [63]

    , year =

    Owen, Art B. , year =. Empirical

  63. [64]

    Phillips, P. C. B. , year =. Regression. Econometrica , volume =

  64. [65]

    2013 , journal =

    Predictive Regression under Various Degrees of Persistence and Robust Long-Horizon Regression , author =. 2013 , journal =. doi:10.1016/j.jeconom.2013.04.011 , urldate =

  65. [66]

    arXiv preprint arXiv:2408.05665 , year=

    Change-Point Detection in Time Series Using Mixed Integer Programming , author=. arXiv preprint arXiv:2408.05665 , year=

  66. [67]

    Econometrica , FJOURNAL =

    Qu, Zhongjun and Perron, Pierre , year =. Estimating and. Econometrica , volume =. doi:10.1111/j.1468-0262.2006.00754.x , urldate =

  67. [68]

    2016 , journal =

    Short Interest and Aggregate Stock Returns , author =. 2016 , journal =

  68. [69]

    2019 , journal =

    Balanced Predictive Regressions , author =. 2019 , journal =

  69. [70]

    Safikhani, Abolfazl and Shojaie, Ali , year =. Joint. Journal of the American Statistical Association , volume =. doi:10.1080/01621459.2020.1770097 , urldate =

  70. [71]

    Schweikert, Karsten , year =. Oracle. Journal of Time Series Analysis , volume =. doi:10.1111/jtsa.12593 , urldate =

  71. [72]

    Applied Mathematical Modelling , volume=

    Change-points analysis for generalized integer-valued autoregressive model via minimum description length principle , author=. Applied Mathematical Modelling , volume=. 2024 , publisher=

  72. [73]

    arXiv preprint arXiv:1911.10552 , year=

    High-dimensional forecasting in the presence of unit roots and cointegration , author=. arXiv preprint arXiv:1911.10552 , year=

  73. [74]

    1999 , journal =

    Predictive Regressions , author =. 1999 , journal =

  74. [75]

    2023 , journal =

    Penetrating Sporadic Return Predictability , author =. 2023 , journal =. doi:10.1016/j.jeconom.2023.105509 , urldate =

  75. [76]

    Biometrics , volume=

    Multikink quantile regression for longitudinal data with application to progesterone data analysis , author=. Biometrics , volume=. 2023 , publisher=

  76. [77]

    Wang, Hansheng , year =. Forward. Journal of the American Statistical Association , volume =. doi:10.1198/jasa.2008.tm08516 , urldate =

  77. [78]

    2008 , journal =

    A Note on Adaptive Group Lasso , author =. 2008 , journal =. doi:10.1016/j.csda.2008.05.006 , urldate =

  78. [79]

    2008 , journal =

    A comprehensive Look at The Empirical Performance of Equity Premium Prediction , author =. 2008 , journal =

  79. [80]

    Proceedings of the National Academy of Sciences , volume=

    Nonlinear system theory: Another look at dependence , author=. Proceedings of the National Academy of Sciences , volume=. 2005 , publisher=

  80. [81]

    Journal of Business & Economic Statistics , volume =

    Regime-Specific Return Predictability in Quantiles , author=. Journal of Business & Economic Statistics , volume =. 2026 , publisher=

Showing first 80 references.