arxiv: 2505.04884 · v2 · submitted 2025-05-08 · 📊 stat.ME · math.ST · stat.TH

Model Selection for Unit-root Time Series with Many Predictors

Pith reviewed 2026-05-22 17:07 UTC · model grok-4.3

keywords: unit-root time series · model selection · high-dimensional predictors · forward stepwise regression · information criterion · sure screening · selection consistency · conditional heteroscedasticity

The pith

FHTD achieves sure screening and selection consistency for unit-root time series with many predictors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops FHTD, a model selection algorithm that combines forward stepwise regression, a high-dimensional information criterion, backward elimination, and data-driven thresholding. It proves that under mild assumptions allowing unknown locations and multiplicities of unit roots plus conditional heteroscedasticity, the forward stepwise regression step has the sure screening property while the full procedure achieves selection consistency. These guarantees matter for fields like economics where unit-root series often come with large numbers of candidate predictors. The proofs introduce two new technical tools: a functional central limit theorem for multivariate linear processes and a uniform lower bound on the minimum eigenvalue of sample covariance matrices. Simulations and applications to U.S. housing starts and unemployment data support the results.

Core claim

The paper claims that for general unit-root time series including those with many predictors, the FHTD procedure, built from forward stepwise regression (FSR), high-dimensional information criterion (HDIC), backward elimination using HDIC, and data-driven thresholding (DDT), has the sure screening property for the FSR stage and overall selection consistency, provided the mild assumptions on characteristic roots and heteroscedasticity hold. The proofs depend on two new results: a functional central limit theorem for multivariate linear processes and a uniform lower bound on the minimum eigenvalue of sample covariance matrices.

What carries the argument

FHTD, the composite model selection procedure using forward stepwise regression with high-dimensional information criterion, backward elimination, and data-driven thresholding
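
The forward, criterion, and backward stages named above can be sketched as follows. This is a minimal illustration and not the paper's implementation: the criterion form `n * log(sigma2) + |J| * w * log(p)`, the weight `w`, and the helper names (`residual_variance`, `hdic`, `forward_stepwise`, `select`) are assumptions made for the example. The data-driven thresholding (DDT) step is omitted because the source does not describe its form, and the toy data are i.i.d. rather than unit-root, so the sketch only shows the selection mechanics.

```python
import numpy as np


def residual_variance(y, X, cols):
    """Mean squared residual from least squares of y on the columns in cols."""
    if len(cols) == 0:
        return float(np.mean(y ** 2))
    coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    resid = y - X[:, cols] @ coef
    return float(np.mean(resid ** 2))


def hdic(y, X, cols, w):
    """Assumed criterion form: n * log(sigma2) + |J| * w * log(p)."""
    n, p = X.shape
    return n * np.log(residual_variance(y, X, cols)) + len(cols) * w * np.log(p)


def forward_stepwise(y, X, max_steps):
    """Greedy forward selection: add the column that most reduces residual variance."""
    path = []
    for _ in range(max_steps):
        candidates = [j for j in range(X.shape[1]) if j not in path]
        best = min(candidates, key=lambda j: residual_variance(y, X, path + [j]))
        path.append(best)
    return path


def select(y, X, max_steps, w):
    path = forward_stepwise(y, X, max_steps)
    # Criterion step: keep the prefix of the forward path with the smallest value.
    scores = [hdic(y, X, path[:k], w) for k in range(1, len(path) + 1)]
    kept = path[: int(np.argmin(scores)) + 1]
    # Backward elimination: retain a variable only if dropping it raises the criterion.
    base = hdic(y, X, kept, w)
    trimmed = [j for j in kept
               if len(kept) > 1 and hdic(y, X, [i for i in kept if i != j], w) > base]
    # Fall back to the untrimmed set if trimming would leave nothing.
    return sorted(trimmed) if trimmed else sorted(kept)


# Toy data (an assumption for illustration): two relevant columns out of fifty.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + rng.standard_normal(n)
selected = select(y, X, max_steps=5, w=np.log(n))
print(selected)
```

The forward path deliberately runs past the true model size; the criterion and the backward pass are what prune the surplus columns, which mirrors the division of labor between screening and consistency described above.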

If this is right

  • FSR retains all relevant predictors with probability approaching one (sure screening); the HDIC, backward elimination, and DDT steps then remove the irrelevant ones.
  • FHTD identifies the correct sparse model consistently as sample size grows.
  • The guarantees continue to hold when both predictors and errors exhibit conditional heteroscedasticity.
  • The new functional central limit theorem and eigenvalue bound are of independent interest and may be reusable for other dependent high-dimensional processes.
  • The method can be applied directly to macroeconomic series such as housing starts and unemployment rates.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The new technical results on linear processes and covariance eigenvalues could support analysis of other non-stationary high-dimensional time series.
  • Econometricians working with many candidate indicators might replace ad-hoc selection with FHTD for more reliable forecasts.
  • The framework invites extensions that relax the unit-root assumptions or incorporate additional forms of dependence.
  • Direct comparison of FHTD against existing high-dimensional selectors on benchmark macroeconomic datasets would test its relative advantage.

Load-bearing premise

The novel functional central limit theorem for multivariate linear processes and the uniform lower bound on the minimum eigenvalue of the sample covariance matrices must hold under the mild assumptions on the roots and heteroscedasticity.
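
To make the eigenvalue half of this premise concrete, the sketch below computes the quantity such a bound controls: the smallest eigenvalue of a normalized Gram matrix, minimized over every predictor subset of a given size. The normalization by `T`, the function names, and the i.i.d. toy data are assumptions for illustration; the paper's own scaling for unit-root components is not shown in the source.

```python
from itertools import combinations

import numpy as np


def min_eigenvalue(X, cols):
    """Smallest eigenvalue of the Gram matrix X_J' X_J / T for the columns in cols."""
    sub = X[:, list(cols)]
    gram = sub.T @ sub / X.shape[0]
    return float(np.linalg.eigvalsh(gram)[0])  # eigvalsh returns ascending order


def uniform_min_eigenvalue(X, size):
    """Minimum of min_eigenvalue over every column subset of the given size."""
    return min(min_eigenvalue(X, cols)
               for cols in combinations(range(X.shape[1]), size))


# Toy design (an assumption): i.i.d. Gaussian columns.
rng = np.random.default_rng(1)
T, p = 400, 8
X = rng.standard_normal((T, p))
full = min_eigenvalue(X, range(p))
worst_pairs = uniform_min_eigenvalue(X, 2)
worst_triples = uniform_min_eigenvalue(X, 3)
print(full, worst_triples, worst_pairs)
```

By eigenvalue interlacing, the minimum over larger subsets can only be smaller, so the printed values are non-decreasing from left to right. A uniform lower bound asserts that this worst case stays away from zero as the subset size and the number of predictors grow.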

What would settle it

A large-sample simulation study with unit-root series whose root locations or multiplicities violate the mild assumptions, in which the probability that FHTD identifies the true model fails to approach one as the sample size grows.
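
A study of this kind needs series with chosen root locations and multiplicities on the unit circle. The sketch below is one way to generate them, assuming the usual factorization of the autoregressive polynomial into factors (1 - B), (1 + B), and (1 - 2 cos(θ) B + B²); the function names, the zero initial values, and the Gaussian errors are assumptions for illustration, not the paper's simulation design.

```python
import numpy as np


def unit_root_polynomial(a=0, b=0, complex_roots=()):
    """Coefficients, in increasing powers of B, of
    (1 - B)^a (1 + B)^b prod_k (1 - 2 cos(theta_k) B + B^2)^(d_k).
    complex_roots is an iterable of (theta_k, d_k) pairs."""
    poly = np.array([1.0])
    for _ in range(a):
        poly = np.convolve(poly, [1.0, -1.0])
    for _ in range(b):
        poly = np.convolve(poly, [1.0, 1.0])
    for theta, d in complex_roots:
        for _ in range(d):
            poly = np.convolve(poly, [1.0, -2.0 * np.cos(theta), 1.0])
    return poly


def simulate(poly, eps):
    """Solve poly(B) y_t = eps_t forward in time with zero initial values."""
    order = len(poly) - 1
    y = np.zeros(len(eps) + order)
    for t in range(len(eps)):
        past = y[t:t + order][::-1]  # y_{t-1}, ..., y_{t-order}
        y[t + order] = eps[t] - poly[1:] @ past
    return y[order:]


# One root at 1, one at -1, and a conjugate pair at angle pi/3.
rng = np.random.default_rng(2)
poly = unit_root_polynomial(a=1, b=1, complex_roots=[(np.pi / 3, 1)])
y = simulate(poly, rng.standard_normal(300))
print(poly)
```

Raising `a`, `b`, or any `d_k` increases the multiplicity of the corresponding root, which is the knob such a study would turn when probing the assumptions.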

Original abstract

This paper studies model selection for general unit-root time series, including the case with many exogenous predictors. We propose a new model selection algorithm, FHTD, that leverages forward stepwise regression (FSR), a high-dimensional information criterion (HDIC), a backward elimination method based on HDIC, and a data-driven thresholding (DDT) approach. Under some mild assumptions that allow for unknown locations and multiplicities of the characteristic roots on the unit circle of the time series and conditional heteroscedasticity in the predictors and errors, we establish the sure screening property of FSR and the selection consistency of FHTD. Our theoretical analysis relies on two novel technical contributions, namely a functional central limit theorem for multivariate linear processes and a uniform lower bound for the minimum eigenvalue of the sample covariance matrices, both of which are of independent interest. Simulation results corroborate the theoretical properties and show the superior performance of FHTD in model selection. We apply the proposed FHTD to model U.S. monthly housing starts and unemployment data, showcasing its practical utility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes the FHTD algorithm for model selection in general unit-root time series with many exogenous predictors. FHTD combines forward stepwise regression (FSR), a high-dimensional information criterion (HDIC), backward elimination based on HDIC, and data-driven thresholding (DDT). Under mild assumptions allowing unknown locations and multiplicities of characteristic roots on the unit circle plus conditional heteroscedasticity, the paper claims the sure screening property for FSR and selection consistency for FHTD. The proofs rely on two novel technical results: a functional central limit theorem for multivariate linear processes and a uniform lower bound on the minimum eigenvalue of sample covariance matrices.

Significance. If the central claims hold, the work addresses an important gap in high-dimensional model selection for nonstationary time series, with direct relevance to macroeconomic applications. The two novel technical results are explicitly positioned as being of independent interest and could support further work on inference under general unit-root configurations. The simulation study and empirical application to U.S. monthly housing starts and unemployment data provide concrete evidence of practical performance.

major comments (2)
  1. [§4.2] §4.2 (uniform lower bound result): the claimed uniform lower bound on the minimum eigenvalue of the sample covariance matrix is invoked to control the HDIC thresholding step and to separate signal from noise in the sure-screening argument. Under the paper's stated assumptions (unknown root multiplicities and conditional heteroscedasticity), the sample covariance can contain directions in which quadratic forms behave like partial-sum processes rather than scaling linearly with T; a concrete counter-example or additional rate condition is needed to confirm the bound holds at the rate required for sure screening.
  2. [Theorem 3.1] Theorem 3.1 (sure screening of FSR): the proof reduces the screening property to the FCLT plus the eigenvalue bound. Because the FCLT is stated for multivariate linear processes with the given root structure, it is necessary to verify explicitly that the uniform eigenvalue control remains valid when the multiplicity of unit-circle roots is unknown and can differ across predictors.
minor comments (2)
  1. [Simulation section] Table 1 and the simulation design: report the exact values of the tuning constants used for HDIC and DDT so that the reported superior performance can be reproduced.
  2. [Section 2] Notation for the characteristic polynomial: ensure that the definition of the multiplicity vector m is introduced before its first use in the assumptions and is carried consistently into the statements of the FCLT and eigenvalue bound.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and indicate the revisions we plan to make.

Point-by-point responses
  1. Referee: [§4.2] §4.2 (uniform lower bound result): the claimed uniform lower bound on the minimum eigenvalue of the sample covariance matrix is invoked to control the HDIC thresholding step and to separate signal from noise in the sure-screening argument. Under the paper's stated assumptions (unknown root multiplicities and conditional heteroscedasticity), the sample covariance can contain directions in which quadratic forms behave like partial-sum processes rather than scaling linearly with T; a concrete counter-example or additional rate condition is needed to confirm the bound holds at the rate required for sure screening.

    Authors: We appreciate this observation. The uniform lower bound in Section 4.2 is derived using the functional central limit theorem for general unit-root processes, which explicitly handles the partial-sum behavior through the limiting Brownian motions. The assumptions on the roots and heteroscedasticity are designed to ensure the bound holds uniformly. To make this clearer, we will add a detailed remark explaining why no additional rate condition is needed and how the FCLT controls the quadratic forms in the directions of unit roots. We will also consider including a simple illustrative example in the revision.

    revision: partial

  2. Referee: [Theorem 3.1] Theorem 3.1 (sure screening of FSR): the proof reduces the screening property to the FCLT plus the eigenvalue bound. Because the FCLT is stated for multivariate linear processes with the given root structure, it is necessary to verify explicitly that the uniform eigenvalue control remains valid when the multiplicity of unit-circle roots is unknown and can differ across predictors.

    Authors: The FCLT in our paper is formulated for multivariate linear processes where each component can have its own characteristic roots on the unit circle with unknown multiplicities. The uniform eigenvalue bound is proven to hold across all such configurations by considering the worst-case partial sum processes. Therefore, it applies directly when multiplicities differ across predictors. We will revise the proof of Theorem 3.1 to include an explicit statement verifying this uniformity with respect to varying multiplicities.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity; results rest on newly derived technical theorems

full rationale

The paper's central claims (the sure screening property of FSR and the selection consistency of FHTD) are derived from two explicitly novel technical results: a functional central limit theorem for multivariate linear processes and a uniform lower bound on the minimum eigenvalue of sample covariance matrices. These are stated to hold under the paper's mild assumptions on the characteristic roots on the unit circle and on conditional heteroscedasticity, and are presented as contributions of independent interest. No step reduces a prediction or main result to a fitted parameter, self-definition, or load-bearing self-citation by construction. The derivation chain is self-contained via the new proofs rather than renaming or importing prior fitted quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on two newly proved technical results and a set of mild domain assumptions whose precise statements are not visible in the abstract.

axioms (1)
  • domain assumption: mild assumptions allowing unknown locations and multiplicities of the characteristic roots on the unit circle, together with conditional heteroscedasticity in predictors and errors.
    These assumptions are invoked to establish sure screening of FSR and selection consistency of FHTD.

pith-pipeline@v0.9.0 · 5718 in / 1269 out tokens · 61049 ms · 2026-05-22T17:07:05.952583+00:00 · methodology
