Tail-robust estimation of factor-adjusted vector autoregressive models for high-dimensional time series

Dylan Dijk; Haeran Cho

arxiv: 2509.22235 · v2 · submitted 2025-09-26 · 📊 stat.ME

Pith reviewed 2026-05-18 12:34 UTC · model grok-4.3

keywords: heavy-tailed time series · factor-adjusted VAR · robust estimation · data truncation · high-dimensional time series · sparse vector autoregression · moment conditions

The pith

Element-wise truncation lets factor-adjusted VAR models achieve light-tailed estimation rates under heavy tails with only (2 + 2ε) moments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a truncation-based procedure for high-dimensional time series that follow a factor-adjusted vector autoregressive structure. An element-wise truncation step is applied first to tame heavy tails, after which a two-stage method extracts the latent factors and then estimates the sparse VAR coefficients on the adjusted residuals. The resulting convergence rates are derived under the minimal assumption of (2 + 2ε) moments for ε in (0, 1) and are shown to approach the rates available in light-tailed settings as ε tends to 1. This matters for macroeconomic and financial series whose tails are heavier than Gaussian but still possess slightly more than second moments.

Core claim

The central claim is that element-wise truncation followed by a two-stage estimation procedure consistently estimates the factors and the sparse VAR parameter matrices in a factor-adjusted model, with explicit rates that depend on the tail index ε and become comparable to those obtained under stronger moment conditions as ε approaches 1.

What carries the argument

The element-wise truncation operator applied before the two-stage procedure that first estimates latent factors and then fits a sparse VAR to the factor-adjusted residuals.
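
As a minimal sketch of the operator described above, assuming that truncation clips each entry to the interval [-τ, τ] (that is, sign(x)·min(|x|, τ)): the function name `truncate`, the simulated Student-t panel, and the 99% quantile threshold are illustrative assumptions, not the paper's tuning rule.

```python
import numpy as np


def truncate(x: np.ndarray, tau: float) -> np.ndarray:
    """Clip every entry of x to [-tau, tau], i.e. sign(x) * min(|x|, tau)."""
    return np.clip(x, -tau, tau)


rng = np.random.default_rng(0)
# Hypothetical heavy-tailed panel: n = 500 time points, p = 20 series,
# Student-t with 3 degrees of freedom (only moments of order < 3 are finite).
x = rng.standard_t(df=3, size=(500, 20))

# Illustrative threshold only: the 99% quantile of the absolute values.
tau = float(np.quantile(np.abs(x), 0.99))
x_trunc = truncate(x, tau)

print("largest |entry| before:", np.abs(x).max(), "after:", np.abs(x_trunc).max())
```

Entries already inside [-τ, τ] pass through unchanged, so only the extreme observations are modified before any downstream estimation.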

If this is right

The derived rates make the effect of tail heaviness explicit through the parameter ε.
The two-stage separation isolates factor estimation from the sparse VAR step on the residuals.
Numerical experiments confirm competitive performance on simulated heavy-tailed data.
The procedure yields competitive forecasts for macroeconomic indicators in real data.
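
The two-stage separation listed above can be sketched roughly as follows. This is a simplified stand-in rather than the paper's estimator: it assumes static PCA on the truncated sample covariance for stage one and an l1-penalised VAR(1) fit on the residuals, solved by proximal gradient descent, for stage two. The function `two_stage_fit`, its arguments, and the simulated data are all hypothetical.

```python
import numpy as np


def soft_threshold(a: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * |.|_1, applied element-wise."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)


def two_stage_fit(x, tau, r, lam, n_iter=500):
    """Truncate, remove r principal components, then fit a lasso VAR(1).

    x is an (n, p) array. Returns (e_hat, a_hat), where e_hat holds the r
    leading eigenvectors and a_hat satisfies resid[t] ~ a_hat @ resid[t-1].
    """
    xt = np.clip(x, -tau, tau)
    xt = xt - xt.mean(axis=0)
    n, p = xt.shape

    # Stage 1: leading eigenvectors of the truncated sample covariance.
    _, eigvecs = np.linalg.eigh(xt.T @ xt / n)   # eigenvalues in ascending order
    e_hat = eigvecs[:, -r:]
    resid = xt - xt @ e_hat @ e_hat.T            # factor-adjusted residuals

    # Stage 2: minimise b' G b - 2 b' c_j + lam * |b|_1 for each column j.
    y, z = resid[1:], resid[:-1]
    g = z.T @ z / (n - 1)
    c = z.T @ y / (n - 1)
    step = 1.0 / (2.0 * np.linalg.eigvalsh(g)[-1] + 1e-12)
    b = np.zeros((p, p))
    for _ in range(n_iter):
        grad = 2.0 * (g @ b - c)
        b = soft_threshold(b - step * grad, step * lam)
    return e_hat, b.T


# Hypothetical data: two heavy-tailed factors plus a diagonal (sparse) VAR(1).
rng = np.random.default_rng(0)
n, p, r = 300, 15, 2
loadings = rng.standard_normal((p, r))
factors = rng.standard_t(df=4, size=(n, r))
a_true = 0.4 * np.eye(p)
xi = np.zeros((n, p))
for t in range(1, n):
    xi[t] = a_true @ xi[t - 1] + rng.standard_t(df=4, size=p)
x = factors @ loadings.T + xi

e_hat, a_hat = two_stage_fit(x, tau=10.0, r=r, lam=0.1)
print("nonzero entries in a_hat:", int(np.count_nonzero(a_hat)))
```

Because the two stages only communicate through the residual matrix, either stage could be swapped for a different estimator without touching the other, which is the modularity the list above points to.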

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The truncation device might transfer directly to other sparse high-dimensional time-series models that currently require stronger moment assumptions.
Because the method decouples factor recovery from the VAR fit, each stage could be replaced by newer robust estimators without redesigning the overall procedure.
The explicit dependence on ε supplies a practical diagnostic: heavier empirical tails should produce visibly slower finite-sample convergence.

Load-bearing premise

The time series must satisfy a factor-adjusted VAR structure in which the latent factors are identifiable and the remaining VAR coefficients obey standard sparsity conditions.

What would settle it

If Monte Carlo experiments on simulated series with varying tail heaviness show that the empirical estimation error does not decrease toward the light-tailed benchmark as the moment index ε is raised from near 0 to near 1, the rate-comparability claim would be falsified.
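
A toy version of such an experiment might look like the sketch below. It is deliberately much simpler than the paper's setting: the data are i.i.d. Student-t (no factors, no VAR), the target is the covariance matrix in max norm, and the threshold rule τ = (n / log p)^{1/(2+2ε)} is a heuristic assumption for this sketch, not the paper's tuning procedure. The function name `mean_cov_error` is hypothetical.

```python
import numpy as np


def mean_cov_error(df, n, p, eps, reps, rng, truncate=True):
    """Average max-norm error of the (optionally truncated) sample covariance.

    Data are i.i.d. Student-t with df > 2 degrees of freedom, rescaled to unit
    variance so that the true covariance is the identity matrix.
    """
    scale = np.sqrt((df - 2.0) / df)
    tau = (n / np.log(p)) ** (1.0 / (2.0 + 2.0 * eps))  # assumed heuristic rule
    errs = np.empty(reps)
    for k in range(reps):
        x = scale * rng.standard_t(df, size=(n, p))
        if truncate:
            x = np.clip(x, -tau, tau)
        s = x.T @ x / n                                  # mean is known to be zero
        errs[k] = np.max(np.abs(s - np.eye(p)))
    return float(errs.mean())


rng = np.random.default_rng(2)
for eps in (0.1, 0.5, 0.9):
    df = 2.0 + 2.0 * eps + 0.1   # just enough moments for the (2 + 2*eps)-th to exist
    err_trunc = mean_cov_error(df, n=400, p=50, eps=eps, reps=20, rng=rng)
    err_plain = mean_cov_error(df, n=400, p=50, eps=eps, reps=20, rng=rng, truncate=False)
    print(f"eps={eps}: truncated={err_trunc:.3f}, untruncated={err_plain:.3f}")
```

Plotting the truncated error against ε for several sample sizes would give a first, informal look at whether the error moves toward a light-tailed benchmark as ε grows.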

Figures

Figures reproduced from arXiv: 2509.22235 by Dylan Dijk, Haeran Cho.

**Figure 1.** The largest eigenvalue (y-axis) of the covariance matrix estimated from the US macroeconomic dataset analysed in Section 6 (February 1960 to November 2023, n = 767) with subsets of cross-sections randomly sampled 100 times for each given dimension p ∈ {5, …, 108} (x-axis). As well as strong cross-sectional correlations, another common characteristic of high-dimensional time series is heavy-tailedness…

**Figure 2.** Boxplots of estimation errors measured as in (15) over 200 realisations as (…

**Figure 3.** Boxplots of estimation errors measured as in (15) over 200 realisations as (…

**Figure 4.** Box plots of estimation errors measured as in (15) over 200 realisations as (…

**Figure 5.** Time path of the fluctuation test statistics for six variables, which are scaled rolling…

Original abstract

We study the problem of modelling high-dimensional, heavy-tailed time series data via a factor-adjusted vector autoregressive (VAR) model, which simultaneously accounts for pervasive co-movements of the variables by a handful of factors, as well as their remaining interconnectedness using a sparse VAR model. To handle heavy tails, we propose an element-wise data truncation step followed by a two-stage estimation procedure for estimating the latent factors and the VAR parameter matrices. Assuming the existence of the $(2 + 2\epsilon)$-th moment only for some $\epsilon \in (0, 1)$, we derive the rates of estimation which, making explicit the effect of heavy tails through $\epsilon$, are comparable to the rates attainable in light-tailed settings as $\epsilon \to 1$. Numerically, we demonstrate the competitive performance of the proposed estimators on simulated datasets and in an application to forecasting macroeconomics indicators.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Truncation before the two-stage factor and sparse VAR steps gives explicit rates under (2+2ε) moments that approach the light-tailed case as ε nears 1.

The main takeaway is that element-wise truncation followed by standard two-stage factor-adjusted VAR estimation produces rates that depend on the tail index ε and recover the usual rates when the data are only mildly heavy-tailed. The paper shows this under the assumption that the series obey a factor-adjusted sparse VAR structure after truncation, and it backs the claim with simulations and a macro forecasting example. That combination of truncation with the existing two-stage procedure under weak moments is the concrete advance relative to prior robust VAR work. The explicit tracking of how the rate worsens with smaller ε is useful for applications where fourth moments may not exist. The macro example is a reasonable check that the method is not just theoretical.

The soft spot is the truncation step itself. Because truncation is a coordinate-wise nonlinear map, it can distort the common factor component if large deviations are driven by the factors rather than the idiosyncratic noise. The abstract states that the rates hold after truncation, but the argument needs a clear bound on how much the low-rank plus sparse structure is altered; without that, the claim that rates stay comparable to light-tailed settings rests on an unverified approximation. If the proofs control the distortion tightly, the result stands; otherwise the comparability is weaker than stated.

This paper is for people working on high-dimensional time series with fat tails, particularly in macro or finance settings where robust estimation matters more than asymptotic optimality under Gaussian assumptions. A reader who needs implementable methods with explicit tail dependence will get value from it. It deserves a serious referee because the theoretical target is specific and the practical motivation is clear.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a tail-robust estimator for high-dimensional factor-adjusted VAR models by first applying element-wise truncation to the observed series and then performing a two-stage procedure that estimates latent factors followed by sparse VAR coefficients on the truncated data. Under the sole assumption of (2 + 2ε) moments for some ε ∈ (0,1), it derives convergence rates for the factor loadings, factor scores, and VAR parameter matrices that are stated to recover the light-tailed rates as ε → 1. The claims are supported by numerical experiments on simulated heavy-tailed data and an application to macroeconomic forecasting.

Significance. If the truncation step can be shown to preserve the factor-adjusted VAR structure up to controllable error, the explicit dependence on ε would constitute a useful theoretical advance over existing robust factor and VAR methods that typically require stronger moment conditions or do not quantify the tail effect. The numerical results indicate practical competitiveness, but the contribution rests on the validity of the post-truncation model approximation.

major comments (2)

[Theoretical derivation of rates (main theorems and proofs)] The central rates are derived after element-wise truncation (see the description of the procedure and the statements of the main theorems). Because truncation is a coordinate-wise nonlinear map, it can distort the low-rank factor component whenever large deviations are driven by the common factors rather than the idiosyncratic noise. An explicit bound on the resulting perturbation to the factor loadings, the factor scores, or the effective VAR coefficients is required to justify that the derived rates remain valid and approach the light-tailed benchmark as ε → 1. Without such a bound, the comparability claim is not fully supported by the stated assumptions.
[Two-stage estimation procedure and model assumptions] The two-stage procedure invokes the usual identifiability and sparsity conditions on the latent factors and the VAR coefficient matrices after truncation. The manuscript should verify that these conditions continue to hold (approximately) for the truncated series or quantify the additional error they introduce into the subsequent sparse estimation step. This verification is load-bearing for the claimed rates under only (2 + 2ε) moments.
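
The first major comment can be probed numerically with a small experiment like the one below. It is only a sketch under assumed settings: a single heavy-tailed factor, Gaussian idiosyncratic noise, heterogeneous loadings, and a sin-theta style distance between the leading eigenvector of the (truncated) sample covariance and the true loading direction. The helper `leading_space_error` and all parameter values are hypothetical.

```python
import numpy as np


def leading_space_error(x, lam_true, tau=None):
    """Sine of the angle between the top eigenvector and the true loading direction."""
    if tau is not None:
        x = np.clip(x, -tau, tau)
    x = x - x.mean(axis=0)
    _, vecs = np.linalg.eigh(x.T @ x / x.shape[0])   # ascending eigenvalues
    v = vecs[:, -1]
    u = lam_true / np.linalg.norm(lam_true)
    return float(np.sqrt(max(0.0, 1.0 - (u @ v) ** 2)))


rng = np.random.default_rng(1)
n, p = 400, 50
lam_true = rng.uniform(0.5, 1.5, size=p)   # heterogeneous loadings
f = rng.standard_t(df=3, size=n)           # large deviations come from the factor
xi = rng.standard_normal((n, p))           # light-tailed idiosyncratic part
x = np.outer(f, lam_true) + xi

for tau in (None, 5.0, 2.0, 1.0):
    print("tau =", tau, "error =", round(leading_space_error(x, lam_true, tau), 3))
```

Series with larger loadings are clipped more often than those with smaller ones, so comparing the error across thresholds gives a rough sense of how strongly coordinate-wise clipping perturbs the estimated loading direction in this toy design.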

minor comments (2)

[Method description] Clarify the precise definition of the truncation threshold and whether it is chosen adaptively or fixed; any dependence on unknown quantities should be stated explicitly.
[Numerical experiments] In the simulation section, label the panels or legends with the specific value of ε used to generate each heavy-tailed scenario to facilitate direct comparison with the theoretical rates.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. The points raised highlight the need for more explicit justification of how truncation interacts with the factor structure and model assumptions. We address each major comment below and will incorporate the suggested clarifications and bounds into a revised version.

Referee: The central rates are derived after element-wise truncation (see the description of the procedure and the statements of the main theorems). Because truncation is a coordinate-wise nonlinear map, it can distort the low-rank factor component whenever large deviations are driven by the common factors rather than the idiosyncratic noise. An explicit bound on the resulting perturbation to the factor loadings, the factor scores, or the effective VAR coefficients is required to justify that the derived rates remain valid and approach the light-tailed benchmark as ε → 1. Without such a bound, the comparability claim is not fully supported by the stated assumptions.

Authors: We appreciate this observation on the potential distortion from truncation. The current proofs derive rates directly on the truncated data under the (2+2ε) moment condition, but we agree an explicit perturbation analysis would make the argument more complete. In the revision we will add a supporting lemma that bounds the truncation error in Frobenius and operator norms; under the stated moment assumption this error is shown to be of strictly lower order than the main estimation rates for loadings and scores. The lemma will be invoked in the proofs of the main theorems to confirm that the rates recover the light-tailed benchmark as ε → 1.

revision: yes

Referee: The two-stage procedure invokes the usual identifiability and sparsity conditions on the latent factors and the VAR coefficient matrices after truncation. The manuscript should verify that these conditions continue to hold (approximately) for the truncated series or quantify the additional error they introduce into the subsequent sparse estimation step. This verification is load-bearing for the claimed rates under only (2 + 2ε) moments.

Authors: We concur that explicit verification of the post-truncation conditions is necessary. The manuscript currently relies on the fact that truncation preserves the low-rank plus sparse structure up to controllable error, but we will strengthen this by adding a short argument (in Section 3 and the appendix) showing that the identifiability normalizations remain valid and that the sparsity pattern of the VAR coefficients is preserved with an additive error term that vanishes at the claimed rate. This additional error will be absorbed into the existing bounds without changing the final convergence rates.

revision: yes

Circularity Check

0 steps flagged

No circularity: rates derived from stated moment assumptions and standard concentration bounds

The paper derives estimation rates for the truncated factor-adjusted VAR under (2+2ε) moments via a two-stage procedure. No step reduces the claimed rates to a fitted quantity by construction, nor does any load-bearing premise collapse to a self-citation or ansatz imported from the authors' prior work. The truncation step is presented as a preprocessing device whose effect on the low-rank-plus-sparse structure is controlled by the moment assumption and standard inequalities; the final rates are obtained from these controls rather than by re-labeling inputs. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the factor-adjusted VAR model structure, moment existence, and standard identifiability conditions for factors and sparsity; no new invented entities are introduced.

axioms (2)

domain assumption: The observed series admits a factor-adjusted VAR representation with pervasive factors and sparse idiosyncratic VAR coefficients.
Invoked to justify the two-stage separation of factor and VAR estimation.
domain assumption: The (2 + 2ε)-th moment exists for some ε ∈ (0,1).
Stated explicitly as the only moment condition used to derive rates.
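
To see why a (2 + 2ε)-th moment can be enough, the following back-of-the-envelope calculation may help. It is a heuristic for a single coordinate of i.i.d. data with constants ignored, not a statement from the paper: the bound M, the truncated variable X^τ = sign(X)·min(|X|, τ), and the Bernstein-type balancing step are assumptions of this sketch.

```latex
% Heuristic only: i.i.d. observations, constants ignored, E|X|^{2+2\epsilon} \le M,
% X^{\tau} = \operatorname{sign}(X)\,\min(|X|, \tau), and 0 < \epsilon < 1.
\begin{align*}
\text{bias:}\quad
  & E\bigl[X^2 - (X^{\tau})^2\bigr]
    \le E\bigl[X^2\,\mathbf{1}\{|X| > \tau\}\bigr]
    \le \tau^{-2\epsilon}\,E|X|^{2+2\epsilon}
    \le M\,\tau^{-2\epsilon}, \\
\text{variance proxy:}\quad
  & E\bigl[(X^{\tau})^4\bigr]
    \le \tau^{2-2\epsilon}\,E|X|^{2+2\epsilon}
    \le M\,\tau^{2-2\epsilon}, \\
\text{balance:}\quad
  & \tau^{-2\epsilon} \asymp \sqrt{\tau^{2-2\epsilon}\,\frac{\log p}{n}}
    \;\Longrightarrow\;
    \tau \asymp \Bigl(\frac{n}{\log p}\Bigr)^{\frac{1}{2+2\epsilon}},
    \qquad
    \text{error} \asymp \Bigl(\frac{\log p}{n}\Bigr)^{\frac{\epsilon}{1+\epsilon}}.
\end{align*}
```

Under these simplifications the exponent ε/(1+ε) tends to 1/2 as ε → 1, which matches the familiar light-tailed order sqrt(log p / n) and is consistent with the qualitative claim that the rates become comparable in that limit.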

pith-pipeline@v0.9.0 · 5679 in / 1345 out tokens · 32006 ms · 2026-05-18T12:34:20.295595+00:00 · methodology

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages

[1]

and Alex R

Ahn, Seung C. and Alex R. Horenstein (2013). Eigenvalue ratio test for the number of factors. Econometrica 81 (3):1203–1227. Akaike, Hirotugu (1998). Information theory and an extension of the maximum likelihood principle. Selected Papers of Hirotugu Akaike. Ed. by Emanuel Parzen, Kunio Tanabe, and Genshiro Kitagawa. New York: Springer, 199–213. Alessi, L...

work page 2013
[2]

Chen, Jiahua and Zehua Chen (2008)

Kendrick Press. Chen, Jiahua and Zehua Chen (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95 (3):759–771. Diebold, Francis X. and Kamil Yilmaz (2008). Measuring financial asset return and volatility spillovers, with application to global equity markets. Econ. J. 119 (534):158–171. Fan, Jianqing, Jianhua...

work page 2008
[3]

A direct estimation of high dimensional stationary vector autoregressions.J

Han, Fang, Huanran Lu, and Han Liu (2015). A direct estimation of high dimensional stationary vector autoregressions. J. Mach. Learn. Res. 16 (97):3115–3150. Han, Fang and Wei Biao Wu (2023). Probability inequalities for high-dimensional time series under a triangular array framework. Springer Handbook of Engineering Statistics. Ed. by Hoang Pham. London: Sp...

work page arXiv 2015
[4]

Berlin: Springer Science and Business Media

Lütkepohl, Helmut (2005). New Introduction to Multiple Time Series Analysis. Berlin: Springer Science and Business Media. McCracken, Michael W. and Serena Ng (2016). FRED-MD: A monthly database for macroeconomic research. J. Bus. Econ. Stat. 34 (4):574–589. Merlevède, Florence, Magda Peligrad, and Emmanuel Rio (2009). Bernstein inequality and moderate...

work page 2005
[5]

Inst. Math. Stat., 273–292. Onatski, Alexei (2010). Determining the number of factors from empirical distribution of eigenvalues. Review of Economics and Statistics 92 (4):1004–1016. Pan, Xiaoou and Wen-Xin Zhou (2022). adaHuber: Adaptive Huber Estimation and Regression. R package version 1.1. Qiu, Huitong, Sheng Xu, Fang Han, Han Liu, and Brian Caffo (2015...

work page 2010
[6]

In addition, we provide figures presenting how the choice of τ from our CV tuning procedure, described in Section 3.3, varies with n and p. A.2.1 Covariance estimation. Table A.3 and Table A.4 present a summary of the relative performance of the sample covariance estimator with truncated data against that of the sample covariance with no truncation (with τ = ∞),...

work page arXiv
[7]

The data are generated as in (F2), (V1), (S1)

and sample size n (p fixed at 100), and as the innovation distribution varies (see (I1)–(I3)). The data are generated as in (F2), (V1), (S1). The first row shows the average τ chosen across simulations; the number at each point is the average percentage of data points whose absolute values are less than τ. The second row gives the box plots of the chosen τ acr...

work page 2009
[8]

We select our choice of truncation parameter τ by matching the rate of the variance component and the upper bound of the bias component

to compute the estimation convergence rate of the truncated autocovariance estimator. We select our choice of truncation parameter τ by matching the rate of the variance component and the upper bound of the bias component. Theorem B.1 (Theorem 2 from Merlevède, Peligrad, and Rio (2009)). Let $(X_j)_{j \ge 1}$ be a sequence of centred real-valued random variables. S...

work page 2009
[9]

Proof. By recursively using Minkowski’s inequality we have that for all $1 \le i \le p$: $\|X_{it}\|_{2+2\epsilon} \le \lambda_{i1}\|F_{1t}\|_{2+2\epsilon} + \cdots + \lambda_{ir}\|F_{rt}\|_{2+2\epsilon} + \|\xi_{it}\|_{2+2\epsilon}$

(b) The bivariate process $\{(X_{it}, X_{jt})\}$ is $\alpha$-mixing, with the mixing coefficients decaying exponentially such that $\alpha_X(m) \le \exp(-2cm)$, $c > 0$, for all $1 \le i, j \le p$. Proof. By recursively using Minkowski’s inequality we have that for all $1 \le i \le p$: $\|X_{it}\|_{2+2\epsilon} \le \lambda_{i1}\|F_{1t}\|_{2+2\epsilon} + \cdots + \lambda_{ir}\|F_{rt}\|_{2+2\epsilon} + \|\xi_{it}\|_{2+2\epsilon}$. Then, by Assumption 1.(ii), Assumption 3, and that r is fixed, we ha...

work page 2007
[10]

35 (ii) 1 p ∥bΓx(τ)−Γ χ∥2 =O p ψn,p ∨ 1 p

Lemma B.3.There exists a diagonal matrixOwith±1on its diagonal, such that (i)∥ bEx −E χO∥F =O p ψn,p ∨ 1 p . 35 (ii) 1 p ∥bΓx(τ)−Γ χ∥2 =O p ψn,p ∨ 1 p . Proof.Applying Theorem 2 from Yu, Wang, and Samworth (2015) iteratively to the eigenvectors of bEx andE χ, there exists a diagonal orthogonal matrixOsuch that ∥bEx −E χO∥F ≤ r∥bΓx(τ)−Γ χ∥2 min min 1≤i≤r−1...

work page 2015
[11]

By definition of bβwe have that −2bβ ⊤ jbγ(j) +bβ ⊤ j bΓbβj +λ|bβj|1 ≤ −2β ⊤bγ(j) +β ⊤bΓβ+λ|β| 1 for anyβ∈R pd

ψn,p ∨ 1√p ∀1≤j≤p .(B.23) Equipped with (B.22) and (B.23), the remainder of the proof follows that of Proposition 4.1 from Basu and Michailidis (2015). By definition of bβwe have that −2bβ ⊤ jbγ(j) +bβ ⊤ j bΓbβj +λ|bβj|1 ≤ −2β ⊤bγ(j) +β ⊤bΓβ+λ|β| 1 for anyβ∈R pd. Settingβ=β ∗ j and rearranging the above inequality, we get v⊤bΓv≤2v ⊤ bγ(j) −bΓβ∗ j +λ |β∗ j...

work page 2015
