Functional Autoregression Without Truncation: A Continuous-Regularization Approach

Yao Zhao

arxiv: 2604.25205 · v1 · submitted 2026-04-28 · 📊 stat.ME

Pith reviewed 2026-05-07 15:33 UTC · model grok-4.3

keywords: functional autoregression · Tikhonov regularization · functional principal components · convergence rates · data-driven parameter selection · forecasting · operator estimation

The pith

Tikhonov regularization replaces discrete truncation with a continuous data-driven parameter in functional autoregression estimation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard estimation of functional autoregressive models projects curves onto the leading functional principal components and then fits a vector autoregression on the scores, but the choice of truncation level $K$ is ad hoc and its best value varies widely across data regimes. The paper replaces this discrete choice with a Tikhonov-regularized estimator that inverts a continuously penalized covariance operator using a single tuning parameter $\alpha$ chosen from the data. It proves that this estimator converges at rate $n^{-\beta/(2(\beta+1))}$ under a source condition on the target operator. Monte Carlo experiments show the method tracks the best possible FPCA truncation across regimes and beats it when the spectrum is wide, while a real-data example on daily air pollution curves cuts mean forecast error by 9.7 percent compared with the popular 80% variance threshold.
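To make the truncation choice concrete, here is a minimal sketch of how a variance threshold is turned into a truncation level $K$. The function name `truncation_level` and the toy spectrum are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def truncation_level(eigenvalues, threshold):
    """Smallest K whose leading eigenvalues explain at least `threshold` of total variance."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending order
    explained = np.cumsum(lam) / lam.sum()                     # cumulative variance share
    # First index whose cumulative share reaches the threshold (tolerance guards round-off).
    return int(np.searchsorted(explained, threshold - 1e-12) + 1)

# Hypothetical slowly decaying spectrum lambda_j = j^(-1.2), j = 1..50 (an assumption).
lam = np.arange(1, 51, dtype=float) ** -1.2
levels = {p: truncation_level(lam, p) for p in (0.80, 0.95, 0.99)}
```

On a slowly decaying spectrum like this one, the selected $K$ moves a long way between the 80% and 99% thresholds, which is the kind of sensitivity the abstract describes.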

Core claim

The central claim is that the estimator $\widehat{\Psi}_\alpha = \widehat{C}_1(\widehat{C}_0 + \alpha I)^{-1}$ achieves the convergence rate $n^{-\beta/(2(\beta+1))}$ for $\beta \in (0, 1]$, saturating at $n^{-1/4}$, and delivers forecast performance that matches or exceeds the oracle-best discrete truncation without any prior knowledge of the effective dimension.
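The exponent in this rate can be read off from a standard bias-variance balance for Tikhonov regularization. The sketch below assumes a regularization bias of order $\alpha^{\beta/2}$ and a stochastic error of order $(n\alpha)^{-1/2}$, and writes $\Psi$ for the target operator. Both bounds and the symbol $\Psi$ are assumptions chosen to be consistent with the stated rate, not expressions quoted from the paper.

```latex
\begin{align*}
\|\widehat{\Psi}_\alpha - \Psi\|
  &\lesssim \underbrace{\alpha^{\beta/2}}_{\text{bias}}
   + \underbrace{(n\alpha)^{-1/2}}_{\text{stochastic error}}, \\
\alpha^{\beta/2} = (n\alpha)^{-1/2}
  &\iff \alpha^{(\beta+1)/2} = n^{-1/2}
   \iff \alpha \asymp n^{-1/(\beta+1)}, \\
\alpha^{\beta/2}\Big|_{\alpha = n^{-1/(\beta+1)}}
  &= n^{-\beta/(2(\beta+1))},
  \qquad \beta = 1 \;\Rightarrow\; \alpha \asymp n^{-1/2},\ \text{rate } n^{-1/4}.
\end{align*}
```

Under these assumptions the rate is attained only when $\alpha$ shrinks at the balancing speed, which depends on the unknown $\beta$.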

What carries the argument

The Tikhonov-regularized estimator $\widehat{\Psi}_\alpha = \widehat{C}_1(\widehat{C}_0 + \alpha I)^{-1}$, which continuously penalizes the inverse of the lagged covariance operator instead of cutting off small eigenvalues.
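A minimal discretized sketch of this estimator follows. It assumes the curves are stored as rows of an array `X` (one row per time point, either values on a common grid or basis coefficients) and ignores quadrature weights; the function names are hypothetical and this is not the author's implementation.

```python
import numpy as np

def lagged_covariances(X):
    """Empirical lag-0 and lag-1 covariance matrices of a centered curve sequence."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    past, future = Xc[:-1], Xc[1:]
    m = past.shape[0]
    C0 = past.T @ past / m      # covariance of X_t
    C1 = future.T @ past / m    # cross-covariance of X_{t+1} with X_t
    return C0, C1

def tikhonov_far1(X, alpha):
    """Psi_alpha = C1 (C0 + alpha I)^{-1}, computed with a linear solve."""
    C0, C1 = lagged_covariances(X)
    A = C0 + alpha * np.eye(C0.shape[0])
    # A is symmetric, so solving A Z = C1^T and transposing gives C1 A^{-1}.
    return np.linalg.solve(A, C1.T).T

def forecast_next(X, alpha):
    """One-step forecast: mean + Psi_alpha applied to the last centered curve."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    return mean + tikhonov_far1(X, alpha) @ (X[-1] - mean)
```

Using a linear solve rather than an explicit inverse is a numerical convenience; any positive `alpha` keeps the system well posed even when `C0` is singular.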

If this is right

The method converges without requiring knowledge of the operator's rank or eigenvalue decay.
Forecast accuracy remains stable when the spectrum of the covariance is spread out rather than concentrated on the first few components.
The saturation rate of $n^{-1/4}$ is reached automatically for smoother targets.
Real-data forecast error drops by about ten percent relative to the 80 percent variance threshold commonly used in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same continuous-regularization idea could replace truncation steps in functional linear regression or functional principal component regression.
Data-driven alpha selection may transfer to other ill-posed inverse problems that arise in functional time series.
The approach suggests testing whether a single regularization parameter suffices for higher-order functional autoregressions as well.

Load-bearing premise

A data-driven rule for choosing the regularization level alpha works reliably without knowing the smoothness or effective dimension of the true operator in advance.

What would settle it

Run the Monte Carlo study with a known smooth target operator and check whether the data-driven $\alpha$ version attains the $n^{-1/4}$ rate or whether its mean squared forecast error exceeds that of the oracle-best truncation.
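One simple way to run such a check is to regress log error on log sample size and compare the slope with the theoretical exponent. The helper below is a sketch under that assumption; `empirical_rate` is a hypothetical name and the input errors would come from the reader's own simulation.

```python
import numpy as np

def empirical_rate(sample_sizes, errors):
    """Least-squares slope of log(error) against log(n)."""
    log_n = np.log(np.asarray(sample_sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    slope, _intercept = np.polyfit(log_n, log_e, 1)
    return float(slope)
```

A slope near -0.25 for the operator-norm error would be consistent with the saturation rate. If the recorded errors are squared norms, the reference slope is -0.5 instead.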

Figures

Figures reproduced from arXiv: 2604.25205 by Yao Zhao.

**Figure 1.** One-step mean integrated squared forecast error by sample size, across three regimes.

**Figure 2.** Worst-case mean MISFE across the three regimes, by method and sample size.

**Figure 3.** Vienna TAB station PM10 data. (a) Daily mean of the preprocessed (square-root ...

**Figure 4.** Forecast accuracy for Vienna TAB PM10 data. (a) Box plots of ISE across all 2,635 ...

read the original abstract

Functional autoregressive models of order one (FAR(1)) are predominantly estimated by projecting curves onto leading functional principal components and fitting a vector autoregression in score space, requiring a discrete truncation level $K$ chosen by an \emph{ad hoc} variance threshold. We demonstrate via Monte Carlo experiments that the truncation choice is both consequential and highly regime dependent: the optimal $K$ can differ by an order of magnitude across data-generating regimes, while commonly used high variance thresholds (95\%, 99\%) lead to substantial forecast deterioration, inflating error by up to $35 \%$ relative to an oracle benchmark. We propose a Tikhonov-regularized estimator $\widehat{\Psi}_\alpha = \widehat{C}_1(\widehat{C}_0 + \alpha I)^{-1}$ that replaces the discrete truncation choice with a continuous regularization parameter, selected in a data-driven manner. We establish the convergence rate $n^{-\beta/(2(\beta+1))}$ under a source condition with smoothness parameter $\beta \in (0, 1]$, achieving the saturation rate $n^{-1/4}$ for smoother targets. Across three contrasting regimes and four sample sizes, the proposed estimator closely tracks the oracle-best FPCA rule and outperforms it in the most challenging wide-spectrum regime, without prior knowledge of the effective operator dimension. An application to 2{,}735 daily intraday PM10 curves from Vienna confirms a 9.7\% reduction in mean forecast error relative to the popular 80\% threshold and exhibits more stable parameter adaptation across 16 winter seasons.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes replacing discrete truncation in FPCA-based estimation of functional autoregressive operators with a Tikhonov-regularized estimator defined as $\widehat{\Psi}_\alpha = \widehat{C}_1(\widehat{C}_0 + \alpha I)^{-1}$, where $\alpha$ is chosen in a data-driven manner. It derives the convergence rate $n^{-\beta/(2(\beta+1))}$ under a source condition with smoothness $\beta \in (0,1]$, with saturation at $n^{-1/4}$, and reports Monte Carlo results across three regimes showing the estimator tracks or exceeds oracle FPCA performance (especially in wide-spectrum cases) plus a 9.7% forecast-error reduction on 2,735 daily PM10 curves.

Significance. If the theoretical rates and empirical claims hold, the work provides a practical, less regime-sensitive alternative to ad-hoc truncation thresholds in functional time series, with rates that align with standard regularization theory and simulation evidence of robustness without prior knowledge of effective dimension. The design across contrasting regimes and the real-data application are clear strengths.

major comments (3)

[Abstract / Theoretical Results] Abstract and theoretical section: the rate $n^{-\beta/(2(\beta+1))}$ requires $\alpha$ to scale as $n^{-1/(\beta+1)}$ (balancing bias and variance under the source condition), yet the manuscript does not establish that the data-driven $\alpha$ selector (GCV, discrepancy, or otherwise) is provably adaptive to unknown $\beta$; without such a guarantee the rate claim is not supported for general regimes.
[Monte Carlo Experiments] Simulation study: the claim that the estimator 'outperforms the oracle-best FPCA rule in the most challenging wide-spectrum regime' without prior knowledge of effective dimension rests on the specific alpha-selection rule; the Monte Carlo description must detail the exact procedure, its tuning, and whether it adapts to eigenvalue decay, as non-adaptive selection would undermine both the rate and the outperformance result.
[Application] §4 (or equivalent real-data section): the reported 9.7% reduction relative to the 80% threshold is presented as evidence of practical advantage, but without reporting the selected alpha values across the 16 seasons or comparing against a range of fixed truncation levels, it is difficult to attribute the gain specifically to the continuous-regularization approach rather than to a favorable alpha choice.

minor comments (2)

[Methodology] Notation: the operators C_0 and C_1 should be defined explicitly (empirical covariance and cross-covariance) at first use to avoid ambiguity with population quantities.
[Monte Carlo Experiments] Figure clarity: the Monte Carlo plots comparing estimators across regimes would benefit from explicit indication of the selected alpha values or effective truncation levels for each method.
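One way to put a regularization level and a truncation level on a common scale, as the last comment asks, is through spectral filter factors. Relative to the unregularized inverse of the covariance, Tikhonov regularization weights the $j$-th eigen-direction by $\lambda_j/(\lambda_j + \alpha)$, while truncation uses weight 1 for the leading $K$ directions and 0 otherwise. The sketch below is illustrative; the function names and the "effective dimension" summary (the sum of the Tikhonov weights) are assumptions, not quantities reported in the paper.

```python
import numpy as np

def tikhonov_filter(eigenvalues, alpha):
    """Smooth weights lambda_j / (lambda_j + alpha), each in [0, 1)."""
    lam = np.asarray(eigenvalues, dtype=float)
    return lam / (lam + alpha)

def truncation_filter(eigenvalues, K):
    """Hard weights: 1 for the K largest eigenvalues, 0 for the rest."""
    lam = np.asarray(eigenvalues, dtype=float)
    order = np.argsort(lam)[::-1]   # indices of eigenvalues, largest first
    weights = np.zeros_like(lam)
    weights[order[:K]] = 1.0
    return weights

def effective_dimension(eigenvalues, alpha):
    """Sum of Tikhonov weights; a continuous analogue of the truncation level K."""
    return float(tikhonov_filter(eigenvalues, alpha).sum())
```

Reporting `effective_dimension` next to the FPCA truncation levels would let a reader compare the two families of estimators on one axis.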

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below, clarifying the theoretical assumptions, expanding the simulation details, and strengthening the empirical presentation as suggested.

read point-by-point responses

Referee: [Abstract / Theoretical Results] Abstract and theoretical section: the rate $n^{-\beta/(2(\beta+1))}$ requires $\alpha$ to scale as $n^{-1/(\beta+1)}$ (balancing bias and variance under the source condition), yet the manuscript does not establish that the data-driven $\alpha$ selector (GCV, discrepancy, or otherwise) is provably adaptive to unknown $\beta$; without such a guarantee the rate claim is not supported for general regimes.

Authors: We agree that the convergence rate n^{-β/(2(β+1))} is established under the assumption that α is chosen to satisfy the balancing condition α ∼ n^{-1/(β+1)} for the given source condition. The manuscript does not prove that any particular data-driven selector (GCV or otherwise) is adaptive to unknown β. We will revise the abstract and theoretical section to state explicitly that the rate applies to the oracle-tuned α, while the data-driven implementation is justified by the Monte Carlo evidence of competitive performance across regimes. A complete adaptivity proof is left for future work. revision: yes
Referee: [Monte Carlo Experiments] Simulation study: the claim that the estimator 'outperforms the oracle-best FPCA rule in the most challenging wide-spectrum regime' without prior knowledge of effective dimension rests on the specific alpha-selection rule; the Monte Carlo description must detail the exact procedure, its tuning, and whether it adapts to eigenvalue decay, as non-adaptive selection would undermine both the rate and the outperformance result.

Authors: We will expand the Monte Carlo section to specify the exact α-selection procedure (generalized cross-validation), its implementation, any tuning constants, and its observed behavior with respect to eigenvalue decay in each regime. This addition will document that the selection operates without prior knowledge of effective dimension and will support the reported outperformance in the wide-spectrum case. revision: yes
Referee: [Application] §4 (or equivalent real-data section): the reported 9.7% reduction relative to the 80% threshold is presented as evidence of practical advantage, but without reporting the selected alpha values across the 16 seasons or comparing against a range of fixed truncation levels, it is difficult to attribute the gain specifically to the continuous-regularization approach rather than to a favorable alpha choice.

Authors: We will revise the application section to report the selected α values for each of the 16 seasons and to include forecast-error comparisons against a range of fixed truncation thresholds (80%, 90%, 95%, 99%) in addition to the oracle benchmark. These additions will allow readers to assess whether the observed improvement is attributable to the continuous-regularization method. revision: yes

Circularity Check

0 steps flagged

No circularity: estimator definition and rate are standard and independent of inputs

full rationale

The paper explicitly defines the Tikhonov estimator as the regularized inverse of the empirical covariance operators and states that the convergence rate is established under an external source condition with parameter beta. No equation reduces the claimed rate or the data-driven alpha choice to a fitted quantity by construction, nor does any load-bearing step rely on a self-citation that itself assumes the target result. The comparison to oracle FPCA is external and the theoretical guarantee is conditional on the source condition rather than tautological. The derivation chain remains self-contained against standard regularization theory.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The theoretical convergence relies on the source condition assumption, while practical use depends on the data-driven choice of the regularization parameter without needing regime-specific knowledge.

free parameters (1)

regularization parameter alpha
Chosen in a data-driven manner for each dataset, affecting the estimator's performance.

axioms (1)

domain assumption: Source condition with smoothness parameter $\beta \in (0,1]$ on the target operator
Invoked to derive the convergence rate of the estimator.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1] Aue, A., Norinho, D. D. and Hörmann, S. (2015), ‘On the prediction of stationary functional time series’, Journal of the American Statistical Association 110(509), 378–392.

[2] Augustyn-Gal, R. (2022), ‘Historical air quality data vienna, 1986–2021’, TU Wien Research Data Repository. Provided by Umweltbundesamt Austria.

[3] Bosq, D. (2000), Linear Processes in Function Spaces: Theory and Applications, Vol. 149 of Lecture Notes in Statistics, Springer-Verlag, New York.

[4] Cavalier, L. (2011), Inverse problems in statistics, in P. Alquier, E. Gautier and G. Stoltz, eds, ‘Inverse Problems and High-Dimensional Estimation’, Vol. 203 of Lecture Notes in Statistics,

[5] Crambes, C., Kneip, A. and Sarda, P. (2009), ‘Smoothing splines estimators for functional linear regression’, The Annals of Statistics 37(1), 35–72.

[6] Engl, H. W., Hanke, M. and Neubauer, A. (1996), Regularization of Inverse Problems, Vol. 375 of Mathematics and Its Applications, Kluwer Academic Publishers, Dordrecht.

[7] Hall, P. and Horowitz, J. L. (2007), ‘Methodology and convergence rates for functional linear regression’, The Annals of Statistics 35(1), 70–91.
Hörmann, S. and Kokoszka, P. (2010), ‘Weakly dependent functional data’, The Annals of Statistics 38(3), 1845–1884.
Horváth, L. and Kokoszka, P. (2012), Inference for Functional Data with Applications, Springer, New York.

[8] Kokoszka, P. and Reimherr, M. (2017), Introduction to Functional Data Analysis, Chapman and Hall/CRC, Boca Raton.

[9] Lepski, O. V., Mammen, E. and Spokoiny, V. G. (1997), ‘Optimal spatial adaptation to inhomogeneous smoothness: An approach based on kernel estimates with variable bandwidth selectors’, The Annals of Statistics 25(3), 929–947.

[10] Paparoditis, E. and Shang, H. L. (2021), ‘Bootstrap prediction bands for functional time series’, Journal of the American Statistical Association. Verify exact volume, issue, pages, and year; Paparoditis has several related papers in this period.

[11] Ramsay, J. O. and Silverman, B. W. (2005), Functional Data Analysis, 2nd edn, Springer, New York.

[12] Reimherr, M. and Nicolae, D. (2016), ‘Estimating variance components in functional linear models with applications to genetic heritability’, Journal of the American Statistical Association 111(513), 407–422.

[13] Wahba, G. (1990), Spline Models for Observational Data, Vol. 59 of CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM.

Appendix: Selection of α by cross-validation. The regularization parameter α is selected by one-step-ahead cross-validation on a held-out portion of the training path. Let n_v = max(⌊0.2n⌋, 20) denote the size of the validation bl...
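The appendix excerpt in the last entry breaks off mid-sentence, so the exact procedure is not recoverable from this page. The sketch below shows one plausible reading under stated assumptions: the validation block is taken to be the last n_v curves, the estimator is fitted on the preceding curves, and α is picked from a user-supplied grid by minimizing one-step-ahead squared error. The grid, the placement of the block, and the function names are all assumptions, not details confirmed by the source.

```python
import numpy as np

def fit_tikhonov(X, mean, alpha):
    """Psi_alpha = C1 (C0 + alpha I)^{-1} from curves X centered at `mean`."""
    Xc = X - mean
    past, future = Xc[:-1], Xc[1:]
    m = past.shape[0]
    C0 = past.T @ past / m
    C1 = future.T @ past / m
    A = C0 + alpha * np.eye(C0.shape[0])
    return np.linalg.solve(A, C1.T).T  # A is symmetric

def select_alpha(X, alphas):
    """Hold-out selection of alpha by one-step-ahead forecast error."""
    X = np.asarray(X, dtype=float)
    alphas = list(alphas)
    n = X.shape[0]
    n_v = max(int(np.floor(0.2 * n)), 20)   # validation size from the appendix excerpt
    if n - n_v < 3:
        raise ValueError("not enough curves left for fitting")
    train = X[: n - n_v]                    # assumption: validation block is the tail
    mean = train.mean(axis=0)
    inputs = X[n - n_v - 1 : n - 1] - mean  # curves used to predict the validation block
    targets = X[n - n_v :] - mean           # curves in the validation block
    scores = []
    for alpha in alphas:
        Psi = fit_tikhonov(train, mean, alpha)
        resid = targets - inputs @ Psi.T    # row-wise one-step forecast errors
        scores.append(np.mean(np.sum(resid ** 2, axis=1)))
    scores = np.array(scores)
    return alphas[int(np.argmin(scores))], scores
```

A logarithmically spaced grid is a common choice for `alphas`, since the useful range of the regularization parameter typically spans several orders of magnitude.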
