arxiv: 2605.15358 · v1 · pith:FE6QBH6P · submitted 2026-05-14 · econ.EM

Double Descent and Benign Overfitting in Macroeconomic Forecasting

Pith reviewed 2026-05-19 15:22 UTC · model grok-4.3

classification: econ.EM
keywords: double descent, benign overfitting, macroeconomic forecasting, factor models, data augmentation, kernel ridge regression, FRED-MD, FRED-QD

The pith

Augmenting macroeconomic datasets with synthetic copies from an estimated factor model produces an estimator that outperforms the Stock-Watson factor model for point forecasting across all series and horizons.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines double descent and benign overfitting in macroeconomic forecasting on standard US datasets driven by a small number of latent factors. It shows that the conditions for benign overfitting hold under exact factor models and under approximate ones when idiosyncratic variances are not too dispersed across series. Because natural macro panels have only moderate dimensions, the authors augment the data with synthetic copies generated from the estimated factor model; this augmentation converges to kernel ridge regression with a factor-structured kernel. The resulting estimator delivers consistent, statistically significant forecasting gains over the Stock-Watson benchmark, with improvements that grow at longer horizons. The authors conclude that overparameterization helps here by implicitly building a well-behaved kernel rather than being desirable on its own.

Core claim

In monthly FRED-MD and quarterly FRED-QD data, double-descent risk curves appear. The benign-overfitting mechanism of Bartlett et al. (2020) holds under the exact factor model, and under the approximate factor model provided idiosyncratic variances are not too dispersed across series. Augmenting the original panel with synthetic copies from the estimated factor model achieves the overparameterization ratio N/T that the theory needs and yields an estimator that converges to kernel ridge regression with a factor-structured kernel. This estimator consistently outperforms the Stock-Watson factor model for point forecasts across all series and horizons, with pervasive, statistically significant gains that increase with the forecast horizon h.
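A double-descent risk curve of this kind can be reproduced in a small simulation. The sketch below is illustrative only: it does not use FRED-MD or FRED-QD and is not the paper's design. It assumes a hypothetical two-factor data-generating process with Gaussian loadings and noise, fits minimum-norm least squares through `numpy.linalg.pinv`, and reports out-of-sample MSE as the number of predictors p crosses the sample size T. All function names and default values are assumptions made for illustration.

```python
import numpy as np

def min_norm_test_mse(p, T=30, T_test=200, r=2, reps=30, seed=0):
    """Average out-of-sample MSE of minimum-norm least squares with p predictors.

    Hypothetical design (an assumption, not the paper's): r latent factors
    drive both the target and every predictor.
    """
    rng = np.random.default_rng(seed)
    b = np.ones(r)                                   # factor coefficients of the target
    errs = []
    for _ in range(reps):
        lam = rng.standard_normal((p, r))            # loadings of the p predictors
        F_tr = rng.standard_normal((T, r))
        F_te = rng.standard_normal((T_test, r))
        X_tr = F_tr @ lam.T + rng.standard_normal((T, p))
        X_te = F_te @ lam.T + rng.standard_normal((T_test, p))
        y_tr = F_tr @ b + 0.5 * rng.standard_normal(T)
        y_te = F_te @ b + 0.5 * rng.standard_normal(T_test)
        # pinv gives OLS when p <= T and the minimum-norm interpolator when p > T.
        beta = np.linalg.pinv(X_tr) @ y_tr
        errs.append(np.mean((y_te - X_te @ beta) ** 2))
    return float(np.mean(errs))

def risk_curve(ps=(5, 15, 30, 60, 300), T=30):
    """Map each predictor count p to its simulated test MSE."""
    return {p: min_norm_test_mse(p, T=T) for p in ps}
```

Under these assumptions the simulated risk is expected to spike near the interpolation threshold p = T and to fall again as p grows well beyond T, which is the double-descent shape. Exact values depend on the seed and the number of replications.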

What carries the argument

Data augmentation with synthetic copies drawn from an estimated factor model, which reaches the required overparameterization ratio and converges to kernel ridge regression with a factor-structured kernel.
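The link between augmentation and kernel ridge regression can be checked numerically in a stylized case. This page does not reproduce the paper's construction, so the sketch below makes its own assumptions: each synthetic series is `F_hat @ lam + e` with Gaussian loadings `lam` and Gaussian noise `e`, the forecast is the minimum-norm least-squares fit on the M synthetic series, and the comparison kernel is the linear factor kernel k(s, t) = f_s' f_t with ridge equal to the noise variance. `F_hat` is simulated here as a stand-in for estimated factors, and every name is hypothetical.

```python
import numpy as np

def augmented_min_norm_forecast(F_hat, y, f_new, M, noise_sd=1.0, seed=0):
    """Minimum-norm least-squares forecast from M synthetic series (illustrative).

    Assumption: each synthetic series is F_hat @ lam + e with lam ~ N(0, I_r)
    and e ~ N(0, noise_sd**2). The paper's actual construction may differ.
    """
    rng = np.random.default_rng(seed)
    T, r = F_hat.shape
    lam = rng.standard_normal((M, r))                        # synthetic loadings
    X = F_hat @ lam.T + noise_sd * rng.standard_normal((T, M))
    x_new = lam @ f_new + noise_sd * rng.standard_normal(M)  # synthetic features at the forecast origin
    # Min-norm interpolator: beta = X'(XX')^{-1} y, so only the T x T Gram matrix is inverted.
    alpha = np.linalg.solve(X @ X.T, y)
    return float(x_new @ X.T @ alpha)

def factor_kernel_ridge_forecast(F_hat, y, f_new, ridge=1.0):
    """Kernel ridge forecast with the linear factor kernel k(s, t) = f_s' f_t."""
    K = F_hat @ F_hat.T
    alpha = np.linalg.solve(K + ridge * np.eye(len(y)), y)
    return float((F_hat @ f_new) @ alpha)

def demo(T=20, r=2, M=100_000, seed=1):
    """Return (augmented forecast, kernel ridge forecast) on simulated data."""
    rng = np.random.default_rng(seed)
    F_hat = rng.standard_normal((T, r))                      # stand-in for estimated factors
    y = F_hat @ np.array([1.0, -1.0]) + 0.5 * rng.standard_normal(T)
    f_new = np.array([1.0, -0.5])
    return (augmented_min_norm_forecast(F_hat, y, f_new, M),
            factor_kernel_ridge_forecast(F_hat, y, f_new, ridge=1.0))
```

In this stylized setup the Gram matrix of the synthetic panel, divided by M, approaches `F_hat @ F_hat.T` plus the noise variance times the identity as M grows, so the two forecasts returned by `demo()` should be close for large M. The noise variance of the synthetic copies plays the role of the ridge penalty.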

If this is right

  • The augmented estimator produces point forecasts that beat the Stock-Watson factor model across every series and every horizon examined.
  • Forecasting gains are statistically significant and become larger as the horizon lengthens.
  • Benign overfitting improves performance because overparameterization implicitly constructs a suitable kernel, not because overparameterization is desirable in itself.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same augmentation strategy could be applied to other moderate-dimensional economic panels to test whether similar kernel benefits appear.
  • Direct measurement of idiosyncratic variance dispersion in real macro datasets would show how often the required condition for benign overfitting is met in practice.
  • Viewing the procedure as implicit kernel construction suggests exploring other factor-structured kernels that might achieve comparable gains with less computation.

Load-bearing premise

Idiosyncratic variances must not be too dispersed across series for the benign-overfitting conditions to hold under the approximate factor model.
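One way to inspect this premise on a given panel is to compute dispersion statistics of the estimated idiosyncratic variances, of the kind the simulated rebuttal further down proposes (ratio of largest to smallest variance, coefficient of variation, selected quantiles). The sketch below is a hypothetical diagnostic, not the authors' code. It assumes a T x N panel, column standardization, and principal-components estimation of r factors; it does not say what level of dispersion the theory tolerates.

```python
import numpy as np

def idiosyncratic_variance_dispersion(X, r):
    """Summarize dispersion of idiosyncratic variances after removing r PCA factors.

    X is a T x N panel; columns are standardized first (an assumption of this sketch).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    common = (U[:, :r] * s[:r]) @ Vt[:r]           # rank-r principal-components fit
    resid_var = (Z - common).var(axis=0)           # one idiosyncratic variance per series
    return {
        "max_min_ratio": float(resid_var.max() / resid_var.min()),
        "coef_variation": float(resid_var.std() / resid_var.mean()),
        "quantiles": np.quantile(resid_var, [0.1, 0.5, 0.9]).tolist(),
    }
```

A ratio near 1 and a small coefficient of variation indicate nearly homogeneous idiosyncratic variances; larger values indicate more dispersion. Because the columns are standardized, the residual variances are shares of each series' total variance.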

What would settle it

Finding that the forecasting gains over the Stock-Watson model disappear or lose significance in panels where idiosyncratic variances are highly dispersed would falsify the claim that the mechanism operates under realistic approximate factor conditions.

Figures

Figures reproduced from arXiv: 2605.15358 by Andrea Carriero, Davide Pettenuzzo, Florian Huber.

Figure 1: Double-descent MSFE curves for four FRED-MD series. The thin purple line traces ...
Figure 2: Distribution of the relative improvement of the factor kernel over the FM benchmark ...
Figure 3: Variable persistence vs. relative improvement of the factor kernel over the FM ...
Figure 4: Distribution of the relative improvement of the factor kernel over the FM benchmark ...
Figure 5: Target persistence vs. relative improvement of the factor kernel over the FM ...
Figure 6: Subsample analysis for FRED-MD: Persistence vs. relative improvement of the ...
Figure 7: Subsample analysis for FRED-QD: Persistence vs. relative improvement of the ...
Figure 8: Double-descent MSFE curves for four FRED-QD targets. The thin purple line ...

Original abstract

We study double descent and benign overfitting in macroeconomic forecasting. We document that double-descent risk curves arise in standard macroeconomic datasets that are driven by a small number of latent factors, and we characterize when the underlying benign-overfitting mechanism holds. The conditions of Bartlett et al. (2020) are satisfied under the exact factor model and can also hold under the more realistic approximate factor model, provided idiosyncratic variances are not too dispersed across series. Because macroeconomic panels have only moderate dimensions, the overparameterization ratio N/T required by the theory is not naturally available. Our solution is to augment the data with synthetic copies from an estimated factor model and we prove that this strategy converges to a kernel ridge regression with a factor-structured kernel. Using monthly (FRED-MD) and quarterly (FRED-QD) US data, the resulting estimator consistently outperforms the Stock-Watson factor model for point forecasting across all series and horizons, with gains that are pervasive, statistically significant, and increasing with the forecast horizon. Our results suggest that benign overfitting, when it works, succeeds because overparameterization implicitly constructs a well-behaved kernel, not because overparameterization is intrinsically desirable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies double descent and benign overfitting in macroeconomic forecasting. It documents double-descent risk curves in factor-driven macro datasets, characterizes when the benign-overfitting mechanism of Bartlett et al. (2020) holds under exact and approximate factor models (provided idiosyncratic variances are not too dispersed), and proposes data augmentation with synthetic draws from an estimated factor model to induce the required overparameterization in moderate-dimensional panels. The authors prove that this augmentation converges to kernel ridge regression with a factor-structured kernel. Empirically, on monthly FRED-MD and quarterly FRED-QD US data, the resulting estimator outperforms the Stock-Watson factor model in point forecasting across all series and horizons, with pervasive, statistically significant gains that increase with the forecast horizon.

Significance. If the convergence result and the attribution of gains to benign overfitting hold, the paper offers a valuable bridge between recent high-dimensional statistics and macroeconomic forecasting. The explicit proof of convergence to a factor-structured kernel and the use of public FRED datasets for reproducible comparisons are strengths. The findings suggest that overparameterization can be engineered to construct well-behaved kernels rather than being desirable per se, which could inform practical forecasting methods in economics.

major comments (2)
  1. [Section characterizing conditions under approximate factor model] The characterization of benign overfitting under the approximate factor model (the section discussing conditions from Bartlett et al. (2020)) states that the mechanism requires idiosyncratic variances not to be too dispersed across series. This dispersion condition is load-bearing for linking the reported outperformance to the benign-overfitting regime rather than to the implicit factor-structured kernel alone. No diagnostic, bound, or table reporting the realized dispersion of idiosyncratic variances after factor estimation on FRED-MD or FRED-QD is provided, leaving the attribution open to alternative explanations.
  2. [Proof of convergence to kernel ridge regression] The proof that the augmentation strategy converges to kernel ridge regression with a factor-structured kernel is central to the theoretical contribution. The dependence introduced by fitting the factor model parameters to the same data used for forecasting creates potential circularity; the tightness of the convergence and any uniformity conditions over the estimated factors should be stated explicitly (e.g., rates or high-probability bounds).
minor comments (2)
  1. [Empirical results section] The abstract and introduction refer to 'gains that are pervasive, statistically significant, and increasing with the forecast horizon,' but the precise test for significance (e.g., Diebold-Mariano or bootstrap) and the exact number of series/horizons should be summarized in a table for clarity.
  2. Figure captions for the double-descent risk curves should explicitly label the overparameterization ratio N/T and the augmentation factor to aid interpretation.
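For reference, the Diebold-Mariano statistic mentioned in minor comment 1 can be computed as follows. This page does not state which test the paper uses, so the sketch is only an example of the referee's first suggestion. It assumes squared-error loss, a rectangular long-run variance with h - 1 autocovariances, and a normal approximation; the function name and argument order are hypothetical.

```python
import math

def diebold_mariano(e_bench, e_model, h=1):
    """Diebold-Mariano statistic for equal squared-error loss at horizon h.

    Positive values favor the model over the benchmark. Uses the rectangular
    long-run variance with h - 1 autocovariances and a normal approximation.
    """
    d = [a * a - b * b for a, b in zip(e_bench, e_model)]   # loss differential
    n = len(d)
    d_bar = sum(d) / n

    def gamma(k):
        # Sample autocovariance of the loss differential at lag k.
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    lrv = gamma(0) + 2.0 * sum(gamma(k) for k in range(1, h))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    stat = d_bar / math.sqrt(lrv / n)
    p_two_sided = math.erfc(abs(stat) / math.sqrt(2.0))     # equals 2 * (1 - Phi(|stat|))
    return stat, p_two_sided
```

The rectangular estimator can turn out non-positive in short samples, which is why the sketch raises an error in that case; a small-sample correction or a different kernel would be a separate design choice.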

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below, indicating the revisions we plan to incorporate in the next version of the manuscript.

Point-by-point responses
  1. Referee: [Section characterizing conditions under approximate factor model] The characterization of benign overfitting under the approximate factor model (the section discussing conditions from Bartlett et al. (2020)) states that the mechanism requires idiosyncratic variances not to be too dispersed across series. This dispersion condition is load-bearing for linking the reported outperformance to the benign-overfitting regime rather than to the implicit factor-structured kernel alone. No diagnostic, bound, or table reporting the realized dispersion of idiosyncratic variances after factor estimation on FRED-MD or FRED-QD is provided, leaving the attribution open to alternative explanations.

    Authors: We agree that providing empirical evidence on the dispersion of idiosyncratic variances strengthens the link between the theoretical conditions and the observed forecasting gains. In the revised manuscript we will add a new table (and accompanying discussion) that reports summary statistics on the estimated idiosyncratic variances for both the FRED-MD and FRED-QD datasets after extracting the factors. These will include the ratio of the largest to smallest variance, the coefficient of variation of the variances, and selected quantiles. This diagnostic will allow readers to assess whether the “not too dispersed” condition is satisfied in the data. revision: yes

  2. Referee: [Proof of convergence to kernel ridge regression] The proof that the augmentation strategy converges to kernel ridge regression with a factor-structured kernel is central to the theoretical contribution. The dependence introduced by fitting the factor model parameters to the same data used for forecasting creates potential circularity; the tightness of the convergence and any uniformity conditions over the estimated factors should be stated explicitly (e.g., rates or high-probability bounds).

    Authors: We appreciate the referee’s emphasis on making the dependence structure and convergence rates fully explicit. The current proof already conditions on the estimated factors, but we acknowledge that additional uniformity statements would clarify the result. In the revision we will augment the proof appendix with explicit high-probability bounds on the approximation error, invoking standard rates for principal-component estimation of factors (under the usual assumptions of Bai (2003) and related literature). We will also state the uniformity conditions over the estimated loadings and factors more precisely, showing that the convergence to the factor-structured kernel holds with high probability as both T and N grow. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper derives that data augmentation with synthetic draws from an estimated factor model converges to kernel ridge regression with a factor-structured kernel; this is a mathematical equivalence result, not a reduction of the target claim to the fitted inputs by construction. The benign-overfitting conditions are taken from the external Bartlett et al. (2020) paper, and the headline empirical outperformance is evaluated on the independent FRED-MD and FRED-QD panels. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the central theoretical or empirical claims. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the factor-model representation of macro data, the dispersion condition on idiosyncratic variances, and the convergence of the synthetic-augmentation procedure to a kernel ridge regressor. No new entities are postulated.

free parameters (1)
  • number of latent factors
    Chosen or estimated to capture the common component; directly affects the synthetic copies and the resulting kernel.
axioms (1)
  • domain assumption: Idiosyncratic variances are not too dispersed across series
    Required for the Bartlett et al. (2020) conditions to hold under the approximate factor model.
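The ledger says only that the number of latent factors is chosen or estimated. One standard data-driven choice is the IC_p2 information criterion of Bai and Ng (2002), reference [1] below. Whether the paper uses it is not stated on this page, so the sketch is an assumption for illustration. It takes a T x N panel, standardizes the columns, and evaluates the criterion for k = 0, ..., `k_max` principal components; `select_num_factors` and `k_max` are hypothetical names.

```python
import numpy as np

def select_num_factors(X, k_max=8):
    """Pick the number of factors by the Bai-Ng (2002) IC_p2 criterion.

    X is a T x N panel; columns are standardized before principal components.
    k_max must be smaller than min(T, N).
    """
    T, N = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    s = np.linalg.svd(Z, compute_uv=False)         # singular values of the panel
    total = np.sum(s ** 2)
    penalty = (N + T) / (N * T) * np.log(min(N, T))
    best_k, best_ic = 0, np.inf
    for k in range(0, k_max + 1):
        V = (total - np.sum(s[:k] ** 2)) / (N * T)  # mean squared residual with k factors
        ic = np.log(V) + k * penalty
        if ic < best_ic:
            best_k, best_ic = k, ic
    return best_k
```

The selected value would then fix the rank of the common component used to generate synthetic copies, which is why the ledger counts it as a free parameter.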



Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

  1. Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221

  2. Bai, J. and Ng, S. (2008). Forecasting economic time series using targeted predictors. Journal of Econometrics, 146(2):304–317

  3. Bartlett, P. L., Long, P. M., Lugosi, G., and Tsigler, A. (2020). Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48):30063–30070

  4. Belkin, M., Hsu, D., Ma, S., and Mandal, S. (2019). Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proceedings of the National Academy of Sciences, 116(32):15849–15854

  5. Boot, T. and Nibbering, D. (2019). Forecasting using random subspace methods. Journal of Econometrics, 209(2):391–406

  6. Bunea, F., Strimas-Mackey, S., and Wegkamp, M. (2021). Interpolating predictors in high-dimensional factor regression. arXiv preprint arXiv:2002.02525v3

  7. Chamberlain, G. and Rothschild, M. (1983). Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica, 51(5):1281–1304

  8. Chi, T.-C., Fan, T.-H., Ghigliazza, R. M., Giannone, D., and Wang, Z. K. (2025). Macroeconomic forecasting and machine learning. arXiv preprint arXiv:2510.11008

  9. Coulombe, P. G., Leroux, M., Stevanović, D., and Surprenant, S. (2022). How is machine learning useful for macroeconomic forecasting? Journal of Applied Econometrics, 37(5):920–964

  10. De Mol, C., Giannone, D., and Reichlin, L. (2008). Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components? Journal of Econometrics, 146(2):318–328

  11. Exterkate, P., Groenen, P. J. F., Heij, C., and van Dijk, D. (2016). Nonlinear forecasting with many predictors using kernel ridge regression. International Journal of Forecasting, 32(3):736–753

  12. Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000). The generalized dynamic-factor model: Identification and estimation. Review of Economics and Statistics, 82(4):540–554

  13. Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2005). The generalized dynamic factor model: One-sided estimation and forecasting. Journal of the American Statistical Association, 100(471):830–840

  14. Gabaix, X. and Ibragimov, R. (2011). Rank - 1/2: A simple way to improve the OLS estimation of tail exponents. Journal of Business & Economic Statistics, 29(1):24–39

  15. Hastie, T., Montanari, A., Rosset, S., and Tibshirani, R. J. (2022). Surprises in high-dimensional ridgeless least squares interpolation. Annals of Statistics, 50(2):949–986

  16. Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. Annals of Statistics, 3(5):1163–1174

  17. Horn, R. A. and Johnson, C. R. (2012). Matrix Analysis. Cambridge University Press, 2nd edition

  18. Inoue, A., Jin, L., and Rossi, B. (2017). Rolling window selection for out-of-sample forecasting with time-varying parameters. Journal of Econometrics, 196(1):55–67

  19. McCracken, M. W. and Ng, S. (2016). FRED-MD: A monthly database for macroeconomic research. Journal of Business & Economic Statistics, 34(4):574–589

  20. Medeiros, M. C., Vasconcelos, G. F., Veiga, Á., and Zilberman, E. (2021). Forecasting inflation in a data-rich environment: The benefits of machine learning methods. Journal of Business & Economic Statistics, 39(1):98–119

  21. Nakakita, S. and Imaizumi, M. (2025). Benign overfitting in time series linear model with over-parameterization. arXiv preprint arXiv:2204.08369v3

  22. Stock, J. H. and Watson, M. W. (2002a). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460):1167–1179

  23. Stock, J. H. and Watson, M. W. (2002b). Macroeconomic forecasting using diffusion indexes. Journal of Business & Economic Statistics, 20(2):147–162

  24. Tsigler, A. and Bartlett, P. L. (2023). Benign overfitting in ridge regression. Journal of Machine Learning Research, 24(123):1–76