Benign Overfitting in Economic Forecasting via Noise Regularization
Pith reviewed 2026-05-24 04:48 UTC · model grok-4.3
The pith
A ridgeless regression augmented with noise predictors matches the asymptotic forecast accuracy of an oracle that knows the true factors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When the outcome and the high-dimensional predictors share a low-dimensional factor structure, the population best linear predictor is dense. A ridgeless regression that deliberately augments the predictor matrix with pure noise variables attains the same asymptotic out-of-sample mean squared error as an oracle regression on the true factors. The mechanism is eigenvalue shrinkage of the design matrix, which reduces the variance term in the forecast error decomposition without any factor estimation or strong-factor assumption.
What carries the argument
A ridgeless regression augmented with noise predictors, which shrinks the eigenvalues of the Gram matrix and thereby controls out-of-sample variance.
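The mechanism can be illustrated with a minimal NumPy sketch. It assumes Gaussian noise predictors, a minimum-norm (pseudo-inverse) solution, and a forecast that uses only the coefficients on the real predictors. The function names `noise_augmented_ridgeless` and `forecast`, the noise scale, and the seed are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def noise_augmented_ridgeless(X, y, n_noise, noise_scale=1.0, seed=0):
    """Minimum-norm least squares on [X, W], where W is pure noise.

    Assumption: W has i.i.d. Gaussian entries; the paper may use another design.
    Returns the coefficients on X, the coefficients on W, and W itself.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = noise_scale * rng.standard_normal((n, n_noise))  # no predictive power
    Z = np.hstack([X, W])                                # augmented design
    coef = np.linalg.pinv(Z) @ y                         # min-norm solution
    return coef[:p], coef[p:], W


def forecast(x_new, beta_x):
    """Forecast from the real predictors only (an assumption of this sketch)."""
    return x_new @ beta_x
```

When the total number of columns exceeds the sample size, the fit interpolates the training data, which is the "ridgeless" regime the summary refers to.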
If this is right
- Forecasts achieve oracle accuracy without estimating or even identifying the latent factors.
- Perfect variable selection that discards noise variables can increase forecast error when the retained dimension is close to sample size.
- The same noise-augmented procedure improves and stabilizes predictions for U.S. inflation, international GDP growth, and the equity risk premium.
- The gain is produced by a reduction in the variance component of forecast error rather than by bias reduction.
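The variance channel in the bullets above can be made concrete with a small numerical sketch. It is a simplification under stated assumptions: the retained predictors are plain Gaussian columns (no factor structure), the sizes `n`, `p`, `m` are illustrative, and only the variance of the coefficients on the real predictors is compared, so any bias introduced by augmentation is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 100, 95, 400            # illustrative sizes: p close to n, m noise columns
X = rng.standard_normal((n, p))   # predictors kept after a hypothetical selection step
W = rng.standard_normal((n, m))   # pure-noise predictors

# OLS on X alone: coefficient variance per unit error variance is trace((X'X)^{-1}),
# which is driven by the smallest eigenvalues of X'X.
eig_xtx = np.linalg.eigvalsh(X.T @ X)
var_ols = float(np.sum(1.0 / eig_xtx))

# Min-norm fit on [X, W]: the X-block equals X'(XX' + WW')^{-1} y, so its
# variance per unit error variance is trace(X'(XX' + WW')^{-2} X).
G = X @ X.T + W @ W.T
G_inv = np.linalg.inv(G)
var_aug = float(np.trace(X.T @ G_inv @ G_inv @ X))
eig_aug = np.linalg.eigvalsh(G)

print("smallest eigenvalue, X'X:       ", eig_xtx.min())
print("smallest eigenvalue, XX' + WW': ", eig_aug.min())
print("variance proxy, OLS on X:       ", var_ols)
print("variance proxy, noise-augmented:", var_aug)
```

Adding the positive semi-definite term WW' can only raise the eigenvalues of the matrix being inverted, which is one way to read the claim that discarding noise variables can hurt when the retained dimension is close to the sample size.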
Where Pith is reading between the lines
- The same regularization may be useful in other high-dimensional economic series that exhibit approximate factor structure.
- It offers a simple alternative to explicit factor extraction or penalized sparse methods when the goal is pure forecasting.
- The finding raises the question of how much deliberate noise is optimal when the factor dimension is unknown.
Load-bearing premise
Both the outcome variable and the high-dimensional predictors are generated by a small number of latent factors, which forces the linear forecast model to be dense.
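Why the factor structure forces density can be seen from a short derivation. This is a sketch under added assumptions that the summary does not state explicitly: F, e, and u are mean zero and mutually uncorrelated, and Σ_F = Var(F) and Σ_e = Var(e) are invertible. The symbols Σ_F, Σ_e, and a are introduced here for illustration.

```latex
\begin{aligned}
X &= \Lambda F + e, \qquad y = \gamma' F + u, \\
\beta &= \operatorname{Var}(X)^{-1}\operatorname{Cov}(X, y)
       = (\Lambda \Sigma_F \Lambda' + \Sigma_e)^{-1} \Lambda \Sigma_F \gamma \\
      &= \Sigma_e^{-1} \Lambda \,(\Sigma_F^{-1} + \Lambda' \Sigma_e^{-1} \Lambda)^{-1} \gamma
       \;=\; \Sigma_e^{-1} \Lambda\, a,
\qquad a := (\Sigma_F^{-1} + \Lambda' \Sigma_e^{-1} \Lambda)^{-1} \gamma .
\end{aligned}
```

If Σ_e is diagonal with entries σ_j², the j-th coefficient is β_j = σ_j^{-2} λ_j' a, where λ_j' is the j-th row of Λ. It is nonzero whenever λ_j' a ≠ 0, so every predictor that loads on the factors generically receives a nonzero coefficient.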
What would settle it
A Monte Carlo design that satisfies the paper's assumptions, with known true factors, in which the mean squared forecast error of the noise-augmented ridgeless estimator stays strictly above that of the oracle factor regression as the sample size grows.
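A toy version of such an experiment might look like the sketch below. Everything about the design is an assumption made for illustration (Gaussian factors, loadings, and errors; two factors; the sizes `n`, `p`, `m`; forecasting with the X-block only), and the names `simulate_once` and `run_monte_carlo` are hypothetical. A single finite-sample run like this cannot confirm or refute an asymptotic claim; it only shows the shape of the comparison.

```python
import numpy as np


def simulate_once(rng, n=100, p=80, k=2, m=400, n_test=200):
    """One draw from a toy factor design; returns out-of-sample MSEs."""
    total = n + n_test
    Lam = rng.standard_normal((p, k))                 # loadings (hypothetical)
    gamma = np.ones(k)
    F = rng.standard_normal((total, k))               # latent factors
    X = F @ Lam.T + rng.standard_normal((total, p))   # X = Lambda F + e
    y = F @ gamma + rng.standard_normal(total)        # y = gamma' F + u
    tr, te = slice(0, n), slice(n, total)

    # Oracle: OLS on the true factors.
    b_oracle = np.linalg.lstsq(F[tr], y[tr], rcond=None)[0]
    # Benchmark: OLS on X alone (p < n, no regularization).
    b_ols = np.linalg.lstsq(X[tr], y[tr], rcond=None)[0]
    # Noise-augmented ridgeless: min-norm fit on [X, W], keep the X-block.
    W = rng.standard_normal((n, m))
    b_aug = (np.linalg.pinv(np.hstack([X[tr], W])) @ y[tr])[:p]

    def mse(pred):
        return float(np.mean((y[te] - pred) ** 2))

    return {
        "oracle": mse(F[te] @ b_oracle),
        "ols_on_X": mse(X[te] @ b_ols),
        "noise_augmented": mse(X[te] @ b_aug),
    }


def run_monte_carlo(reps=50, seed=0, **kwargs):
    """Average the out-of-sample MSEs over independent replications."""
    rng = np.random.default_rng(seed)
    draws = [simulate_once(rng, **kwargs) for _ in range(reps)]
    return {name: float(np.mean([d[name] for d in draws])) for name in draws[0]}
```

Calling `run_monte_carlo(reps=50)` returns the three average errors. To probe the large-sample statement, one would repeat the call while letting `n` grow together with `p` and `m` and track the gap between the `noise_augmented` and `oracle` entries.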
Original abstract
This paper studies linear overparameterized models in economic forecasting and highlights that including noise variables (regressors with no predictive power) regularizes the estimator. We consider a setting where both the outcome variable and the high-dimensional predictors are driven by a small number of latent factors, and show that the linear forecast model is dense rather than sparse. It turns out that a ridgeless regression augmented with noise predictors attains the same asymptotic forecast accuracy as an oracle with known true factors, without estimating the factors or assuming them to be strong. The gain comes from shrinkage of the eigenvalues of the design matrix, which reduces the out-of-sample variance. In contrast, perfect variable selection that removes noise variables can worsen forecasts when the number of retained predictors is comparable to the sample size. Empirically, we apply this approach to forecasting U.S. inflation, international GDP growth, and the U.S. equity risk premium, finding that noise regularization improves and stabilizes predictive performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that in economic forecasting settings where both the outcome y and high-dimensional predictors X are driven by a small number of latent factors (making the population linear projection dense rather than sparse), a ridgeless regression augmented with noise predictors (regressors with no predictive power) attains the same asymptotic out-of-sample forecast accuracy as an oracle that knows the true factors. The mechanism is eigenvalue shrinkage of the design matrix that reduces variance; this is contrasted with perfect variable selection, which can worsen performance when the number of retained predictors is comparable to sample size. The result is supported by theory under the factor model and by empirical applications to U.S. inflation, international GDP growth, and the U.S. equity risk premium.
Significance. If the central asymptotic equivalence holds, the paper supplies a practical regularization device for high-dimensional economic forecasting that avoids explicit factor estimation and does not require strong-factor assumptions. The report credits both the theoretical equivalence result and the empirical finding that noise augmentation improves and stabilizes predictive performance relative to selection-based alternatives.
major comments (2)
- [Abstract and §3] (theoretical setup): The oracle-equivalence claim rests on the population coefficient vector β being dense under X = ΛF + e and y = γ'F + u. The manuscript states this density result but supplies no explicit rate conditions on the number of added noise variables relative to n, p, or factor strength that would keep the equivalence intact when factors are weak or loadings are heterogeneous; without such conditions, the eigenvalue-shrinkage benefit need not dominate selection-based alternatives.
- [Empirical applications] (forecasting tables for inflation, GDP, and equity premium): The reported gains in accuracy and stability are presented without accompanying standard errors, confidence bands, or robustness checks on the exact count of noise variables, all of which are needed to assess whether the finite-sample improvements are statistically distinguishable from the oracle benchmark.
minor comments (1)
- [Notation and estimator definition] The definition of the ridgeless estimator after noise augmentation would benefit from an explicit equation (e.g., the augmented design matrix and the resulting β̂) placed in the main text rather than only in an appendix.
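One way the requested equation could read is sketched below. The notation is assumed here rather than taken from the manuscript: W is an n × m matrix of noise predictors with i.i.d. mean-zero entries of variance σ_w², Z is the augmented design, and Z⁺ is the Moore-Penrose pseudo-inverse.

```latex
\begin{aligned}
Z &= [\,X \;\; W\,] \in \mathbb{R}^{n \times (p+m)}, \qquad
\hat\beta = Z^{+} y = Z'(ZZ')^{-1} y
\quad (p + m > n,\ ZZ' \text{ invertible}), \\
\hat\beta_X &= X'(XX' + WW')^{-1} y, \qquad
\hat\beta_W = W'(XX' + WW')^{-1} y, \\
\hat\beta_X &\approx X'(XX' + m\sigma_w^2 I_n)^{-1} y
\quad \text{when } WW' \approx m\sigma_w^2 I_n .
\end{aligned}
```

The last line is a heuristic, not a statement from the paper: when m is large relative to n, WW' concentrates around mσ_w² I_n, so the coefficients on the real predictors resemble a ridge-type estimator with penalty mσ_w². This gives an intuition for why adding noise acts as regularization.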
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We address each major comment below, clarifying the theoretical scope and committing to empirical enhancements where appropriate.
Point-by-point responses
- Referee: [Abstract and §3] (theoretical setup): The oracle-equivalence claim rests on the population coefficient vector β being dense under X = ΛF + e and y = γ'F + u. The manuscript states this density result but supplies no explicit rate conditions on the number of added noise variables relative to n, p, or factor strength that would keep the equivalence intact when factors are weak or loadings are heterogeneous; without such conditions, the eigenvalue-shrinkage benefit need not dominate selection-based alternatives.
  Authors: The density of β follows immediately from the factor model assumptions in Section 3 (Assumptions 1–3), which allow weak factors and heterogeneous loadings without requiring strong-factor conditions. Theorems 1–2 derive the asymptotic equivalence by showing that noise augmentation induces eigenvalue shrinkage that matches the oracle variance term, and the proofs hold under the stated rates on p/n and the factor structure; no additional rate restrictions on the number of noise variables are needed beyond those already implicit in the high-dimensional regime. We will add a clarifying paragraph in §3.2 explicitly noting that the equivalence continues to hold for weak factors provided the loadings satisfy the moment conditions in Assumption 2, thereby addressing the concern about dominance over selection methods.
  Revision: partial
- Referee: [Empirical applications] (forecasting tables for inflation, GDP, and equity premium): The reported gains in accuracy and stability are presented without accompanying standard errors, confidence bands, or robustness checks on the exact count of noise variables, all of which are needed to assess whether the finite-sample improvements are statistically distinguishable from the oracle benchmark.
  Authors: We agree that standard errors and robustness checks would strengthen the empirical section. In the revision we will (i) report bootstrap standard errors for the out-of-sample R² and MSFE differences relative to the oracle benchmark, (ii) add a new table (or appendix figure) showing results for a range of noise-variable counts around the values used in the main tables, and (iii) include Diebold–Mariano tests where feasible. These additions will allow readers to assess statistical distinguishability.
  Revision: yes
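For readers unfamiliar with the test named in item (iii), a common textbook form of the Diebold–Mariano statistic under squared-error loss can be sketched as follows. This is not the authors' implementation: the function name `diebold_mariano`, the rectangular-kernel long-run variance truncated at h − 1 lags, and the normal approximation for the p-value are assumptions of this sketch.

```python
import math

import numpy as np


def diebold_mariano(e1, e2, h=1):
    """Two-sided Diebold-Mariano test for equal squared-error loss.

    e1, e2: forecast errors of two competing forecasts on the same dates.
    h: forecast horizon; autocovariances up to lag h - 1 enter the variance.
    Returns (statistic, p_value); a positive statistic means e1 has larger loss.
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    d = e1**2 - e2**2                      # loss differential
    T = d.size
    d_bar = d.mean()
    dc = d - d_bar
    # Long-run variance of d with a rectangular kernel (can fail to be positive).
    lrv = dc @ dc / T
    for lag in range(1, h):
        lrv += 2.0 * (dc[lag:] @ dc[:-lag]) / T
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    stat = d_bar / math.sqrt(lrv / T)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|stat|)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

A typical call passes the out-of-sample errors of the noise-augmented forecast and of a competing forecast, for example `diebold_mariano(errors_benchmark, errors_noise_augmented, h=1)`, where both argument names are placeholders.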
Circularity Check
No significant circularity; derivation is model-based asymptotic analysis
The paper derives density of the population projection coefficients from the shared latent factor structure (X = ΛF + e, y = γ'F + u) and shows asymptotic equivalence of ridgeless regression plus noise to the oracle that uses F directly. These steps are explicit mathematical results under the maintained assumptions rather than reductions by construction, fitted-parameter renamings, or load-bearing self-citations. The oracle benchmark is internal to the factor model but is not tautological; the equivalence is obtained via eigenvalue shrinkage arguments that are independent of the target risk quantity. No self-citation chains or ansatz smuggling are indicated in the provided text. The density claim follows directly from the factor loadings without redefining the target quantity in terms of itself.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: both the outcome variable and the high-dimensional predictors are driven by a small number of latent factors.