Benign Overfitting in Economic Forecasting via Noise Regularization

Andreas Neuhierl; Xinjie Ma; Yuan Liao; Zhentao Shi

arxiv: 2312.05593 · v3 · submitted 2023-12-09 · 💰 econ.EM · stat.ME

Pith reviewed 2026-05-24 04:48 UTC · model grok-4.3

keywords benign overfitting · noise regularization · economic forecasting · ridgeless regression · latent factors · high-dimensional predictors · dense linear models · forecast accuracy

The pith

A ridgeless regression augmented with noise predictors matches the asymptotic forecast accuracy of an oracle that knows the true factors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that when both the target variable and high-dimensional predictors are generated by a small number of latent factors, the best linear forecast is a dense model rather than a sparse one. Adding predictors that contain only noise regularizes a ridgeless least-squares estimator by shrinking the eigenvalues of the sample Gram matrix, which lowers out-of-sample variance. This approach reaches the same limiting mean-squared forecast error as an oracle that knows the factors exactly, without ever estimating the factors or requiring them to be strong. In contrast, removing the noise variables through perfect selection can increase forecast error when the number of retained predictors is comparable to sample size. The result is shown both theoretically under the factor structure and empirically in U.S. inflation, international GDP growth, and equity risk premium series.

Core claim

When the outcome and the high-dimensional predictors share a low-dimensional factor structure, the population best linear predictor is dense. A ridgeless regression that deliberately augments the predictor matrix with pure noise variables attains the same asymptotic out-of-sample mean squared error as an oracle regression on the true factors. The mechanism is eigenvalue shrinkage of the design matrix, which reduces the variance term in the forecast error decomposition without any factor estimation or strong-factor assumption.

What carries the argument

A ridgeless regression augmented with noise predictors, which shrinks the eigenvalues of the Gram matrix and thereby controls out-of-sample variance.
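
The mechanism can be made concrete with a short heuristic calculation. The notation below is introduced here for illustration and is not taken from the paper: $Z$ is an $n \times m$ matrix of i.i.d. noise columns with variance $\sigma_z^2$, and $\tilde X$ is the augmented design. The approximation $ZZ' \approx m\sigma_z^2 I_n$ is a large-$m$ heuristic, not the paper's theorem.

```latex
\begin{align*}
\tilde X &= [\,X \;\; Z\,] \in \mathbb{R}^{n \times (p+m)}, \qquad
\hat\beta = \tilde X^{+} y = \tilde X'(\tilde X \tilde X')^{-1} y
  \quad \text{(when } p+m>n \text{ and } \tilde X \text{ has full row rank)}, \\
\tilde X \tilde X' &= XX' + ZZ', \qquad
ZZ' \approx m\sigma_z^2 I_n \ \text{ for } m \gg n, \\
\hat\beta_X &= X'(XX' + ZZ')^{-1} y \;\approx\; X'(XX' + m\sigma_z^2 I_n)^{-1} y
  = (X'X + m\sigma_z^2 I_p)^{-1} X'y .
\end{align*}
```

Read this way, the coefficients on the original predictors behave like a ridge estimator with penalty $m\sigma_z^2$: the component of $y$ along a singular direction of $X$ with squared singular value $s^2$ is scaled by $s^2/(s^2 + m\sigma_z^2)$.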

If this is right

Forecasts achieve oracle accuracy without estimating or even identifying the latent factors.
Perfect variable selection that discards noise variables can increase forecast error when the retained dimension is close to sample size.
The same noise-augmented procedure improves and stabilizes predictions for U.S. inflation, international GDP growth, and the equity risk premium.
The gain is produced by a reduction in the variance component of forecast error rather than by bias reduction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same regularization may be useful in other high-dimensional economic series that exhibit approximate factor structure.
It offers a simple alternative to explicit factor extraction or penalized sparse methods when the goal is pure forecasting.
The finding raises the question of how much deliberate noise is optimal when the factor dimension is unknown.

Load-bearing premise

Both the outcome variable and the high-dimensional predictors are generated by a small number of latent factors, which forces the linear forecast model to be dense.

What would settle it

A Monte Carlo design in which the true factors are known and the mean squared forecast error of the noise-augmented ridgeless estimator is strictly larger than that of the oracle factor regression for large samples.
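
A design of this kind could be prototyped along the following lines. This is a minimal sketch under stated assumptions, not the authors' simulation: the function names (`draw`, `one_replication`, `monte_carlo`), the Gaussian draws, the unit-variance noise, the choice `gamma = 1`, the default dimensions, and the use of fresh noise draws at forecast time are all hypothetical.

```python
import numpy as np


def draw(n, Lam, gamma, rng):
    """Draw n observations from X = Lam F + e, y = gamma'F + u (all Gaussian)."""
    p0, k = Lam.shape
    F = rng.normal(size=(n, k))
    X = F @ Lam.T + rng.normal(size=(n, p0))
    y = F @ gamma + rng.normal(size=n)
    return F, X, y


def one_replication(rng, n=100, n_test=50, p0=90, k=3, m_noise=2000):
    """Return (oracle MSE, noise-augmented ridgeless MSE) for one simulated sample."""
    Lam = rng.normal(size=(p0, k))   # loadings: assumed design, not the paper's
    gamma = np.ones(k)               # factor coefficients: assumed
    F, X, y = draw(n, Lam, gamma, rng)
    F_new, X_new, y_new = draw(n_test, Lam, gamma, rng)

    # Oracle benchmark: least squares of y on the true factors.
    coef_oracle, *_ = np.linalg.lstsq(F, y, rcond=None)
    mse_oracle = np.mean((y_new - F_new @ coef_oracle) ** 2)

    # Noise-augmented ridgeless regression: append pure-noise columns and
    # take the minimum-norm least-squares solution via the pseudoinverse.
    Z = rng.normal(size=(n, m_noise))
    Z_new = rng.normal(size=(n_test, m_noise))   # fresh noise at forecast time (assumption)
    beta = np.linalg.pinv(np.hstack([X, Z])) @ y
    mse_aug = np.mean((y_new - np.hstack([X_new, Z_new]) @ beta) ** 2)
    return mse_oracle, mse_aug


def monte_carlo(reps=20, seed=0, **kwargs):
    """Average both MSEs over independent replications."""
    rng = np.random.default_rng(seed)
    results = np.array([one_replication(rng, **kwargs) for _ in range(reps)])
    return results.mean(axis=0)   # (mean oracle MSE, mean augmented MSE)
```

Calling `monte_carlo(reps=200)` returns the two average errors. Under the paper's claim the gap between them should close as the sample grows; a gap that persists in large samples is the outcome the test above would look for.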

Figures

Figures reproduced from arXiv: 2312.05593 by Andreas Neuhierl, Xinjie Ma, Yuan Liao, Zhentao Shi.

**Figure 1.** Theoretical predictive variance and squared bias (left panel) and MSE (right panel), averaged over 500 replications. The horizontal axis is the number of predictors increasing from 3 to 500, and we fix n = 100. The first p0 = min{p, 0.9n} are informative predictors, generated using a 3-factor model of strong factors. The remaining p − p0 are i.i.d. Gaussian noises. The vertical dashed line is where p equal…

**Figure 2.** Predictive MSE $\sum_{j=1}^{50} (y_j - \hat{y}_j)^2$ averaged from 50 replications as the number of predictors p increases. The vertical red dashed line indicates the number of informative predictors p0; the black dashed line indicates the sample size n = 100.

**Figure 3.** Predictive MSE $\sum_{j=1}^{50} (y_j - \hat{y}_j)^2$ averaged from 50 replications as the number of predictors p increases. The vertical red dashed line indicates the number of informative predictors p0; the vertical black dashed line indicates the sample size n = 100. The vertical blue dashed line in the last panel indicates the averaged p chosen by cross-validation.

**Figure 4.** Predictive MSE of the pseudo-OLS, CV-Ridge and CV-Lasso. For the pseudo-OLS, we set the maximum value for p as $p_{\max} = C \times n\sqrt{p_0}$, and choose C using cross-validation in a range so that $p_{\max}$ varies from 2,000 to 15,037 (equal to $0.5 \times n\sqrt{p_0}$). The result shows that the MSE of pseudo-OLS starts to decrease when the total number of predictors exceeds 1250, and surpasses that of Ridge and Lasso when p = 2…

**Figure 5.** Out-of-sample R² for predicting the U.S. equity premium, using the dataset described by Welch and Goyal (2008), and updated on the webpage by Amit Goyal. The yearly data spans from 1948 to 2015, with p0 = 16 original predictors. We use rolling windows of n = 17 years for one-year horizon forecast. The vertical axis is OOS R². The horizontal axis is plotted as log(p), and ticked using p. Regardless of p, bo…

**Figure 6.** Predictive MSE using 123 macroeconomic series from McCracken and Ng (2016). Data spans from May 1960 to December 2019 with p0 = 123 predictors. We use rolling windows of n = 120 months for one-month horizon forecast. The vertical axis is $\sum_n (y_{n+1} - \hat{y}_{n+1})^2$, the horizontal axis is log(p), and the horizontal tick is p. Regardless of p, the PCA, CV-Lasso and CV-Ridge use the p0 macrovariables, whereas the ps…

**Figure 7.** Predictive MSE using 60 socio-economic and geographical characteristics from Barro and Lee (1994). Data for the growth rate of GDP from 90 countries. We estimate the model on a randomly selected sample of n = 45 countries, evaluating its predictions for the remaining 45 countries. We repeat this exercise 100 times. The vertical axis is $\sum_n (y_{n+1} - \hat{y}_{n+1})^2$, the horizontal axis is log(p), and the horizonta…

**Figure 8.** Predictive MSE $\sum_{j=1}^{50} (y_j - \hat{y}_j)^2$ averaged from 10 replications as the number of predictors p increases. The number of informative predictors p0 = 0.5p.
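
The simulation design described in the Figure 1 caption (n = 100, p0 = min{p, 0.9n} informative predictors from a 3-factor model, remaining columns i.i.d. Gaussian noise) can be sketched as follows. The loadings, coefficient values, error variances, replication count, and the function name `sweep_mse` are assumptions for illustration; the captions do not give the paper's exact design.

```python
import numpy as np


def sweep_mse(p_grid, n=100, n_test=50, k=3, reps=20, seed=0):
    """Out-of-sample MSE of minimum-norm least squares as total predictors p grow.

    For each p, the first p0 = min(p, 0.9 n) columns are informative
    (factor-driven) and the remaining p - p0 columns are pure Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    p0_max = int(0.9 * n)
    n_noise_max = max(max(p_grid) - p0_max, 0)
    out = np.zeros(len(p_grid))
    for _ in range(reps):
        Lam = rng.normal(size=(p0_max, k))    # loadings (assumed design)
        gamma = np.ones(k)                    # factor coefficients (assumed)
        F = rng.normal(size=(n + n_test, k))
        X_info = F @ Lam.T + rng.normal(size=(n + n_test, p0_max))
        y = F @ gamma + rng.normal(size=n + n_test)
        noise = rng.normal(size=(n + n_test, n_noise_max))
        for i, p in enumerate(p_grid):
            p0 = min(p, p0_max)
            X = np.hstack([X_info[:, :p0], noise[:, : p - p0]])
            # pinv gives OLS when p <= n and the min-norm interpolator when p > n.
            beta = np.linalg.pinv(X[:n]) @ y[:n]
            out[i] += np.mean((y[n:] - X[n:] @ beta) ** 2) / reps
    return out
```

Plotting `sweep_mse(list(range(3, 501, 10)))` against p gives a curve of the kind the captions describe, with the region around p = n as the natural place to look for the variance spike.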

Original abstract

This paper studies linear overparameterized models in economic forecasting and highlights that including noise variables (regressors with no predictive power) regularizes the estimator. We consider a setting where both the outcome variable and the high-dimensional predictors are driven by a small number of latent factors, and show that the linear forecast model is dense rather than sparse. It turns out that a ridgeless regression augmented with noise predictors attains the same asymptotic forecast accuracy as an oracle with known true factors, without estimating the factors or assuming them to be strong. The gain comes from shrinkage of the eigenvalues of the design matrix, which reduces the out-of-sample variance. In contrast, perfect variable selection that removes noise variables can worsen forecasts when the number of retained predictors is comparable to the sample size. Empirically, we apply this approach to forecasting U.S. inflation, international GDP growth, and the U.S. equity risk premium, finding that noise regularization improves and stabilizes predictive performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Noise regularization matches oracle factor accuracy in dense linear projections, but the density condition is the fragile part.

The central claim is that when both the target and the high-dimensional predictors follow a low-dimensional latent factor structure, adding pure noise variables to ridgeless regression produces the same asymptotic out-of-sample risk as an oracle that observes the factors directly. The mechanism is eigenvalue shrinkage of the design matrix, which lowers variance. This holds without estimating the factors and without requiring them to be strong. The paper also notes that the population projection is dense under this setup, so perfect selection of the observed predictors can increase risk when the number kept is close to the sample size. Empirically, the authors apply the method to U.S. inflation, international GDP growth, and the equity risk premium and report gains in accuracy and stability.

That combination of a clean theoretical equivalence and direct forecasting applications is the useful part. The result is not just a restatement of existing benign overfitting work; it is tied specifically to the factor-driven dense case common in economic data.

The main soft spot is the density condition itself. The equivalence relies on every observed predictor having a nonzero population coefficient in the projection onto X. If loadings are heterogeneous or factors are weak enough that some columns of X carry no signal, the model becomes sparse and the noise-augmentation benefit need not deliver the oracle rate. The abstract states the density result but supplies no explicit rate conditions on factor strength or the number of noise variables that would keep the claim intact under weaker factors. Without those conditions or robustness checks, the practical scope is narrower than the headline suggests. The empirical section is described at summary level only, so the magnitude and reliability of the reported improvements are hard to judge from the abstract.

This is for readers working on high-dimensional forecasting in macro and finance who already use linear methods and want a regularization route that sidesteps factor estimation. It is worth sending to referees because the theoretical claim is specific, the applications are standard, and the density assumption can be checked or relaxed in revision.

Referee Report

2 major / 1 minor

Summary. The paper claims that in economic forecasting settings where both the outcome y and high-dimensional predictors X are driven by a small number of latent factors (making the population linear projection dense rather than sparse), a ridgeless regression augmented with noise predictors (regressors with no predictive power) attains the same asymptotic out-of-sample forecast accuracy as an oracle that knows the true factors. The mechanism is eigenvalue shrinkage of the design matrix that reduces variance; this is contrasted with perfect variable selection, which can worsen performance when the number of retained predictors is comparable to sample size. The result is supported by theory under the factor model and by empirical applications to U.S. inflation, international GDP growth, and the U.S. equity risk premium.

Significance. If the central asymptotic equivalence holds, the paper supplies a practical regularization device for high-dimensional economic forecasting that avoids explicit factor estimation and does not require strong-factor assumptions. This report credits the theoretical equivalence result and the empirical finding that noise augmentation improves and stabilizes predictive performance relative to selection-based alternatives.

major comments (2)

[Abstract and §3] Theoretical setup: the oracle-equivalence claim rests on the population coefficient vector β being dense under X = ΛF + e and y = γ'F + u. The manuscript states this density result but supplies no explicit rate conditions on the number of added noise variables relative to n, p, or factor strength that would keep the equivalence intact when factors are weak or loadings heterogeneous; without such conditions the eigenvalue-shrinkage benefit need not dominate selection-based alternatives.
[Empirical applications] Forecasting tables for inflation, GDP, and equity premium: the reported gains in accuracy and stability are presented without accompanying standard errors, confidence bands, or robustness checks on the exact count of noise variables, all of which are required to assess whether the finite-sample improvements are statistically distinguishable from the oracle benchmark.

minor comments (1)

[Notation and estimator definition] The definition of the ridgeless estimator after noise augmentation would benefit from an explicit equation (e.g., the augmented design matrix and the resulting β̂) placed in the main text rather than only in an appendix.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, clarifying the theoretical scope and committing to empirical enhancements where appropriate.

Referee: [Abstract and §3] Theoretical setup: the oracle-equivalence claim rests on the population coefficient vector β being dense under X = ΛF + e and y = γ'F + u. The manuscript states this density result but supplies no explicit rate conditions on the number of added noise variables relative to n, p, or factor strength that would keep the equivalence intact when factors are weak or loadings heterogeneous; without such conditions the eigenvalue-shrinkage benefit need not dominate selection-based alternatives.

Authors: The density of β follows immediately from the factor model assumptions in Section 3 (Assumptions 1–3), which allow weak factors and heterogeneous loadings without requiring strong-factor conditions. Theorems 1–2 derive the asymptotic equivalence by showing that noise augmentation induces eigenvalue shrinkage that matches the oracle variance term, and the proofs hold under the stated rates on p/n and the factor structure; no additional rate restrictions on the number of noise variables are needed beyond those already implicit in the high-dimensional regime. We will add a clarifying paragraph in §3.2 explicitly noting that the equivalence continues to hold for weak factors provided the loadings satisfy the moment conditions in Assumption 2, thereby addressing the concern about dominance over selection methods.

revision: partial

Referee: [Empirical applications] Forecasting tables for inflation, GDP, and equity premium: the reported gains in accuracy and stability are presented without accompanying standard errors, confidence bands, or robustness checks on the exact count of noise variables, all of which are required to assess whether the finite-sample improvements are statistically distinguishable from the oracle benchmark.

Authors: We agree that standard errors and robustness checks would strengthen the empirical section. In the revision we will (i) report bootstrap standard errors for the out-of-sample R² and MSFE differences relative to the oracle benchmark, (ii) add a new table (or appendix figure) showing results for a range of noise-variable counts around the values used in the main tables, and (iii) include Diebold–Mariano tests where feasible. These additions will allow readers to assess statistical distinguishability.

revision: yes
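
For readers unfamiliar with the Diebold–Mariano test mentioned in the response, a textbook version under squared-error loss can be sketched as below. This is a generic illustration, not the authors' implementation: the function name `diebold_mariano`, the rectangular-kernel long-run variance, and the normal reference distribution are assumptions made here.

```python
import math

import numpy as np


def diebold_mariano(e1, e2, h=1):
    """Diebold-Mariano statistic for equal squared-error loss of two forecasts.

    e1, e2 : forecast-error series of equal length from the two methods.
    h      : forecast horizon; h - 1 autocovariance lags enter the variance.
    Returns (statistic, two-sided p-value); negative values favour e1.
    """
    d = np.asarray(e1, dtype=float) ** 2 - np.asarray(e2, dtype=float) ** 2
    T = d.size
    d_bar = d.mean()
    dc = d - d_bar
    # Long-run variance of the loss differential (plain variance when h = 1).
    # With h > 1 the rectangular kernel can go negative in small samples.
    lrv = dc @ dc / T
    for lag in range(1, h):
        lrv += 2.0 * (dc[lag:] @ dc[:-lag]) / T
    stat = d_bar / math.sqrt(lrv / T)
    # Two-sided p-value from the standard normal limit: 2 * (1 - Phi(|stat|)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

In the rolling-window exercises, `e1` and `e2` would be the out-of-sample errors of the noise-augmented forecast and a competitor over the same evaluation dates. With short evaluation samples, such as windows of a few dozen annual observations, the normal approximation may be rough.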

Circularity Check

0 steps flagged

No significant circularity; derivation is model-based asymptotic analysis

The paper derives density of the population projection coefficients from the shared latent factor structure (X = ΛF + e, y = γ'F + u) and shows asymptotic equivalence of ridgeless regression plus noise to the oracle that uses F directly. These steps are explicit mathematical results under the maintained assumptions rather than reductions by construction, fitted-parameter renamings, or load-bearing self-citations. The oracle benchmark is internal to the factor model but is not tautological; the equivalence is obtained via eigenvalue shrinkage arguments that are independent of the target risk quantity. No self-citation chains or ansatz smuggling are indicated in the provided text. The density claim follows directly from the factor loadings without redefining the target quantity in terms of itself.
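
The density step can be illustrated with a short calculation. It assumes, for illustration only, that F, e, and u are mean zero and mutually uncorrelated, with $\Sigma_F = \operatorname{Var}(F)$ and $\Sigma_e = \operatorname{Var}(e)$ both invertible; the paper's own assumptions may be stated differently.

```latex
\begin{align*}
\beta &= \operatorname{Var}(X)^{-1}\operatorname{Cov}(X, y)
       = (\Lambda \Sigma_F \Lambda' + \Sigma_e)^{-1} \Lambda \Sigma_F \gamma \\
      &= \Sigma_e^{-1} \Lambda \,(\Sigma_F^{-1} + \Lambda' \Sigma_e^{-1} \Lambda)^{-1} \gamma .
\end{align*}
```

The second line follows from the Woodbury-type identity $(\Lambda \Sigma_F \Lambda' + \Sigma_e)^{-1}\Lambda\Sigma_F = \Sigma_e^{-1}\Lambda(\Sigma_F^{-1} + \Lambda'\Sigma_e^{-1}\Lambda)^{-1}$. If $\Sigma_e$ is diagonal, the j-th coefficient is $\sigma_{e,j}^{-2}\,\lambda_j'(\Sigma_F^{-1} + \Lambda'\Sigma_e^{-1}\Lambda)^{-1}\gamma$, which is nonzero unless the loading $\lambda_j$ happens to be orthogonal to the vector that follows it, so generically every predictor carries a coefficient.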

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that both outcome and predictors are generated by a small number of latent factors; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: both the outcome variable and the high-dimensional predictors are driven by a small number of latent factors.
Explicitly stated as the setting considered in the abstract.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages

[1] Arora, S., N. Cohen, W. Hu, and Y. Luo (2019). Implicit regularization in deep matrix factorization. Advances in Neural Information Processing Systems 32, 7413–7424.
[2] Atanasov, V., S. V. Møller, and R. Priestley (2020). Consumption fluctuations and expected returns. The Journal of Finance 75(3), 1677–1713.
[3] Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71, 135–171.
[4] Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models. Econometrica 70, 191–221.
[5] Bai, J. and S. Ng (2006). Confidence intervals for diffusion index forecasts and inference for factor-augmented regressions. Econometrica 74(4), 1133–1150.
[6] Bai, Z. and Y. Yin (1993). Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. The Annals of Probability 21(3), 1275–1294.
[7] Ball, R. and V. V. Nikolaev (2022). On earnings and cash flows as predictors of future cash flows. Journal of Accounting and Economics 73(1), 101430.
[8] Barro, R. J. and J.-W. Lee (1994). Sources of economic growth. In Carnegie-Rochester Conference Series on Public Policy, Volume 40, pp. 1–46. Elsevier.
[9] Bekaert, G. and M. Hoerova (2014). The VIX, the variance premium and stock market volatility. Journal of Econometrics 183(2), 181–192.
[10] Belkin, M., D. Hsu, S. Ma, and S. Mandal (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854.
[11] Belkin, M., D. Hsu, and J. Xu (2020). Two models of double descent for weak features. SIAM Journal on Mathematics of Data Science 2(4), 1167–1180.
[12] Chao, J. C. and N. R. Swanson (2022). Selecting the relevant variables for factor estimation in FAVAR models. Available at SSRN 4308280.
[13] Chava, S., M. Gallmeyer, and H. Park (2015). Credit conditions and stock return predictability. Journal of Monetary Economics 74, 117–132.
[14] Chen, X., Y. H. Cho, Y. Dou, and B. Lev (2022). Predicting future earnings changes using machine learning and detailed financial data. Journal of Accounting Research 60(2), 467–515.
[15] Chen, Y., G. W. Eaton, and B. S. Paye (2018). Micro (structure) before macro? The predictive power of aggregate illiquidity for stock returns and economic activity. Journal of Financial Economics 130(1), 48–73.
[16] Chernozhukov, V., C. Hansen, and Y. Liao (2017). A lava attack on the recovery of sums of dense and sparse signals. The Annals of Statistics 45(1), 39–76.
[17] Chinot, G., M. Löffler, and S. van de Geer (2022). On the robustness of minimum norm interpolators and regularized empirical risk minimizers. The Annals of Statistics 50(4), 2306–2333.
[18] Colacito, R., E. Ghysels, J. Meng, and W. Siwasarit (2016). Skewness in expected macro fundamentals and the predictability of equity returns: Evidence and theory. The Review of Financial Studies 29(8), 2069–2109.
[19] Connor, G. and R. A. Korajczyk (1988). Risk and return in an equilibrium APT: Application of a new test methodology. Journal of Financial Economics 21(2), 255–289.
[20] Didisheim, A., S. B. Ke, B. T. Kelly, and S. Malamud (2023). Complexity in factor pricing models. Technical report, National Bureau of Economic Research.
[21] Fairfield, P. M., R. J. Sweeney, and T. L. Yohn (1996). Accounting classification and the predictive content of earnings. Accounting Review, 337–355.
[22] Fan, J., Y. Ke, and K. Wang (2020). Factor-adjusted regularized model selection. Journal of Econometrics 216(1), 71–85.
[23] Fan, J., Z. T. Ke, Y. Liao, and A. Neuhierl (2022). Structural deep learning in conditional asset pricing. Available at SSRN 4117882.
[24] Fan, J., Y. Liao, and M. Mincheva (2013). Large covariance estimation by thresholding principal orthogonal complements (with discussion). Journal of the Royal Statistical Society, Series B 75, 603–680.
[25] Feltham, G. A. and J. A. Ohlson (1995). Valuation and clean surplus accounting for operating and financial activities. Contemporary Accounting Research 11(2), 689–731.
[26] Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2005). The generalized dynamic factor model: one-sided estimation and forecasting. Journal of the American Statistical Association 100(471), 830–840.
[27] Giannone, D., M. Lenza, and G. E. Primiceri (2021). Economic predictions with big data: The illusion of sparsity. Econometrica 89(5), 2409–2437.
[28] Giglio, S., D. Xiu, and D. Zhang (2023). Prediction when factors are weak. University of Chicago, Becker Friedman Institute for Economics Working Paper (2023-47).
[29] Goyal, A., I. Welch, and A. Zafirov (2023). A comprehensive 2021 look at the empirical performance of equity premium prediction II. Swiss Finance Institute Research Paper (21-85).
[30] Gu, S., B. Kelly, and D. Xiu (2020). Empirical asset pricing via machine learning. The Review of Financial Studies 33(5), 2223–2273.
[31] Hansen, C. and Y. Liao (2018). The factor-lasso and k-step bootstrap approach for inference in high-dimensional economic applications. Econometric Theory, 1–45.
[32] Hastie, T., A. Montanari, S. Rosset, and R. J. Tibshirani (2022). Surprises in high-dimensional ridgeless least squares interpolation. The Annals of Statistics 50(2), 949.
[33] He, Y. (2023). Ridge regression under dense factor augmented models. Journal of the American Statistical Association, 1–13.
[34] Hirshleifer, D., K. Hou, and S. H. Teoh (2009). Accruals, cash flows, and aggregate stock returns. Journal of Financial Economics 91(3), 389–406.
[35] Huang, D., F. Jiang, J. Tu, and G. Zhou (2015). Investor sentiment aligned: A powerful predictor of stock returns. The Review of Financial Studies 28(3), 791–837.
[36] Jondeau, E., Q. Zhang, and X. Zhu (2019). Average skewness matters. Journal of Financial Economics 134(1), 29–47.
[37] Jones, C. S. and S. Tuzel (2013). New orders and asset prices. The Review of Financial Studies 26(1), 115–157.
[38] Kelly, B. and S. Pruitt (2013). Market expectations in the cross-section of present values. The Journal of Finance 68(5), 1721–1756.
[39] Kelly, B. T., S. Malamud, and K. Zhou (2022). The virtue of complexity in return prediction. Technical report, National Bureau of Economic Research.
[40] Lee, S. and S. Lee (2023). The mean squared error of the ridgeless least squares estimator under general assumptions on regression errors. arXiv preprint arXiv:2305.12883.
[41] Marchenko, V. A. and L. A. Pastur (1967). Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik 114(4), 507–536.
[42] Martin, I. (2017). What is the expected return on the market? The Quarterly Journal of Economics 132(1), 367–433.
[43] McCracken, M. W. and S. Ng (2016). FRED-MD: A monthly database for macroeconomic research. Journal of Business & Economic Statistics 34(4), 574–589.
[44] Mei, S. and A. Montanari (2019). The generalization error of random features regression: Precise asymptotics and the double descent curve. Communications on Pure and Applied Mathematics.
[45] Møller, S. V. and J. Rangvid (2015). End-of-the-year economic growth and time-varying expected returns. Journal of Financial Economics 115(1), 136–154.
[46] Ng, S. (2013). Variable selection in predictive regressions. Handbook of Economic Forecasting 2, 752–789.
[47] Nissim, D. and S. H. Penman (2001). Ratio analysis and equity valuation: From research to practice. Review of Accounting Studies 6, 109–154.
[48] Ohlson, J. A. (1995). Earnings, book values, and dividends in equity valuation. Contemporary Accounting Research 11(2), 661–687.
[49] Penman, S. H. (1998). A synthesis of equity valuation techniques and the terminal value calculation for the dividend discount model. Review of Accounting Studies 2, 303–323.
[50] Penman, S. H. and T. Sougiannis (1998). A comparison of dividend, cash flow, and earnings approaches to equity valuation. Contemporary Accounting Research 15(3), 343–383.
[51] Rapach, D. E., M. C. Ringgenberg, and G. Zhou (2016). Short interest and aggregate stock returns. Journal of Financial Economics 121(1), 46–65.
[52] So, E. C. (2013). A new approach to predicting analyst forecast errors: Do investors overweight analyst forecasts? Journal of Financial Economics 108(3), 615–640.
[53] Spiess, J., G. Imbens, and A. Venugopal (2023). Double and single descent in causal inference with an application to high-dimensional synthetic control. arXiv preprint arXiv:2305.00700.
[54] Stock, J. and M. Watson (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97, 1167–1179.
[55] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology 58(1), 267–288.
[56] Welch, I. and A. Goyal (2008). A comprehensive look at the empirical performance of equity premium prediction. The Review of Financial Studies 21(4), 1455–1508.

[1] [1]

Cohen, W

Arora, S., N. Cohen, W. Hu, and Y. Luo (2019). Implicit regularization in deep matrix factorization. Advances in Neural Information Processing Systems\/ 32 , 7413--7424

work page 2019

[2] [2]

Atanasov, V., S. V. M ller, and R. Priestley (2020). Consumption fluctuations and expected returns. The Journal of Finance\/ 75\/ (3), 1677--1713

work page 2020

[3] [3]

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica\/ 71 , 135--171

work page 2003

[4] [4]

Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models. Econometrica\/ 70 , 191--221

work page 2002

[5] [5]

Bai, J. and S. Ng (2006). Confidence intervals for diffusion index forecasts and inference for factor-augmented regressions. Econometrica\/ 74\/ (4), 1133--1150

work page 2006

[6] [6]

Bai, Z. and Y. Yin (1993). Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. The Annals of Probability\/ 21\/ (3), 1275--1294

work page 1993

[7] [7]

Ball, R. and V. V. Nikolaev (2022). On earnings and cash flows as predictors of future cash flows. Journal of Accounting and Economics\/ 73\/ (1), 101430

work page 2022

[8] [8]

Barro, R. J. and J.-W. Lee (1994). Sources of economic growth. In Carnegie-Rochester conference series on public policy , Volume 40, pp.\ 1--46. Elsevier

work page 1994

[9] [9]

Bekaert, G. and M. Hoerova (2014). The vix, the variance premium and stock market volatility. Journal of econometrics\/ 183\/ (2), 181--192

work page 2014

[10] [10]

Belkin, M., D. Hsu, S. Ma, and S. Mandal (2019). Reconciling modern machine-learning practice and the classical bias--variance trade-off. Proceedings of the National Academy of Sciences\/ 116\/ (32), 15849--15854

work page 2019

[11] [11]

Hsu, and J

Belkin, M., D. Hsu, and J. Xu (2020). Two models of double descent for weak features. SIAM Journal on Mathematics of Data Science\/ 2\/ (4), 1167--1180

work page 2020

[12] [12]

Chao, J. C. and N. R. Swanson (2022). Selecting the relevant variables for factor estimation in favar models. Available at SSRN 4308280\/

work page 2022

[13] [13]

Gallmeyer, and H

Chava, S., M. Gallmeyer, and H. Park (2015). Credit conditions and stock return predictability. Journal of Monetary Economics\/ 74 , 117--132

work page 2015

[14] [14]

Chen, X., Y. H. Cho, Y. Dou, and B. Lev (2022). Predicting future earnings changes using machine learning and detailed financial data. Journal of Accounting Research\/ 60\/ (2), 467--515

work page 2022

[15] [15]

Chen, Y., G. W. Eaton, and B. S. Paye (2018). Micro (structure) before macro? the predictive power of aggregate illiquidity for stock returns and economic activity. Journal of Financial Economics\/ 130\/ (1), 48--73

work page 2018

[16] [16]

Hansen, and Y

Chernozhukov, V., C. Hansen, and Y. Liao (2017). A lava attack on the recovery of sums of dense and sparse signals. The Annals of Statistics\/ 45\/ (1), 39--76

work page 2017

[17] [17]

L \"o ffler, and S

Chinot, G., M. L \"o ffler, and S. van de Geer (2022). On the robustness of minimum norm interpolators and regularized empirical risk minimizers. The Annals of Statistics\/ 50\/ (4), 2306--2333

work page 2022

[18] [18]

Ghysels, J

Colacito, R., E. Ghysels, J. Meng, and W. Siwasarit (2016). Skewness in expected macro fundamentals and the predictability of equity returns: Evidence and theory. The Review of Financial Studies\/ 29\/ (8), 2069--2109

work page 2016

[19] [19]

Connor, G. and R. A. Korajczyk (1988). Risk and return in an equilibrium apt: Application of a new test methodology. Journal of financial economics\/ 21\/ (2), 255--289

work page 1988

[20] [20]

Didisheim, A., S. B. Ke, B. T. Kelly, and S. Malamud (2023). Complexity in factor pricing models. Technical report, National Bureau of Economic Research

Fairfield, P. M., R. J. Sweeney, and T. L. Yohn (1996). Accounting classification and the predictive content of earnings. Accounting Review, 337–355.

Fan, J., Y. Ke, and K. Wang (2020). Factor-adjusted regularized model selection. Journal of Econometrics 216(1), 71–85.

Fan, J., Z. T. Ke, Y. Liao, and A. Neuhierl (2022). Structural deep learning in conditional asset pricing. Available at SSRN 4117882.

Fan, J., Y. Liao, and M. Mincheva (2013). Large covariance estimation by thresholding principal orthogonal complements (with discussion). Journal of the Royal Statistical Society, Series B 75, 603–680.

Feltham, G. A. and J. A. Ohlson (1995). Valuation and clean surplus accounting for operating and financial activities. Contemporary Accounting Research 11(2), 689–731.

Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2005). The generalized dynamic factor model: One-sided estimation and forecasting. Journal of the American Statistical Association 100(471), 830–840.

Giannone, D., M. Lenza, and G. E. Primiceri (2021). Economic predictions with big data: The illusion of sparsity. Econometrica 89(5), 2409–2437.

Giglio, S., D. Xiu, and D. Zhang (2023). Prediction when factors are weak. University of Chicago, Becker Friedman Institute for Economics Working Paper (2023-47).

Goyal, A., I. Welch, and A. Zafirov (2023). A comprehensive 2021 look at the empirical performance of equity premium prediction II. Swiss Finance Institute Research Paper (21-85).

Gu, S., B. Kelly, and D. Xiu (2020). Empirical asset pricing via machine learning. The Review of Financial Studies 33(5), 2223–2273.

Hansen, C. and Y. Liao (2018). The factor-lasso and k-step bootstrap approach for inference in high-dimensional economic applications. Econometric Theory, 1–45.

Hastie, T., A. Montanari, S. Rosset, and R. J. Tibshirani (2022). Surprises in high-dimensional ridgeless least squares interpolation. Annals of Statistics 50(2), 949.

He, Y. (2023). Ridge regression under dense factor augmented models. Journal of the American Statistical Association, 1–13.

Hirshleifer, D., K. Hou, and S. H. Teoh (2009). Accruals, cash flows, and aggregate stock returns. Journal of Financial Economics 91(3), 389–406.

Huang, D., F. Jiang, J. Tu, and G. Zhou (2015). Investor sentiment aligned: A powerful predictor of stock returns. The Review of Financial Studies 28(3), 791–837.

Jondeau, E., Q. Zhang, and X. Zhu (2019). Average skewness matters. Journal of Financial Economics 134(1), 29–47.

Jones, C. S. and S. Tuzel (2013). New orders and asset prices. The Review of Financial Studies 26(1), 115–157.

Kelly, B. and S. Pruitt (2013). Market expectations in the cross-section of present values. The Journal of Finance 68(5), 1721–1756.

Kelly, B. T., S. Malamud, and K. Zhou (2022). The virtue of complexity in return prediction. Technical report, National Bureau of Economic Research

Lee, S. and S. Lee (2023). The mean squared error of the ridgeless least squares estimator under general assumptions on regression errors. arXiv preprint arXiv:2305.12883.

Marchenko, V. A. and L. A. Pastur (1967). Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik 114(4), 507–536.

Martin, I. (2017). What is the expected return on the market? The Quarterly Journal of Economics 132(1), 367–433.

McCracken, M. W. and S. Ng (2016). FRED-MD: A monthly database for macroeconomic research. Journal of Business & Economic Statistics 34(4), 574–589.

Mei, S. and A. Montanari (2019). The generalization error of random features regression: Precise asymptotics and the double descent curve. Communications on Pure and Applied Mathematics.

Møller, S. V. and J. Rangvid (2015). End-of-the-year economic growth and time-varying expected returns. Journal of Financial Economics 115(1), 136–154.

Ng, S. (2013). Variable selection in predictive regressions. Handbook of Economic Forecasting 2, 752–789.

Nissim, D. and S. H. Penman (2001). Ratio analysis and equity valuation: From research to practice. Review of Accounting Studies 6, 109–154.

Ohlson, J. A. (1995). Earnings, book values, and dividends in equity valuation. Contemporary Accounting Research 11(2), 661–687.

Penman, S. H. (1998). A synthesis of equity valuation techniques and the terminal value calculation for the dividend discount model. Review of Accounting Studies 2, 303–323.

Penman, S. H. and T. Sougiannis (1998). A comparison of dividend, cash flow, and earnings approaches to equity valuation. Contemporary Accounting Research 15(3), 343–383.

Rapach, D. E., M. C. Ringgenberg, and G. Zhou (2016). Short interest and aggregate stock returns. Journal of Financial Economics 121(1), 46–65.

So, E. C. (2013). A new approach to predicting analyst forecast errors: Do investors overweight analyst forecasts? Journal of Financial Economics 108(3), 615–640.

Spiess, J., G. Imbens, and A. Venugopal (2023). Double and single descent in causal inference with an application to high-dimensional synthetic control. arXiv preprint arXiv:2305.00700.

Stock, J. and M. Watson (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97, 1167–1179.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology 58(1), 267–288.

Welch, I. and A. Goyal (2008). A comprehensive look at the empirical performance of equity premium prediction. The Review of Financial Studies 21(4), 1455–1508.
