Overparametrized models with posterior drift
Pith reviewed 2026-05-19 07:32 UTC · model grok-4.3
The pith
Changes in data-generating loadings between training and testing reduce the accuracy of overparametrized models in financial forecasting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that posterior drift, defined as changes in the loadings of the data generating process between training and test samples, causes a loss in out-of-sample forecasting performance for overparametrized models. This is particularly relevant in financial settings with potential regime changes. In equity premium forecasting, market timing returns are sensitive to sub-period selection and to bandwidth choices that control model complexity, with smaller bandwidths leading to more variable outcomes over 15-year holding periods while larger ones provide consistency at the cost of lower risk-adjusted returns.
What carries the argument
Posterior drift, the change in data-generating loadings between training and testing periods, which undermines the model's ability to generalize in non-stationary environments like financial markets.
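The mechanism can be illustrated with a toy simulation. The sketch below is not the paper's setup: it assumes isotropic Gaussian features, a minimum-norm least-squares fit with more parameters than observations, and a hypothetical random shift `drift` in the loadings. All names (`beta_is`, `beta_oos`, `min_norm_fit`, `population_mse`) are chosen here for illustration.

```python
import numpy as np

def min_norm_fit(X, y):
    # Minimum-norm least-squares solution; it interpolates the data when p > n.
    return np.linalg.pinv(X) @ y

def population_mse(beta_hat, beta_true, sigma):
    # For isotropic features x ~ N(0, I), the expected squared error
    # E[(y - x'beta_hat)^2] equals ||beta_hat - beta_true||^2 + sigma^2.
    return float(np.sum((beta_hat - beta_true) ** 2) + sigma ** 2)

rng = np.random.default_rng(0)
n, p, sigma = 100, 200, 0.5                  # p > n: overparametrized regime

beta_is = rng.normal(size=p) / np.sqrt(p)    # training loadings (assumed)
drift = rng.normal(size=p) / np.sqrt(p)      # hypothetical shift in loadings
beta_oos = beta_is + drift                   # test loadings under drift

X = rng.normal(size=(n, p))
y = X @ beta_is + sigma * rng.normal(size=n)
beta_hat = min_norm_fit(X, y)

mse_no_drift = population_mse(beta_hat, beta_is, sigma)   # loadings held constant
mse_drift = population_mse(beta_hat, beta_oos, sigma)     # loadings changed
print(f"no drift: {mse_no_drift:.3f}   drift: {mse_drift:.3f}")
```

Because the fitted coefficients only carry information about the training loadings, the squared size of the shift shows up almost one-for-one as extra test error in this toy setting.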
Load-bearing premise
The performance differences observed across sub-periods and bandwidths are mainly caused by changes in the underlying data relationships rather than by other issues like noise or data choices.
What would settle it
Re-run the equity premium forecasts on sub-periods where loadings are artificially held constant: if the performance drop disappears or the sensitivity to bandwidth parameters falls, the drift explanation holds; if neither changes, it does not.
Original abstract
This paper investigates the impact of posterior drift on out-of-sample forecasting accuracy in overparametrized machine learning models. We document the loss in performance when the loadings of the data generating process change between the training and testing samples. This matters crucially in settings in which regime changes are likely to occur, for instance, in financial markets. Applied to equity premium forecasting, our results underline the sensitivity of a market timing strategy to sub-periods and to the bandwidth parameters that control the complexity of the model. For the average investor, we find that focusing on holding periods of 15 years can generate very heterogeneous returns, especially for small bandwidths. Large bandwidths yield much more consistent outcomes, but are far less appealing from a risk-adjusted return standpoint. All in all, our findings tend to recommend cautiousness when resorting to large linear models for stock market predictions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript investigates the impact of posterior drift on out-of-sample forecasting accuracy in overparametrized machine learning models. It documents performance losses when the loadings of the data-generating process change between training and testing samples, with an application to equity premium forecasting that highlights the sensitivity of market timing strategies to sub-period selection and to bandwidth parameters controlling model complexity. Results indicate that 15-year holding periods produce heterogeneous returns especially for small bandwidths, while large bandwidths yield more consistent outcomes but lower risk-adjusted returns, leading to a recommendation of caution when using large linear models for stock market predictions.
Significance. If the attribution of performance losses specifically to posterior drift can be isolated from other regime features, the work would provide useful evidence on the limitations of overparametrized models in non-stationary financial settings. It could inform practical choices around model complexity and holding periods for investors engaged in market timing, while underscoring the need for robustness checks in regime-shifting environments.
major comments (2)
- [Abstract and empirical results] Abstract and empirical application: the central claim that observed drops in market-timing performance across sub-periods are attributable to changes in the loadings of the data-generating process (posterior drift) is not supported by explicit controls or robustness tests that would separate this effect from changes in return volatility, finite-sample effects on the effective degrees of freedom of the estimator, or shifts in the relevance of equity-premium predictors. Without such separation, the documented sensitivities to sub-periods and bandwidth parameters may partly reflect other unmodeled factors rather than isolated posterior drift.
- [Empirical results] Results on holding periods and bandwidth: the statement that large bandwidths produce 'much more consistent outcomes' but are 'far less appealing from a risk-adjusted return standpoint' lacks reported quantitative metrics, statistical significance tests, or comparisons of Sharpe ratios or certainty-equivalent returns that would substantiate the trade-off and support the final recommendation of cautiousness.
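The metrics requested in the second major comment could be computed along the following lines. This is a minimal sketch under stated assumptions: monthly excess returns, a mean-variance certainty equivalent (the paper may define it differently), a hypothetical risk-aversion value of 3, and synthetic placeholder returns rather than any data from the paper. A 15-year holding period is taken to be 180 monthly observations.

```python
import numpy as np

def sharpe_ratio(excess_returns, periods_per_year=12):
    # Annualized Sharpe ratio from periodic excess returns.
    r = np.asarray(excess_returns, dtype=float)
    return float(np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1))

def certainty_equivalent(excess_returns, risk_aversion=3.0, periods_per_year=12):
    # Mean-variance certainty equivalent, annualized:
    # periods_per_year * (mean - (risk_aversion / 2) * variance).
    r = np.asarray(excess_returns, dtype=float)
    return float(periods_per_year * (r.mean() - 0.5 * risk_aversion * r.var(ddof=1)))

def rolling_sharpe(excess_returns, window):
    # Sharpe ratio of every contiguous holding period of length `window`.
    r = np.asarray(excess_returns, dtype=float)
    return np.array([sharpe_ratio(r[i:i + window])
                     for i in range(len(r) - window + 1)])

# Synthetic placeholder returns (NOT data from the paper).
rng = np.random.default_rng(1)
strategy_returns = rng.normal(loc=0.005, scale=0.04, size=600)

window_sharpes = rolling_sharpe(strategy_returns, window=180)
print(f"full-sample Sharpe: {sharpe_ratio(strategy_returns):.2f}")
print(f"full-sample CE:     {certainty_equivalent(strategy_returns):.3f}")
print(f"15-year Sharpe range: {window_sharpes.min():.2f} to {window_sharpes.max():.2f}")
```

Reporting the spread of `window_sharpes` for each bandwidth, next to the full-sample Sharpe ratio and certainty equivalent, would put numbers on both sides of the claimed trade-off between consistency and risk-adjusted return.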
minor comments (2)
- [Abstract] The abstract would benefit from including at least one concrete performance metric or effect size to illustrate the magnitude of the documented performance loss.
- [Methods] Notation and definitions for the bandwidth parameters and the overparametrized estimator should be introduced with greater precision and consistency in the methods section to aid readability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. These have prompted us to clarify the design of our simulations and to augment the empirical section with additional quantitative metrics. We address each major comment below and indicate the revisions made to the manuscript.
- Referee: [Abstract and empirical results] Abstract and empirical application: the central claim that observed drops in market-timing performance across sub-periods are attributable to changes in the loadings of the data-generating process (posterior drift) is not supported by explicit controls or robustness tests that would separate this effect from changes in return volatility, finite-sample effects on the effective degrees of freedom of the estimator, or shifts in the relevance of equity-premium predictors. Without such separation, the documented sensitivities to sub-periods and bandwidth parameters may partly reflect other unmodeled factors rather than isolated posterior drift.
  Authors: We appreciate the referee highlighting the need for clearer isolation of posterior drift. Our Monte Carlo simulations are explicitly constructed to hold volatility, predictor relevance, and effective degrees of freedom fixed while varying only the loadings between training and test samples; this isolates the drift effect by design. In the empirical equity-premium application, sub-periods are selected to align with documented regime shifts in the literature, and the performance patterns across bandwidths match the theoretical predictions under posterior drift. To further address potential confounding, the revised manuscript adds robustness checks that adjust for volatility differences and examine predictor stability across sub-periods. These additions strengthen the attribution while preserving the original results.
  Revision: yes
- Referee: [Empirical results] Results on holding periods and bandwidth: the statement that large bandwidths produce 'much more consistent outcomes' but are 'far less appealing from a risk-adjusted return standpoint' lacks reported quantitative metrics, statistical significance tests, or comparisons of Sharpe ratios or certainty-equivalent returns that would substantiate the trade-off and support the final recommendation of cautiousness.
  Authors: We agree that additional quantitative support would make the trade-off more transparent. The revised version now includes tables reporting Sharpe ratios, certainty-equivalent returns, and statistical tests for differences in performance across bandwidth parameters and holding periods. These metrics confirm that large bandwidths deliver more stable outcomes across sub-periods yet produce lower risk-adjusted returns, thereby providing firmer grounding for the recommendation of caution with large linear models.
  Revision: yes
Circularity Check
Empirical analysis of posterior drift effects is self-contained with no circular derivation
The paper documents observed performance losses in overparametrized models when data-generating process loadings shift between training and test samples, applied to equity premium forecasting. Claims center on empirical sensitivities to sub-periods and bandwidth parameters controlling model complexity. No equations, derivations, or results reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations. The analysis relies on standard out-of-sample evaluation practices and presents findings as documentation of sensitivities rather than first-principles results that loop back to the inputs.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: lim n,p→∞ Rm_X(β̂_is, β_oos, θ_oos) = (σ² + ∥θ_is∥²) c/(1−c) + ∥β_oos − β_is∥² + ∥θ_oos∥² (p/n → c < 1)
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: E[r^(s)_{t+1}(z) | X] → f(z; cϕ) ⟨β_is, β_oos⟩
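The first quoted limit can be checked numerically in a simplified case. The sketch below is an illustration, not the paper's derivation. It assumes isotropic Gaussian features, ordinary least squares with p < n, both θ terms set to zero, and risk measured as the squared distance between the fitted coefficients and the test loadings (which, for isotropic features, is the expected squared prediction error net of test noise). The function names and parameter values are hypothetical.

```python
import numpy as np

def limiting_risk(sigma, c, drift_sq, theta_is_sq=0.0, theta_oos_sq=0.0):
    # Direct transcription of the quoted limit, valid for c = p/n < 1.
    return (sigma ** 2 + theta_is_sq) * c / (1.0 - c) + drift_sq + theta_oos_sq

def simulated_risk(n, p, sigma, beta_is, beta_oos, reps, rng):
    # Average of ||beta_hat - beta_oos||^2 over independent training samples.
    risks = []
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        y = X @ beta_is + sigma * rng.normal(size=n)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        risks.append(np.sum((beta_hat - beta_oos) ** 2))
    return float(np.mean(risks))

rng = np.random.default_rng(0)
n, p, sigma = 600, 300, 1.0                      # c = p / n = 0.5
beta_is = rng.normal(size=p) / np.sqrt(p)
delta = rng.normal(size=p)
delta *= np.sqrt(0.5) / np.linalg.norm(delta)    # rescale so ||delta||^2 = 0.5
beta_oos = beta_is + delta

theory = limiting_risk(sigma, p / n, drift_sq=0.5)
empirical = simulated_risk(n, p, sigma, beta_is, beta_oos, reps=20, rng=rng)
print(f"theory: {theory:.3f}   simulation: {empirical:.3f}")
```

In this simplified case the drift term ∥β_oos − β_is∥² enters additively, so any change in loadings raises the risk regardless of how well the training-period coefficients are estimated.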
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.