A Nontrivial Upper Bound on the Out-of-Sample $R^2$ in Return Forecasting

Cheng Zhang

arXiv: 2602.07841 · v3 · submitted 2026-02-08 · econ.EM · q-fin.ST · stat.AP

Pith reviewed 2026-05-16 06:40 UTC · model grok-4.3

classification: econ.EM · q-fin.ST · stat.AP

keywords: return forecasting, out-of-sample R-squared, directional accuracy, upper bound, oracle model, mean squared error, predictive performance

The pith

A coin-flip oracle model establishes a quadratic upper bound on the out-of-sample R-squared for return forecasts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper defines a coin-flip oracle that matches the directional accuracy of a given forecasting model but, in theory, attains a lower mean squared error than any practical model with that accuracy. The out-of-sample R-squared of this oracle turns out to be a simple quadratic function of the directional accuracy rate. Because the oracle is, in theory, the best possible performer at any fixed accuracy level, its R-squared acts as an upper limit that no practical model can exceed. Empirical checks across several forecasting setups confirm that existing models fall on or below this curve. The bound gives researchers a concrete way to judge whether a reported R-squared is realistically high.

Core claim

The study establishes that the R²_OOS of the coin-flip oracle model, whose analytical expression is a quadratic function of directional accuracy, serves as a tractable upper bound on the actual R²_OOS of practical return forecasting models.

What carries the argument

The coin-flip oracle model that outperforms practical models in mean squared error for any given directional accuracy.

If this is right

Practical models' out-of-sample R² cannot surpass the quadratic function evaluated at their directional accuracy.
The upper bound is independent of the specific predictor variables used.
This allows direct comparison of model performance against the theoretical maximum for their accuracy level.
Common predictive models in finance are shown to respect this bound in multiple scenarios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers could use this bound to set realistic expectations for forecast improvements.
The approach might extend to other time-series forecasting domains beyond returns.
It highlights that directional accuracy alone does not determine R-squared; magnitude consistency also matters.
Testing the bound in non-financial prediction tasks could reveal similar limits.

Load-bearing premise

The coin-flip oracle model theoretically achieves lower mean squared error than any practical model that has the same directional accuracy.

What would settle it

Observing a practical forecasting model with out-of-sample R² higher than the quadratic value computed from its directional accuracy would falsify the claimed upper bound.
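
As a rough illustration of this falsification test, the sketch below flags a model whose reported R²_OOS lies above the oracle curve at its hit rate. It assumes the bound has the form (2p-1)²·κ with κ = (E|y|)²/E[y²], which is what the squared-error algebra gives for a sign-flipping oracle with an optimally chosen fixed magnitude when sign correctness is independent of |y| and returns have mean zero; the paper's exact expression may differ. The function names, the tolerance parameter, and the normal-returns constant are hypothetical.

```python
import math

# Hypothetical constant: (E|y|)^2 / E[y^2] for zero-mean normal returns,
# since E|y| = sigma * sqrt(2 / pi) and E[y^2] = sigma^2.
NORMAL_KAPPA = 2.0 / math.pi


def oracle_r2_bound(p: float, kappa: float) -> float:
    """Assumed quadratic bound (2p - 1)^2 * kappa on R^2_OOS.

    p:     directional accuracy (hit rate) in [0, 1]
    kappa: (E|y|)^2 / E[y^2], a property of the return distribution in (0, 1]
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return (2.0 * p - 1.0) ** 2 * kappa


def violates_bound(r2_oos: float, p: float, kappa: float, tol: float = 0.0) -> bool:
    """True if a reported R^2_OOS exceeds the assumed bound by more than tol."""
    return r2_oos > oracle_r2_bound(p, kappa) + tol


if __name__ == "__main__":
    bound = oracle_r2_bound(0.52, NORMAL_KAPPA)
    print(f"bound at p = 0.52: {bound:.5f}")  # about 0.00102
    print(violates_bound(0.01, 0.52, NORMAL_KAPPA))  # True
```

Under the normal-returns assumption, a model with a 52% hit rate would be capped at an R²_OOS of roughly 0.10%, so a reported 1% would be flagged. In practice κ would have to be estimated from the same sample used for the forecasts.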

Original abstract

This study establishes a nontrivial upper bound on the out-of-sample $R^2$ ($R^2_{\text{OOS}}$) in return forecasting. In particular, we define a coin-flip oracle model that, under the same directional accuracy, theoretically outperforms practical models in terms of MSE. The $R^2_{\text{OOS}}$ of the oracle model, whose analytical expression is a quadratic function of directional accuracy, can therefore serve as a tractable upper bound on the actual $R^2_{\text{OOS}}$. Empirical analyses across multiple forecasting scenarios reveal that the $R^2_{\text{OOS}}$ values of common predictive models are fundamentally bounded by this quadratic function.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Desk Editor's Note private letter to a colleague

Coin-flip oracle bound on return forecast R2 is new but hinges on unproven optimality

The one or two things to know: the paper introduces a coin-flip oracle to derive a quadratic upper bound on the out-of-sample R2 of return forecasts as a function of directional accuracy, and it claims this bound is nontrivial and useful for benchmarking. If the derivation is correct, it addresses a real need in the literature for a practical ceiling on forecast performance.

The new part is the specific oracle construction and the resulting closed-form quadratic expression. Prior work on return predictability has discussed limits from statistical or economic perspectives, but this approach, which uses a minimal-MSE predictor for a fixed sign accuracy, seems fresh. The paper does well in making the bound tractable and then testing it empirically on common models, such as those based on the dividend yield or technical indicators. Seeing real-world R2 values sit comfortably below the bound across different assets and periods gives the result some grounding.

Where it could be softer is the optimality of the oracle itself. The stress-test question is whether the coin-flip version truly achieves the lowest MSE for a given p, or whether a predictor that adjusts its output magnitude depending on whether the sign is correct could do better while keeping the same directional hit rate. If the latter is possible, the quadratic may not be a valid upper bound at all. The abstract states that the oracle outperforms practical models, but the full algebra needs to show explicitly why alternatives cannot improve on it. This is not a minor detail, since the whole upper bound rests on the oracle being the minimal-MSE case. Circularity is low because directional accuracy is observed from data, but confirming the math is key.

This paper targets empirical finance researchers who work on asset return forecasting and model evaluation. A reader running out-of-sample tests, or trying to interpret small R2 numbers, would get value from having this benchmark in mind.
It deserves serious refereeing because the idea is coherent and the empirical application is straightforward, even if the proof requires close scrutiny. I would recommend sending it out for peer review rather than desk-rejecting it. Referees can verify the derivation and suggest any needed clarifications of the oracle's properties.

Referee Report

2 major / 1 minor

Summary. The paper claims to establish a nontrivial upper bound on out-of-sample R² in return forecasting by defining a coin-flip oracle model that, for a given directional accuracy p, theoretically achieves lower MSE than practical models. The oracle's R²_OOS is derived as a quadratic function of p and is asserted to serve as a tractable upper bound, with empirical analyses across forecasting scenarios showing that common predictive models fall below this bound.

Significance. If the coin-flip oracle is indeed the MSE-minimizing predictor for fixed directional accuracy, the quadratic bound would provide a useful, data-independent benchmark for assessing the performance of return-forecasting models and quantifying fundamental limits to predictability. The analytical form strengthens the result by making the bound directly computable from observed directional accuracy alone.

major comments (2)

[Oracle model definition and MSE derivation (abstract and theoretical section)] The central claim that the coin-flip oracle minimizes MSE among all predictors sharing the same directional accuracy p is not established. An alternative predictor that outputs E[y | sign correct] on correct-sign realizations and an appropriate conditional value on errors can achieve identical p while strictly lowering MSE by exploiting magnitude information conditional on the sign outcome; this would render the derived quadratic an invalid upper bound. The manuscript provides no explicit optimality proof or comparison against such alternatives.
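
To make the objection concrete, the sketch below compares, in closed form, the coin-flip oracle with an infeasible predictor that knows ex post whether its sign is right. Assumptions (not from the paper): zero-mean normal returns with unit variance, sign correctness independent of |y|, a zero-forecast benchmark, and a hypothetical rule that predicts E|y| with the correct sign when right and a tiny magnitude eps when wrong. The function names are hypothetical.

```python
import math

KAPPA = 2.0 / math.pi  # (E|y|)^2 / E[y^2] for zero-mean normal returns


def r2_coin_flip_oracle(p: float) -> float:
    """Fixed magnitude m* = (2p - 1) E|y| with a coin-flip sign: R^2 = (2p - 1)^2 * kappa."""
    return (2.0 * p - 1.0) ** 2 * KAPPA


def r2_infeasible(p: float, eps: float = 1e-6) -> float:
    """Predicts E|y| with the right sign when correct and magnitude eps when wrong.

    Uses sigma = 1 without loss of generality, so E[y^2] = 1 and E|y| = sqrt(2/pi).
    """
    e_abs = math.sqrt(KAPPA)
    mse_correct = 1.0 - KAPPA                       # E[(|y| - E|y|)^2] = Var(|y|)
    mse_wrong = 1.0 + 2.0 * eps * e_abs + eps ** 2  # E[(|y| + eps)^2]
    return 1.0 - (p * mse_correct + (1.0 - p) * mse_wrong)


for p in (0.50, 0.55, 0.60):
    print(f"p={p:.2f}  oracle={r2_coin_flip_oracle(p):.4f}  infeasible={r2_infeasible(p):.4f}")
```

Both predictors have hit rate p, yet at p = 0.55 the infeasible rule reaches an R²_OOS of about 0.35 versus about 0.0064 for the oracle, and it stays near 0.32 even at p = 0.5. The entire gap comes from conditioning the magnitude on the ex-post sign outcome, so whether such a rule counts as a legitimate competitor is the feasibility question at issue.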
[Abstract and theoretical derivation] The abstract states that the oracle outperforms real models in MSE under the same directional accuracy, yet the precise prediction rule of the oracle (fixed-magnitude outputs with coin-flip errors) and the algebraic steps leading to the quadratic R²_OOS expression are not supplied, preventing verification that the bound follows directly from the construction.

minor comments (1)

[Empirical analyses] Clarify in the empirical section how directional accuracy p is computed from the data (e.g., whether it uses the same sign convention and sample periods as the theoretical oracle) to ensure the reported R²_OOS values can be directly compared to the quadratic bound.
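
One way to make the computation explicit is sketched below. It is an illustration under stated assumptions, not the paper's procedure: periods in which the forecast or the realized return is exactly zero are dropped from the hit-rate calculation, and R²_OOS is measured against a zero-forecast benchmark unless a benchmark series (for example, a historical-mean forecast) is passed in. The function names are hypothetical.

```python
import numpy as np


def directional_accuracy(forecast, realized) -> float:
    """Share of periods where sign(forecast) == sign(realized).

    Assumed convention: periods where either value is exactly zero are dropped.
    """
    f = np.asarray(forecast, dtype=float)
    y = np.asarray(realized, dtype=float)
    keep = (f != 0) & (y != 0)
    if not keep.any():
        raise ValueError("no periods with nonzero forecast and nonzero return")
    return float(np.mean(np.sign(f[keep]) == np.sign(y[keep])))


def r2_oos(forecast, realized, benchmark=None) -> float:
    """1 - SSE(model) / SSE(benchmark); the benchmark defaults to the zero forecast."""
    f = np.asarray(forecast, dtype=float)
    y = np.asarray(realized, dtype=float)
    b = np.zeros_like(y) if benchmark is None else np.asarray(benchmark, dtype=float)
    return float(1.0 - np.sum((y - f) ** 2) / np.sum((y - b) ** 2))


realized = [0.02, -0.01, 0.03, -0.02]
forecast = [0.01, 0.01, 0.01, -0.01]
print(directional_accuracy(forecast, realized), r2_oos(forecast, realized))
```

Whatever convention is chosen, the same sample, sign rule, and benchmark should feed both p and R²_OOS before the pair is compared against the quadratic curve.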

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight the need for greater clarity on the oracle construction and its optimality properties. We respond point by point below and will revise the manuscript to incorporate the requested details while preserving the central contribution.

Referee: [Oracle model definition and MSE derivation (abstract and theoretical section)] The central claim that the coin-flip oracle minimizes MSE among all predictors sharing the same directional accuracy p is not established. An alternative predictor that outputs E[y | sign correct] on correct-sign realizations and an appropriate conditional value on errors can achieve identical p while strictly lowering MSE by exploiting magnitude information conditional on the sign outcome; this would render the derived quadratic an invalid upper bound. The manuscript provides no explicit optimality proof or comparison against such alternatives.

Authors: The coin-flip oracle is an ex-ante predictor that achieves directional accuracy p by emitting a fixed magnitude m with the correct sign chosen randomly with probability p. The referee's proposed alternative cannot be implemented as a feasible predictor because it requires conditioning the output on the ex-post realization of whether the sign is correct, which depends on the unobserved y and is unavailable at forecast time. Under the information constraint of achieving accuracy p using only a directional signal, any non-constant magnitude choice increases average squared error without raising p. We will add a formal argument establishing this optimality (under standard symmetry assumptions on returns) to the theoretical section. revision: partial
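
A minimal sketch of the kind of argument promised here, valid only in a restricted class of predictors: suppose the output magnitude M ≥ 0 may be random but is independent of y and of the sign-correctness draw S, where S = +1 with probability p and -1 otherwise, independently of |y|. Jensen's inequality, E[M²] ≥ (E[M])², then shows that replacing M by the constant E[M] never raises MSE. This does not cover magnitudes that depend on ex-ante information correlated with |y| or with S, which is the broader case the referee's objection points toward.

```latex
% Assumptions: \hat y = M\,S\,\operatorname{sign}(y), with M \ge 0 independent of (y, S),
% \Pr(S = 1) = p, \Pr(S = -1) = 1 - p, and S independent of |y|.
\begin{aligned}
\mathbb{E}\big[(y - \hat y)^2\big]
  &= \mathbb{E}[y^2] + \mathbb{E}[M^2] - 2\,\mathbb{E}[M]\,\mathbb{E}\big[S\,|y|\big] \\
  &= \mathbb{E}[y^2] + \mathbb{E}[M^2] - 2(2p-1)\,\mathbb{E}[M]\,\mathbb{E}|y| \\
  &\ge \mathbb{E}[y^2] + \big(\mathbb{E}[M]\big)^2 - 2(2p-1)\,\mathbb{E}[M]\,\mathbb{E}|y|,
\end{aligned}
```

Equality holds only when M is constant, so within this class a fixed magnitude is weakly optimal.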
Referee: [Abstract and theoretical derivation] The abstract states that the oracle outperforms real models in MSE under the same directional accuracy, yet the precise prediction rule of the oracle (fixed-magnitude outputs with coin-flip errors) and the algebraic steps leading to the quadratic R²_OOS expression are not supplied, preventing verification that the bound follows directly from the construction.

Authors: We apologize for the omission. The oracle emits a fixed magnitude m with the correct sign with probability p and the incorrect sign with probability 1-p, independently of |y|. Its MSE equals E[y²] + m² - 2m(2p-1)E[|y|]. Minimizing over m gives m* = (2p-1)E[|y|]; substituting m* and dividing by E[y²] (which equals Var(y) when returns are symmetric about zero) yields the quadratic R²_OOS = (2p-1)² (E[|y|])² / E[y²]. The revised manuscript will state the prediction rule explicitly and display the full algebraic derivation immediately after the definition. revision: yes
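
For reference, the algebra described in this response can be written out as follows, assuming sign correctness S is independent of |y| and the benchmark is the zero forecast, so that the denominator is E[y²] (equal to Var(y) for returns symmetric about zero). This is a sketch under those assumptions, not a transcription of the paper's own derivation; the last line is what the MSE expression yields once m is set to its minimizer m*.

```latex
\begin{aligned}
\hat y &= m\,S\,\operatorname{sign}(y), \qquad \Pr(S=1)=p,\ \Pr(S=-1)=1-p, \\
\mathrm{MSE}(m) &= \mathbb{E}[y^2] + m^2 - 2m(2p-1)\,\mathbb{E}|y|, \\
m^\ast &= \arg\min_m \mathrm{MSE}(m) = (2p-1)\,\mathbb{E}|y|, \\
\mathrm{MSE}(m^\ast) &= \mathbb{E}[y^2] - (2p-1)^2\big(\mathbb{E}|y|\big)^2, \\
R^2_{\text{OOS}} &= 1 - \frac{\mathrm{MSE}(m^\ast)}{\mathbb{E}[y^2]}
  = (2p-1)^2\,\frac{\big(\mathbb{E}|y|\big)^2}{\mathbb{E}[y^2]}.
\end{aligned}
```

The final factor depends only on the return distribution (it equals 2/π for zero-mean normal returns), so the bound is quadratic in p with a distribution-specific scale.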

Circularity Check

0 steps flagged

No circularity: oracle R² derived directly from model definition as function of directional accuracy

The paper defines a specific coin-flip oracle predictor (correct sign with probability p, wrong sign otherwise, fixed magnitude) and computes its out-of-sample R² analytically as a quadratic function of p. This algebraic expression is then proposed as an upper bound on the attainable R²_OOS of any model with the same directional accuracy. The derivation itself is a straightforward calculation from the oracle's assumed error process and does not reduce to a fitted parameter, a self-citation chain, or a redefinition of inputs; the bound claim rests on the separate (and externally challengeable) assertion that this oracle minimizes MSE for a given p. No load-bearing step collapses by construction to the paper's own inputs or to prior self-citations. The algebraic part of the result can therefore be checked independently of the paper.
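
Because the algebraic step is self-contained, it can be checked by simulation. The sketch below assumes zero-mean normal returns (an illustrative choice, not the paper's data), draws sign correctness independently of |y|, uses the MSE-minimizing fixed magnitude m* = (2p-1)E|y| and a zero-forecast benchmark, and compares the simulated R²_OOS with (2p-1)²(E|y|)²/E[y²]. The function name and parameter values are hypothetical.

```python
import numpy as np


def coin_flip_oracle(y: np.ndarray, p: float, rng: np.random.Generator):
    """Forecast sign(y) with probability p and -sign(y) otherwise, independently
    of |y|, scaled by the MSE-minimizing fixed magnitude m* = (2p - 1) * E|y|."""
    if not 0.5 <= p <= 1.0:
        raise ValueError("this sketch assumes p in [0.5, 1]")
    correct = rng.random(y.shape) < p
    signs = np.where(correct, np.sign(y), -np.sign(y))
    m_star = (2.0 * p - 1.0) * np.mean(np.abs(y))
    return m_star * signs, correct


rng = np.random.default_rng(0)
y = 0.04 * rng.standard_normal(1_000_000)  # hypothetical zero-mean normal returns
p = 0.55

y_hat, correct = coin_flip_oracle(y, p, rng)
hit_rate = correct.mean()
r2_sim = 1.0 - np.mean((y - y_hat) ** 2) / np.mean(y ** 2)  # zero-forecast benchmark
r2_theory = (2 * p - 1) ** 2 * np.mean(np.abs(y)) ** 2 / np.mean(y ** 2)

print(f"hit rate {hit_rate:.4f}, simulated R2 {r2_sim:.5f}, analytic R2 {r2_theory:.5f}")
```

With a million draws the two R² values should agree to within sampling error. The check confirms only the algebra; it says nothing about whether the oracle is MSE-optimal, which is the separate assertion flagged above.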

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that the oracle achieves the lowest possible MSE for any given directional accuracy and on the mathematical derivation that converts that MSE into an R-squared expression.

axioms (1)

domain assumption: A coin-flip oracle with the same directional accuracy as a practical model necessarily has lower or equal MSE.
This premise is invoked to establish the oracle as an upper-bound benchmark.

invented entities (1)

coin-flip oracle model (no independent evidence)
purpose: Theoretical benchmark that minimizes MSE for a fixed directional accuracy
Newly defined construct used to derive the quadratic bound; no independent empirical evidence supplied.

pith-pipeline@v0.9.0 · 5411 in / 1227 out tokens · 57784 ms · 2026-05-16T06:40:21.921015+00:00 · methodology