
arxiv: 2605.29255 · v1 · pith:N5EIHU65new · submitted 2026-05-28 · 📊 stat.ME

Outcome-Calibrated Regression and Predicted Outcome-Based Inference

classification 📊 stat.ME
keywords: outcome, inference, regression, conditional, mean, predicted, prediction, unbiased

Regression is a fundamental tool in scientific research. Ordinary least squares (OLS), one of the most widely used regression methods, has several desirable properties, including being the best linear unbiased estimator (BLUE). It is well known that, under the assumptions of the standard model, the OLS prediction is conditionally unbiased given the covariates, i.e., $\mathbb{E}(\widehat Y-Y\mid X=x)=0$. However, an often-overlooked property of OLS is that the prediction error is generally biased conditional on the outcome, i.e., $\mathbb{E}(\widehat Y-Y\mid Y=y)\neq 0$. As a consequence of minimizing mean squared error, OLS predictions are systematically shrunk toward the outcome mean, which explains the classical phenomenon of regression to the mean (RTM): large outcome values tend to be underpredicted, whereas small outcome values tend to be overpredicted. This conditional prediction bias creates a nonignorable problem for predicted outcome-based inference, in which scientific inference is performed using the predicted outcome $\widehat Y$ and another variable $W$. We show that, in applications such as brain-age analysis and causal inference, inference based on regression-predicted outcomes can be systematically biased. To address this issue, we propose outcome-calibrated regression (OCR), a new regression framework with a closed-form solution that directly enforces outcome calibration. The proposed OCR estimator eliminates conditional prediction bias with respect to the outcome and enables valid inference using regression-predicted outcomes.
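The contrast between the two conditional biases described in the abstract can be checked numerically. The following is a minimal simulation sketch, not the paper's OCR estimator: the data-generating model, the sample size, and all variable names are illustrative assumptions. It fits OLS with an intercept and then regresses the in-sample prediction error $\widehat Y - Y$ on $X$ and on $Y$ separately. It relies on a standard OLS identity (not taken from the paper): with an intercept, the in-sample slope of $\widehat Y - Y$ on $Y$ equals $-(1-R^2)$.

```python
import numpy as np

# Illustrative (assumed) data-generating model: Y = 1 + 2 X + noise.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=2.0, size=n)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
err = y_hat - y  # prediction error, same sign convention as the abstract

# In-sample coefficient of determination.
r2 = 1.0 - np.var(y - y_hat) / np.var(y)

# Slope of the prediction error on the covariate and on the outcome.
slope_on_x = np.polyfit(x, err, 1)[0]
slope_on_y = np.polyfit(y, err, 1)[0]

# Mean prediction error in the lowest and highest outcome deciles.
lo_cut, hi_cut = np.quantile(y, [0.1, 0.9])
mean_err_low = err[y <= lo_cut].mean()
mean_err_high = err[y >= hi_cut].mean()

print(f"R^2                      : {r2:.3f}")
print(f"slope of error on X      : {slope_on_x:.2e}")
print(f"slope of error on Y      : {slope_on_y:.3f}")
print(f"mean error, low-Y decile : {mean_err_low:.3f}")
print(f"mean error, high-Y decile: {mean_err_high:.3f}")
```

Under these assumptions the error has essentially zero slope in $X$ but a clearly negative slope in $Y$: predictions for large outcomes are too low on average and predictions for small outcomes are too high, which is the regression-to-the-mean pattern the abstract describes. The size of the effect grows as $R^2$ falls, so a weaker predictor produces a stronger dependence of $\widehat Y - Y$ on $Y$.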
