Statistical inference with F-statistics when fitting simple models to high-dimensional data

Hannes Leeb; Lukas Steinberger

arxiv: 1902.04304 · v1 · pith:SQEA4FQ3new · submitted 2019-02-12 · 🧮 math.ST · stat.TH


keywords: model, simple, high-dimensional, appropriate, beta, epsilon, linear, misspecified

We study linear subset regression in the context of the high-dimensional overall model $y = \vartheta+\theta' z + \epsilon$ with univariate response $y$ and a $d$-vector of random regressors $z$, independent of $\epsilon$. Here, "high-dimensional" means that the number $d$ of available explanatory variables is much larger than the number $n$ of observations. We consider simple linear sub-models where $y$ is regressed on a set of $p$ regressors given by $x = M'z$, for some $d \times p$ matrix $M$ of full rank $p < n$. The corresponding simple model, i.e., $y=\alpha+\beta' x + e$, can be justified by imposing appropriate restrictions on the unknown parameter $\theta$ in the overall model; otherwise, this simple model can be grossly misspecified. In this paper, we establish asymptotic validity of the standard $F$-test on the surrogate parameter $\beta$, in an appropriate sense, even when the simple model is misspecified.
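
To make the setup in the abstract concrete, the following is a minimal simulation sketch, not the authors' procedure. It draws data from an overall model $y = \vartheta+\theta' z + \epsilon$ with $d$ much larger than $n$, forms $x = M'z$, fits the simple model $y=\alpha+\beta' x + e$ by least squares, and computes the classical $F$-statistic for the hypothesis $\beta = 0$. The Gaussian regressors and errors, the scaling of $\theta$, the choice of $M$ as a coordinate selector, the particular null hypothesis, and the function names `simulate` and `f_statistic` are all assumptions made here for illustration; none of them is specified in the abstract.

```python
import numpy as np


def f_statistic(x, y):
    """Classical F-statistic for H0: beta = 0 in y = alpha + beta'x + e."""
    n, p = x.shape
    design = np.column_stack([np.ones(n), x])      # intercept plus p regressors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    rss = float(resid @ resid)                     # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())       # total sum of squares
    return ((tss - rss) / p) / (rss / (n - p - 1))


def simulate(n=50, d=500, p=3, seed=0):
    """Draw (x, y) from a high-dimensional overall model (illustrative choices)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, d))                # assumed Gaussian regressors
    theta = rng.standard_normal(d) / np.sqrt(d)    # assumed dense coefficient vector
    eps = rng.standard_normal(n)                   # error independent of z
    y = 1.0 + z @ theta + eps                      # overall model, vartheta = 1
    M = np.eye(d)[:, :p]                           # d x p, full rank p < n (assumed)
    x = z @ M                                      # row i holds x_i' = z_i' M
    return x, y


x, y = simulate()
F = f_statistic(x, y)
print(f"n = {x.shape[0]}, p = {x.shape[1]}, F = {F:.3f}")
```

Because $\theta$ is dense here, the simple model omits most of the signal and is misspecified in the sense of the abstract; the statistic is computed exactly as it would be under a correctly specified model. Repeating the simulation over many seeds and comparing the empirical distribution of `F` with the $F_{p,\,n-p-1}$ reference distribution would be one way to explore the kind of validity the abstract describes.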
