Valid F-screening in linear regression

Daniela Witten; Daniel Kessler; Olivia McGough

arXiv: 2505.23113 · v3 · submitted 2025-05-29 · stat.ME · math.ST · stat.AP · stat.TH

Pith reviewed 2026-05-19 13:27 UTC · model grok-4.3

keywords selective inference · linear regression · F-test · post-selection inference · conditional p-values · selective coverage · retrospective analysis

The pith

Selective p-values control the Type 1 error for regression coefficients conditional on rejection of the overall F-test.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Analysts often fit a linear regression and report or interpret individual coefficients only once the global F-test rejects the hypothesis that all slopes are zero. Standard p-values and intervals lose their guarantees once this screening step has occurred. The paper builds selective p-values that control the conditional Type 1 error rate given that the overall null was rejected, along with confidence intervals that attain nominal selective coverage and point estimates that adjust for the screening. These quantities are obtained directly from the usual least-squares summary statistics without needing the raw observations, so they support retrospective analysis of published studies. The derivations rely on the conditional distribution of the coefficient estimates under the Gaussian linear model.

Core claim

We develop selective p-values for the coefficients in a least squares linear regression that control the selective Type 1 error, that is, the Type 1 error conditional on having rejected the overall null hypothesis via the F-test. These p-values yield consistent tests and are computed using only the standard outputs of ordinary least squares regression. We also supply confidence intervals with nominal selective coverage and point estimates that account for the F-screening step, and we compare the resulting Fisher information to that obtained from sample splitting.

What carries the argument

Selective p-values constructed from the conditional distribution of the least-squares estimates given rejection of the overall null hypothesis.

If this is right

Tests based on the selective p-values control error rates conditional on having rejected the overall null.
Confidence intervals attain their nominal coverage level conditional on the F-screen.
Point estimates can be adjusted to reflect the selection induced by the F-test.
All quantities can be computed from published regression summary statistics alone.
The Fisher information under this approach can be compared directly to that from sample splitting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Published regressions that were only interpreted after an F-test could be re-analyzed with these tools to restore valid inference.
The same conditional-distribution idea could be applied to other global screening tests or to generalized linear models.
For large numbers of predictors the closed-form conditional distributions remain tractable, but numerical integration may be needed in non-Gaussian settings.

Load-bearing premise

The linear model is correctly specified and the errors are Gaussian.

What would settle it

Simulate data from the Gaussian linear model under a null coefficient, apply the F-screen, and check whether the selective p-value for that coefficient is distributed as uniform on [0,1] conditional on screening.
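
The check described above can be sketched as a small Monte Carlo harness. This is a minimal illustration under stated assumptions, not the paper's procedure: it assumes an orthonormal, centered design with unit error variance (so the least-squares slopes are independent standard normals under the global null), it uses empirical quantiles in place of exact F and t critical values, and it evaluates the naive t-test rather than the selective p-value, whose formula is not reproduced here. The function name and defaults are hypothetical.

```python
import random


def simulate_screened_t_tests(n=50, p=3, reps=20000, alpha=0.05, seed=0):
    """Naive t-test rejection rates under the global null, with and without F-screening.

    Assumes an orthonormal, centered design and sigma = 1 (illustrative simplification).
    """
    rng = random.Random(seed)
    df = n - p - 1  # residual degrees of freedom with an intercept
    f_stats, abs_t1 = [], []
    for _ in range(reps):
        # Under the assumed design, the OLS slopes are independent N(0, 1) when all beta_j = 0.
        beta_hat = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # RSS ~ chi-square(df) = Gamma(shape=df/2, scale=2), independent of beta_hat.
        sigma2_hat = rng.gammavariate(df / 2.0, 2.0) / df
        f_stats.append(sum(b * b for b in beta_hat) / p / sigma2_hat)
        abs_t1.append(abs(beta_hat[0]) / sigma2_hat ** 0.5)
    # Empirical (1 - alpha) quantiles stand in for the exact F and t critical values.
    k = int((1 - alpha) * reps)
    f_crit = sorted(f_stats)[k]
    t_crit = sorted(abs_t1)[k]
    # Keep only the replications that pass the F-screen.
    screened = [t for f, t in zip(f_stats, abs_t1) if f > f_crit]
    unconditional = sum(t > t_crit for t in abs_t1) / reps
    conditional = sum(t > t_crit for t in screened) / len(screened)
    return unconditional, conditional


unconditional, conditional = simulate_screened_t_tests()
print(f"naive rejection rate, all fits:        {unconditional:.3f}")
print(f"naive rejection rate, F-screened fits: {conditional:.3f}")
```

The first rate should sit near the nominal level, while the second should be well above it, which is the failure that motivates selective inference. Swapping the naive test for a selective p-value in the same harness would be the check the paragraph above describes: a valid selective p-value should reject at a rate close to the nominal level among the screened fits.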

Original abstract

Suppose that a data analyst wishes to report the results of a least squares linear regression only if the overall null hypothesis, $H_0^{1:p}: \beta_1= \beta_2 = \ldots = \beta_p=0$, is rejected. This practice, which we refer to as F-screening (since the overall null hypothesis is typically tested using an $F$-statistic), is in fact common across a number of applied fields. Unfortunately, it poses a problem: standard guarantees for the inferential outputs of linear regression, such as Type 1 error control of hypothesis tests and nominal coverage of confidence intervals, hold unconditionally, but fail to hold conditional on rejection of the overall null hypothesis. In this paper, we develop an inferential toolbox for the coefficients in a least squares model that are valid conditional on rejection of the overall null hypothesis. We develop selective p-values that lead to tests that are consistent and control the selective Type 1 error, i.e., the Type 1 error conditional on having rejected the overall null hypothesis. Furthermore, they can be computed without access to the raw data, i.e., using only the standard outputs of a least squares linear regression, and therefore are suitable for use in a retrospective analysis of a published study. We also develop confidence intervals that attain nominal selective coverage, and point estimates that account for having rejected the overall null hypothesis. We derive an expression for the Fisher information about the coefficients resulting from the proposed approach, and compare this to the Fisher information that results from an alternative approach that relies on sample splitting. We investigate the proposed approach in simulation and via re-analysis of two datasets from the biomedical literature.
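
In symbols, the guarantee described in the abstract can be sketched as follows. The notation is an assumption made here for illustration and may differ from the paper's: $n$ observations, $p$ predictors plus an intercept, a screening level $\alpha_0$, total and residual sums of squares TSS and RSS, and $p_j^{\mathrm{sel}}$ for the selective p-value of coefficient $j$.

$$
F = \frac{(\mathrm{TSS} - \mathrm{RSS})/p}{\mathrm{RSS}/(n-p-1)}, \qquad c = F_{p,\,n-p-1}^{-1}(1-\alpha_0),
$$

$$
\Pr_{H_0^{j}:\,\beta_j = 0}\left( p_j^{\mathrm{sel}} \le \alpha \;\middle|\; F > c \right) \le \alpha \quad \text{for all } \alpha \in [0,1].
$$

Under Gaussian errors and $H_0^{1:p}$, the statistic $F$ follows an $F_{p,\,n-p-1}$ distribution, so $c$ is the usual critical value of the overall test. The second display is the selective Type 1 error requirement; the standard unconditional guarantee is the same statement without the conditioning event $\{F > c\}$.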

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives selective p-values, intervals, and estimates valid after F-screening, all from regression summary stats alone.

The main thing to know is that the authors have built selective inference tools tailored to F-screening, the common habit of only reporting a regression if the overall F-test rejects. They derive p-values that control Type 1 error conditional on that rejection, plus intervals with selective coverage and adjusted estimates, all computable from the usual least-squares outputs without raw data. This makes the method usable for retrospective checks on published studies, which is a practical step beyond sample splitting.

Referee Report

2 major / 2 minor

Summary. The manuscript develops selective p-values, confidence intervals, and point estimates for individual regression coefficients that control the selective Type I error (i.e., error conditional on rejection of the global null via the overall F-test). These quantities are derived to be computable from standard least-squares outputs (coefficient estimates, standard errors, and the F-statistic) without requiring the raw data, enabling retrospective analysis of published regressions. The authors also derive the Fisher information under the proposed conditioning and compare it to sample splitting, with supporting simulation studies and re-analyses of two biomedical datasets.

Significance. If the derivations hold, the work provides a practical toolbox for valid post-F-screening inference in linear models, addressing a widespread applied practice where models are reported only after overall significance. The summary-statistic-only computation is a notable strength for re-analysis settings, and the explicit Fisher-information comparison to sample splitting offers a clear efficiency benchmark. Simulation and real-data results help quantify the practical gains over naive approaches.

major comments (2)

[§3.1, Eq. (7)–(9)] The selective p-value is obtained by integrating the tail of the usual t-statistic under the law of (β̂, σ̂) conditional on the event {F > c}. This construction uses the joint normality of β̂ and its independence from σ̂, which holds if and only if the errors are i.i.d. Gaussian. The manuscript should state explicitly whether exact selective Type I error control is claimed only under this assumption or whether asymptotic or robust versions are also derived, because the Gaussian requirement is load-bearing for the exact conditional distribution used throughout the paper.
[§4.2, Algorithm 1] The claim that the truncation probabilities can be evaluated from standard regression outputs alone presupposes that the relevant quadratic forms and the selection region boundaries can be recovered from the reported β̂, SE(β̂), and F-statistic without the design matrix X. A concrete numerical example or pseudocode showing the exact recovery of the conditioning constants from these summaries would strengthen the retrospective-analysis claim.

minor comments (2)

The abstract states that the procedures 'can be computed without access to the raw data'; a short parenthetical clarifying that the design matrix is not needed but that the reported (XᵀX)⁻¹ or equivalent information must be available would prevent misinterpretation.
In the simulation section, the number of Monte Carlo replications and the exact grid of signal strengths used to assess consistency should be stated more prominently so that readers can judge the precision of the reported Type I error and power curves.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped clarify important aspects of our derivations and implementation. We address each major comment below and indicate the planned revisions.

Referee: [§3.1, Eq. (7)–(9)] The selective p-value is obtained by integrating the tail of the usual t-statistic under the law of (β̂, σ̂) conditional on the event {F > c}. This construction uses the joint normality of β̂ and its independence from σ̂, which holds if and only if the errors are i.i.d. Gaussian. The manuscript should state explicitly whether exact selective Type I error control is claimed only under this assumption or whether asymptotic or robust versions are also derived, because the Gaussian requirement is load-bearing for the exact conditional distribution used throughout the paper.

Authors: We agree that the exact finite-sample selective Type I error control and nominal coverage rely on the i.i.d. Gaussian errors assumption, which delivers both the joint normality of the least-squares coefficient vector and its independence from the residual variance estimator. The manuscript works throughout under the classical linear model with these properties and does not derive asymptotic or robust analogues. In the revised version we will add an explicit statement of this modeling assumption in the introduction and at the start of Section 3.1, together with a brief remark that extensions to asymptotic regimes under weaker conditions remain an interesting direction for future research. revision: yes
Referee: [§4.2, Algorithm 1] The claim that the truncation probabilities can be evaluated from standard regression outputs alone presupposes that the relevant quadratic forms and the selection region boundaries can be recovered from the reported β̂, SE(β̂), and F-statistic without the design matrix X. A concrete numerical example or pseudocode showing the exact recovery of the conditioning constants from these summaries would strengthen the retrospective-analysis claim.

Authors: We appreciate the request for greater transparency on the retrospective-analysis procedure. The observed F-statistic directly supplies the value of the quadratic form β̂'(X'X)β̂ that defines the selection event, while the reported standard errors supply the marginal scales needed to evaluate the conditional tail probabilities via one-dimensional numerical integration of the joint distribution of the relevant t-statistic and the F-statistic. To make this explicit, we will insert a short numerical example (using a small simulated regression whose summary statistics are fully reported) and accompanying pseudocode for Algorithm 1 in the revised Section 4.2, demonstrating step-by-step recovery of the truncation constants from β̂, SE(β̂), F, and the degrees of freedom alone. revision: yes
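
As a rough illustration of the kind of bookkeeping the exchange above refers to, the sketch below recovers a few quantities from reported summaries only. It is not Algorithm 1 and does not compute the truncation constants or the selective p-value; it only applies two standard identities, t_j = β̂_j / SE(β̂_j) and R² = pF / (pF + n − p − 1), assuming a model with an intercept. The function name and the example numbers are hypothetical.

```python
def summaries_from_ols_output(beta_hat, se, f_stat, n):
    """Recover t-statistics, degrees of freedom, and R^2 from reported OLS summaries.

    Assumes p slope coefficients plus an intercept (illustrative sketch only).
    """
    p = len(beta_hat)
    if len(se) != p:
        raise ValueError("beta_hat and se must have the same length")
    df_resid = n - p - 1
    if df_resid <= 0:
        raise ValueError("need n > p + 1 for a positive residual df")
    # Per-coefficient t-statistics from estimates and standard errors.
    t_stats = [b / s for b, s in zip(beta_hat, se)]
    # Invert F = (R^2 / p) / ((1 - R^2) / df_resid) to get R^2.
    r_squared = p * f_stat / (p * f_stat + df_resid)
    return {"t": t_stats, "df_num": p, "df_resid": df_resid, "r_squared": r_squared}


# Hypothetical reported values, not taken from the paper or its datasets.
example = summaries_from_ols_output(
    beta_hat=[0.8, -0.3, 0.1], se=[0.4, 0.3, 0.2], f_stat=4.0, n=44
)
print(example)
```

Whether these summaries suffice for the full selective calculation depends on the details of the paper's algorithm, which this page does not reproduce.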

Circularity Check

0 steps flagged

Selective p-value derivation is self-contained under standard linear model assumptions

The paper derives selective p-values and confidence intervals by explicitly conditioning the usual t-statistics on the event {overall F > critical value}, using the known multivariate normal distribution of the least-squares estimator and its independence from the residual variance under i.i.d. Gaussian errors. This is a direct application of the conditional distribution implied by the model assumptions stated in the abstract and methods; it does not reduce any target quantity to a fitted parameter or prior self-citation by construction. The claim that only standard regression outputs are needed follows from the closed-form truncation probabilities under those assumptions rather than from re-labeling inputs. No load-bearing step matches any of the enumerated circularity patterns, and the central inferential guarantees remain independent of the specific fitted values being tested.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Approach rests on standard linear model assumptions for deriving conditional distributions after F-screening; no free parameters or invented entities are introduced in the abstract description.

axioms (1)

domain assumption: Errors are independent and normally distributed with constant variance in the linear model.
Required for the exact distribution of the F-statistic and for the selective p-value calculations to hold.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We develop selective p-values that lead to tests that are consistent and control the selective Type 1 error, i.e., the Type 1 error conditional on having rejected the overall null hypothesis... using only the standard outputs of a least squares linear regression.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Under the null hypothesis H_M^0, the F-statistic ... follows an F_{m,n-p-1} distribution.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.