More powerful post-selection inference, with application to the Lasso

Keli Liu, Jelena Markovic, Robert Tibshirani

classification: stat.ME

keywords: data, hypotheses, intervals, generate, inference, components, conditioning, confidence

Investigators often use the data to generate interesting hypotheses and then perform inference for the generated hypotheses. P-values and confidence intervals must account for this explorative data analysis. A fruitful method for doing so is to condition any inferences on the components of the data used to generate the hypotheses, thus preventing information in those components from being used again. Some currently popular methods "over-condition", leading to wide intervals. We show how to perform the minimal conditioning in a computationally tractable way. In high dimensions, even this minimal conditioning can lead to intervals that are too wide to be useful, suggesting that up to now the cost of hypothesis generation has been underestimated. We show how to generate hypotheses in a strategic manner that sharply reduces the cost of data exploration and results in useful confidence intervals. Our discussion focuses on the problem of post-selection inference after fitting a lasso regression model, but we also outline its extension to a much more general setting.
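
The idea of conditioning on the part of the data that generated the hypothesis can be illustrated with a standard one-dimensional toy case. This is not the paper's lasso procedure, only a minimal sketch under stated assumptions: a single observation y ~ N(mu, 1) is tested against mu = 0 only when y exceeds a threshold, so the selective p-value is computed from a normal distribution truncated at that threshold. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def normal_sf(x: float) -> float:
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def naive_p_value(y: float) -> float:
    """One-sided p-value for H0: mu = 0 that ignores how y was selected."""
    return normal_sf(y)


def selective_p_value(y: float, threshold: float) -> float:
    """One-sided p-value for H0: mu = 0, conditional on the event y > threshold.

    Assumption for this sketch: the hypothesis is tested only when the
    observation exceeds `threshold`, so the null reference distribution is
    N(0, 1) truncated to (threshold, infinity).
    """
    if y <= threshold:
        raise ValueError("y must exceed the selection threshold")
    return normal_sf(y) / normal_sf(threshold)


if __name__ == "__main__":
    y, threshold = 2.5, 2.0  # hypothetical observation and selection cutoff
    print("naive p-value:    ", naive_p_value(y))
    print("selective p-value:", selective_p_value(y, threshold))
```

In this sketch the selective p-value is always at least as large as the naive one, because the information in the event y > threshold has already been spent on choosing the hypothesis. Conditioning on more than the selection event would shrink the reference set further, which is the kind of over-conditioning the abstract says leads to wide intervals.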

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Post-ADC Inference: Valid Inference After Active Data Collection
stat.ML, 2026-05, unverdicted, novelty 7.0

Post-ADC inference supplies valid p-values and confidence intervals for data-dependent targets after active data collection by extending selective inference to correct for both adaptive sampling bias and post-hoc targ...