arxiv: 2505.20178 · v2 · pith:24ZDLJHOnew · submitted 2025-05-26 · 📊 stat.ML · cs.LG

No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference

classification 📊 stat.ML cs.LG
keywords: estimation, gold-standard, free, lunch, result, variance, alone, analysis
Prediction-Powered Inference (PPI) is a popular strategy for combining gold-standard labels with possibly noisy pseudo-labels to perform statistical estimation. Prior work has shown an asymptotic "free lunch" for PPI++, an adaptive form of PPI: the asymptotic variance of PPI++ is always less than or equal to the variance obtained from using gold-standard labels alone. Notably, this result holds regardless of the quality of the pseudo-labels. In this work, we demystify this result by conducting an exact finite-sample analysis of the estimation error of PPI++ on the mean estimation problem. We give a "no free lunch" result, characterizing the settings (and sample sizes) where PPI++ has provably worse estimation error than using gold-standard labels alone. Specifically, PPI++ outperforms the gold-standard-only estimator if and only if the correlation between pseudo-labels and gold-standard labels is above a certain level that depends on the number of labeled samples ($n$). In some cases our results simplify considerably: for Gaussian data, for instance, the correlation must be at least $1/\sqrt{n - 2}$ in order to see improvement. More broadly, by providing exact non-asymptotic expressions for the variance of PPI++ under sample splitting, we aim to empower practitioners to transparently reason about the benefits of PPI++ in specific applications. In experiments, we illustrate that our theoretical findings hold on real-world datasets.
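
To make the Gaussian condition in the abstract concrete, the following is a minimal sketch that only evaluates the stated bound $1/\sqrt{n - 2}$ for a few labeled sample sizes. It does not implement PPI++ itself. The function names `gaussian_correlation_threshold` and `meets_threshold` are hypothetical, chosen for illustration, and the comparison uses the correlation exactly as worded in the abstract ("at least"); any further conditions from the full paper are not reflected here.

```python
import math


def gaussian_correlation_threshold(n: int) -> float:
    """Return 1 / sqrt(n - 2), the Gaussian-case bound quoted in the abstract.

    n is the number of labeled (gold-standard) samples. The expression is
    only defined for n > 2, so smaller values are rejected.
    """
    if n <= 2:
        raise ValueError("1 / sqrt(n - 2) is only defined for n > 2")
    return 1.0 / math.sqrt(n - 2)


def meets_threshold(rho: float, n: int) -> bool:
    """Check whether a correlation rho is at least the quoted bound.

    Assumption: rho is the correlation between pseudo-labels and
    gold-standard labels, compared directly as the abstract words it.
    """
    return rho >= gaussian_correlation_threshold(n)


if __name__ == "__main__":
    # Illustrative sample sizes only; these are not taken from the paper.
    for n in (6, 18, 102, 1002):
        bound = gaussian_correlation_threshold(n)
        print(f"n = {n:>5d}: correlation must be at least {bound:.4f}")
```

Read this way, the bound shrinks as $n$ grows, so the required correlation is most demanding when very few gold-standard labels are available.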

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Revisiting Active Sequential Prediction-Powered Mean Estimation

    stat.ML · 2026-04 · unverdicted · novelty 5.0

    Non-asymptotic analysis of prediction-powered mean estimation shows that no-regret learning for query probabilities converges to the maximum allowed constant value, independent of covariates.