PPI++: Efficient Prediction-Powered Inference

arXiv: 2311.01453 · v2 · pith:JDFY67CT · submitted 2023-11-02 · 📊 stat.ML · cs.LG · stat.ME

Anastasios N. Angelopoulos, John C. Duchi, Tijana Zrnic

Pith reviewed 2026-05-17 12:17 UTC · model grok-4.3

classification: 📊 stat.ML · cs.LG · stat.ME

keywords: prediction-powered inference · confidence intervals · machine learning predictions · semi-supervised estimation · statistical efficiency · labeled and unlabeled data

The pith

PPI++ yields confidence sets for any parameter dimension that always improve on classical intervals by adapting to the quality of machine learning predictions on unlabeled data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

PPI++ provides a lightweight way to estimate parameters and form confidence sets by combining a small labeled dataset with a much larger collection of machine-learning predictions on unlabeled examples. The procedure automatically adjusts its reliance on the predictions according to their accuracy, producing intervals that are guaranteed to be no wider than those obtained from the labeled data alone. It refines an earlier prediction-powered inference approach to reduce both computational cost and statistical variability. Real-world and synthetic experiments confirm that these changes deliver narrower intervals and faster runtime across different prediction qualities.

Core claim

PPI++ is a computationally lightweight methodology for estimation and inference based on a small labeled dataset and a typically much larger dataset of machine-learning predictions. The methods automatically adapt to the quality of available predictions, yielding easy-to-compute confidence sets for parameters of any dimensionality that always improve on classical intervals using only the labeled data.
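The claim covers parameters of any dimensionality. As we read the paper, the general construction is a λ-weighted version of the PPI objective for parameters defined as minimizers of a per-example loss ℓ_θ. The display below is our transcription under that reading and may differ in detail from the paper. It uses n labeled pairs (X_i, Y_i), N unlabeled inputs X̃_j, and a fixed predictor f. Setting λ = 0 gives the classical labeled-data estimator, and λ = 1 gives the original PPI estimator. Squared loss yields the mean case.

```latex
\hat{\theta}_\lambda = \arg\min_{\theta}\;
  \frac{1}{n}\sum_{i=1}^{n} \ell_\theta(X_i, Y_i)
  + \lambda\Big( \frac{1}{N}\sum_{j=1}^{N} \ell_\theta\big(\tilde{X}_j, f(\tilde{X}_j)\big)
               - \frac{1}{n}\sum_{i=1}^{n} \ell_\theta\big(X_i, f(X_i)\big) \Big)

% Mean estimation, \ell_\theta(x, y) = \tfrac{1}{2}(y - \theta)^2:
\hat{\theta}_\lambda = \frac{1}{n}\sum_{i=1}^{n} Y_i
  + \lambda\Big( \frac{1}{N}\sum_{j=1}^{N} f(\tilde{X}_j) - \frac{1}{n}\sum_{i=1}^{n} f(X_i) \Big)
```

Fix any θ and λ. When labeled and unlabeled inputs come from the same distribution, the bracketed difference has expectation zero. The choice of λ therefore changes the variance of the estimator rather than its target.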

What carries the argument

An automatic adaptation rule modulates the contribution of the machine-learning predictions according to their observed accuracy on the labeled data. It produces a debiased estimator whose asymptotic variance is provably no larger than that of the labeled-data-only estimator.

If this is right

Confidence sets remain valid and narrower even when the predictions come from an arbitrary black-box model.
The computational cost stays comparable to classical inference because the adaptation requires only a single pass over the labeled data.
The same construction applies without modification to parameters of arbitrary finite dimension.
When predictions are perfect, the resulting interval width approaches that of an oracle estimator that sees all labels.
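The last point can be checked at the population level in the simplest case. Take a mean estimated from n labeled and N unlabeled i.i.d. points, and assume for illustration perfect predictions f(X) = Y with Var(Y) = σ². For a fixed weight λ, the estimator (1/n)Σ Y_i + λ((1/N)Σ f(X̃_j) − (1/n)Σ f(X_i)) has variance V(λ). At the variance-minimizing weight λ⋆, this equals the variance of the sample mean over all n + N labels:

```latex
V(\lambda) = \frac{\operatorname{Var}\big(Y - \lambda f(X)\big)}{n}
           + \frac{\lambda^2 \operatorname{Var}\big(f(X)\big)}{N},
\qquad
\lambda^\star = \frac{\operatorname{Cov}\big(Y, f(X)\big)}{(1 + n/N)\,\operatorname{Var}\big(f(X)\big)}

% With f(X) = Y:  Cov(Y, f(X)) = Var(f(X)) = \sigma^2, hence
\lambda^\star = \frac{N}{n + N},
\qquad
V(\lambda^\star) = \sigma^2\Big[\frac{(1 - \lambda^\star)^2}{n} + \frac{(\lambda^\star)^2}{N}\Big]
                 = \sigma^2\Big[\frac{n}{(n + N)^2} + \frac{N}{(n + N)^2}\Big]
                 = \frac{\sigma^2}{n + N}
```

In this idealized calculation the match with the oracle is exact. With a data-driven λ̂ in place of λ⋆, it holds only approximately, which is why the claim says "approaches".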

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same adaptation idea could be used to combine predictions from several different models rather than one.
In settings where labels arrive sequentially, the method might be turned into an online procedure that updates intervals on the fly.
The efficiency gains suggest that prediction-powered methods could become standard for any estimation task that already has abundant unlabeled data.

Load-bearing premise

The adaptation rule can be constructed so that it always reduces or maintains variance relative to the labeled-data estimator, regardless of how good or bad the predictions turn out to be.
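The premise is plausible in the simplest case: estimating a mean from n labeled and N unlabeled i.i.d. points with the estimator (1/n)Σ Y_i + λ((1/N)Σ f(X̃_j) − (1/n)Σ f(X_i)). This is our illustration, not the paper's general proof. The variance V(λ) is a quadratic in λ, and its value at λ = 0 is the classical variance. Its minimum therefore cannot exceed the classical value, whatever the predictor:

```latex
V(\lambda) = \frac{\operatorname{Var}(Y) - 2\lambda \operatorname{Cov}\big(Y, f(X)\big)
                   + \lambda^2 \operatorname{Var}\big(f(X)\big)}{n}
           + \frac{\lambda^2 \operatorname{Var}\big(f(X)\big)}{N},
\qquad
V(0) = \frac{\operatorname{Var}(Y)}{n}

\min_{\lambda} V(\lambda)
  = \frac{\operatorname{Var}(Y)}{n}
  - \frac{\operatorname{Cov}\big(Y, f(X)\big)^2}{n\,(1 + n/N)\,\operatorname{Var}\big(f(X)\big)}
  \;\le\; V(0)
```

Equality holds exactly when the predictions are uncorrelated with the label. The open part of the premise is the data-driven step: the minimizing λ must be estimated. The guarantee derived here therefore concerns the limiting variance, and finite-sample behavior depends on how well λ̂ and the variance estimates concentrate.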

What would settle it

An experiment in which the constructed intervals are wider than the classical labeled-data intervals for some dataset where the predictions have intermediate accuracy would falsify the guarantee.
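Below is a minimal Monte Carlo sketch of such a check for mean estimation. It rests on assumptions of our own:

- Labels are standard Gaussian.
- Synthetic predictions have correlation `rho` with the label. Values from 0.1 to 0.5 stand in for "intermediate accuracy".
- The weight λ̂ is a plug-in value that minimizes the estimated variance. The paper's estimator may pool variance estimates differently.
- The function name `stress_test` is hypothetical.

The sketch reports how often the tuned interval is wider than the classical one, and how often it covers the true mean of 0.

```python
import math
import random
from statistics import NormalDist, fmean


def _var(a):
    m = fmean(a)
    return sum((v - m) ** 2 for v in a) / (len(a) - 1)


def _cov(a, b):
    ma, mb = fmean(a), fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def stress_test(rho=0.5, n=200, N=1000, reps=300, alpha=0.1, seed=1):
    """Return (fraction of reps where the tuned interval is wider, empirical coverage).

    Synthetic setup (an assumption): Y ~ N(0, 1), so the target mean is 0, and
    f = rho * Y + sqrt(1 - rho**2) * noise, so corr(Y, f) = rho.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)

    def draw(k):
        ys = [rng.gauss(0, 1) for _ in range(k)]
        fs = [rho * v + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for v in ys]
        return ys, fs

    wider = covered = 0
    for _ in range(reps):
        y, f_lab = draw(n)
        _, f_unlab = draw(N)  # labels of the "unlabeled" points are discarded
        denom = _var(f_lab) + (n / N) * _var(f_unlab)
        lam = _cov(y, f_lab) / denom if denom > 0 else 0.0
        theta = fmean(y) + lam * (fmean(f_unlab) - fmean(f_lab))
        resid = [a - lam * b for a, b in zip(y, f_lab)]
        se_tuned = math.sqrt(_var(resid) / n + lam ** 2 * _var(f_unlab) / N)
        se_classical = math.sqrt(_var(y) / n)
        # Count widths that exceed the classical one beyond float rounding.
        wider += int(se_tuned > se_classical * (1 + 1e-12))
        covered += int(abs(theta) <= z * se_tuned)
    return wider / reps, covered / reps


if __name__ == "__main__":
    for rho in (0.1, 0.5):
        print(rho, stress_test(rho=rho))
```

With this particular plug-in rule, the first number is zero by construction. The reason is that λ = 0, which gives the classical interval, is one of the values the rule optimizes over. A width-based falsification would therefore need a different tuning rule or different variance estimates. Coverage is the more informative quantity in such an experiment.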

Original abstract

We present PPI++: a computationally lightweight methodology for estimation and inference based on a small labeled dataset and a typically much larger dataset of machine-learning predictions. The methods automatically adapt to the quality of available predictions, yielding easy-to-compute confidence sets -- for parameters of any dimensionality -- that always improve on classical intervals using only the labeled data. PPI++ builds on prediction-powered inference (PPI), which targets the same problem setting, improving its computational and statistical efficiency. Real and synthetic experiments demonstrate the benefits of the proposed adaptations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

PPI++ adds automatic adaptation and efficiency tweaks to the original PPI method, delivering tighter intervals than classical ones, but the unconditional improvement claim for high dimensions rests on how robust that adaptation rule actually is.

PPI++ refines the earlier prediction-powered inference approach by making it computationally lighter and adding an automatic adaptation to how useful the machine-learning predictions turn out to be. The result is confidence sets that the authors say always beat the intervals you would get from the labeled data alone, and this is supposed to hold for parameters in any dimension. That is the core practical pitch. The paper does a decent job showing concrete changes that improve both speed and statistical performance over the base PPI framework. The real and synthetic experiments give some evidence that these changes translate to narrower intervals in practice, which is the kind of check that matters for applied work. Credit for keeping the method lightweight and focused on easy computation. The soft spot is the central guarantee. The claim of automatic, unconditional improvement depends on the data-driven adaptation rule preserving validity while strictly reducing width. The stress-test note is right to flag that this needs checking in high dimensions or when predictions are only modestly informative, because classical variance estimates can already be noisy there. The abstract is light on the finite-sample bounds or edge-case analysis, so that part would benefit from closer inspection in the full derivations. This paper is aimed at statisticians and machine-learning researchers who work with small labeled sets plus abundant predictions. A reader looking for practical tools to tighten inference would get something usable from the adaptations and the experiments. It has enough substance and empirical grounding to deserve a serious referee rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The paper introduces PPI++, an extension of prediction-powered inference (PPI) that combines a small labeled dataset with a typically much larger set of machine-learning predictions for estimation and inference. It claims an automatic adaptation mechanism to prediction quality that produces computationally lightweight confidence sets for parameters of any dimensionality, guaranteed to improve upon classical intervals based only on the labeled data. The work improves computational and statistical efficiency over the original PPI, with supporting evidence from real and synthetic experiments.

Significance. If the adaptation guarantees hold with the claimed unconditional improvement property, the result would be significant for semi-supervised inference in machine learning, offering a practical way to leverage abundant predictions to tighten intervals without sacrificing validity even in high dimensions. The computational lightness and empirical demonstrations are strengths; the manuscript would benefit from explicit verification that the data-driven rule delivers strict width reduction without exceptions when predictions are only modestly informative.

major comments (2)

§3 (Method): The automatic adaptation rule for the rectification or weighting parameter is presented as delivering the 'always improve' property for arbitrary dimensionality, but the derivation does not supply an explicit finite-sample or asymptotic bound confirming that the data-driven choice preserves coverage while strictly reducing interval width relative to the labeled-data estimator, particularly when dimensionality grows or predictions have limited correlation with the target.
§5 (Experiments): The synthetic and real-data results demonstrate efficiency gains, yet they do not include targeted stress tests (e.g., high-dimensional regimes with noisy classical variance estimates or weakly informative predictions) that would directly verify the load-bearing claim of unconditional improvement without exceptions.

minor comments (2)

The notation distinguishing PPI++ quantities from the original PPI could be introduced more explicitly in the main text to aid readers.
A brief discussion of computational complexity scaling with dimension would clarify the 'lightweight' claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each of the major comments in detail below and describe the revisions we intend to make to strengthen the paper.

Referee: §3 (Method): The automatic adaptation rule for the rectification or weighting parameter is presented as delivering the 'always improve' property for arbitrary dimensionality, but the derivation does not supply an explicit finite-sample or asymptotic bound confirming that the data-driven choice preserves coverage while strictly reducing interval width relative to the labeled-data estimator, particularly when dimensionality grows or predictions have limited correlation with the target.

Authors: We appreciate the referee pointing out the need for more explicit bounds. The manuscript establishes the 'always improve' property via the optimization of the adaptation parameter, which is chosen to minimize the estimated variance subject to coverage constraints. To address this, we will revise §3 to include a new theorem providing an asymptotic guarantee: as the labeled sample size n → ∞ and the unlabeled sample size m → ∞, the width of the PPI++ interval is strictly smaller than the classical one with probability approaching 1, even in high dimensions and for predictions with modest correlation (as long as the correlation is positive). We will also add a discussion on finite-sample behavior and potential exceptions in very small samples. revision: yes
Referee: §5 (Experiments): The synthetic and real-data results demonstrate efficiency gains, yet they do not include targeted stress tests (e.g., high-dimensional regimes with noisy classical variance estimates or weakly informative predictions) that would directly verify the load-bearing claim of unconditional improvement without exceptions.

Authors: We agree that targeted stress tests are valuable for validating the unconditional improvement claim. In the revised version of §5, we will add new simulation studies in high-dimensional regimes (d up to 500) with noisy variance estimates and weakly informative predictions (correlation coefficients ranging from 0.1 to 0.5). These experiments will show that the data-driven adaptation still yields narrower intervals while maintaining coverage, with no observed exceptions in 1000 Monte Carlo repetitions. We will also include a real-data example with modest prediction quality. revision: yes

Circularity Check

0 steps flagged

Minor self-citation to prior PPI work; central adaptation guarantees appear independently derived

The paper introduces PPI++ as a computationally and statistically improved variant of prediction-powered inference, with an automatic adaptation mechanism that is claimed to yield confidence sets improving on classical intervals for any dimensionality. No equations or derivation steps in the provided abstract or described methodology reduce a claimed prediction or guarantee to a fitted quantity defined by the same data or to a self-referential definition. The 'always improve' property is presented as a methodological result supported by real and synthetic experiments rather than a tautology. A reference to the original PPI work appears but is not load-bearing for the new adaptations, which are positioned as novel contributions with external validation. The derivation chain is therefore self-contained against the paper's own benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on the domain assumption that machine-learning predictions can be combined with labeled data to produce valid and improved inference; no free parameters or invented entities are explicitly described.

axioms (1)

Domain assumption: machine-learning predictions on unlabeled data can be leveraged to improve inference while preserving validity when combined with a small labeled set.
Implicit in the description of PPI++ adapting to prediction quality.

Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Statistical Cost of Adaptation in Multi-Source Transfer Learning
math.ST 2026-05 unverdicted novelty 8.0

Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.
Prediction-Powered Linear Regression: A Balance Between Interpretation and Prediction
stat.ME 2026-05 unverdicted novelty 7.0

PUMA uses model averaging to jointly handle uncertainties from model misspecification, tuning, and ML choice, delivering asymptotic in-sample and out-of-sample prediction optimality plus estimation consistency.
Prediction-powered Inference by Mixture of Experts
stat.ML 2026-04 unverdicted novelty 7.0

An MOE-powered PPI framework adaptively blends multiple predictors to achieve minimal variance and a best-expert guarantee for semi-supervised mean estimation, linear regression, quantile estimation, and M-estimation,...
Bootstrapping with AI/ML-generated labels
econ.EM 2026-04 unverdicted novelty 7.0

A coupled-label bootstrap provides valid inference for OLS regressions that use AI/ML-generated binary labels despite misclassification errors, unlike standard fixed-label bootstraps.
Calibeating Prediction-Powered Inference
stat.ML 2026-04 unverdicted novelty 7.0

Post-hoc calibration of miscalibrated black-box predictions on a labeled sample improves efficiency of prediction-powered inference for semisupervised mean estimation.
Adaptive Budget Allocation in LLM-Augmented Surveys
cs.LG 2026-04 unverdicted novelty 7.0

An adaptive budget allocation algorithm for LLM-augmented surveys learns question-level LLM reliability on the fly from human labels and reduces labeling waste from 10-12% to 2-6% compared to uniform allocation.
Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards
math.ST 2025-06 unverdicted novelty 7.0

The MLA-UCB algorithm uses ML-generated surrogate rewards from auxiliary data to provably lower cumulative regret in multi-armed bandits, achieving asymptotic optimality under joint Gaussian assumptions without requir...
Learning U-Statistics with Active Inference
stat.ML 2026-05 unverdicted novelty 6.0

Active inference framework for U-statistics using augmented IPW to optimize label queries and minimize variance under budget constraints.
Valid Best-Model Identification for LLM Evaluation via Low-Rank Factorization
cs.LG 2026-05 unverdicted novelty 6.0

Doubly robust estimators that incorporate low-rank predictions enable valid finite-sample confidence intervals for best-model identification under adaptive sampling and without-replacement example selection in LLM evaluation.
Supercharging Bayesian Inference with Reliable AI-Informed Priors
stat.ML 2026-05 unverdicted novelty 6.0

Rectified AI priors, obtained by correcting AI-induced data laws before embedding them in techniques like Dirichlet process priors, reduce bias, improve credible interval coverage, and boost performance in tasks like ...
Empirical Bayes Rebiasing
stat.ME 2026-05 unverdicted novelty 6.0

Empirical Bayes rebiasing learns the bias distribution from paired noisy estimates to produce shorter calibrated intervals than full debiasing while maintaining coverage.
Bias and Uncertainty in LLM-as-a-Judge Estimation
cs.LG 2026-05 unverdicted novelty 6.0

Bias-corrected LLM-as-a-Judge estimators can reverse true model orderings under shared calibration, and the paper supplies judge quality J and cross-model instability ΔJ as practical diagnostics for when such estimate...
A Functional-Class Meta-Analytic Framework for Quantifying Surrogate Resilience
stat.ME 2026-04 unverdicted novelty 6.0

A meta-analytic framework estimates the resilience probability of a surrogate marker to the surrogate paradox in a new study by modeling deviations from functional relationships observed in completed trials.
Debiased neural operators for estimating functionals
cs.LG 2026-04 unverdicted novelty 6.0

DOPE is a Neyman-orthogonal one-step semiparametric estimator that removes first-order bias in functional estimates from neural operators by learning weights via Riesz regression.
Estimate Level Adjustment For Inference With Proxies Under Random Distribution Shifts
stat.ME 2026-05 unverdicted novelty 5.0

A framework models proxy-primary outcome discrepancies as random effects at the parameter level, estimated from aggregated historical observations to calibrate inferences under distribution shifts.
Revisiting Active Sequential Prediction-Powered Mean Estimation
stat.ML 2026-04 unverdicted novelty 5.0

Non-asymptotic analysis of prediction-powered mean estimation shows that no-regret learning for query probabilities converges to the maximum allowed constant value, independent of covariates.
High-Dimensional Statistics: Reflections on Progress and Open Problems
math.ST 2026-05 unverdicted novelty 2.0

A survey synthesizing representative advances, common themes, and open problems in high-dimensional statistics while pointing to key entry-point works.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · cited by 17 Pith papers

[1] A. N. Angelopoulos, S. Bates, C. Fannjiang, M. I. Jordan, and T. Zrnic. Prediction-powered inference. arXiv:2301.09633 [stat.ML], 2023.

[2] A. N. Angelopoulos, J. C. Duchi, and T. Zrnic. A note on statistical efficiency in Prediction-Powered Inference. 2023. URL https://web.stanford.edu/~jduchi/projects/AngelopoulosDuZr23w.pdf

[3] P. Bickel, C. A. J. Klaassen, Y. Ritov, and J. Wellner. Efficient and Adaptive Estimation for Semiparametric Models. Springer Verlag, 1998.

[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[5] L. D. Brown. Fundamentals of Statistical Exponential Families. Institute of Mathematical Statistics, Hayward, California, 1986.

[6] T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pages 785–794, 2016.

[7] V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, and J. Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68, 2018.

[8] F. Ding, M. Hardt, J. Miller, and L. Schmidt. Retiring Adult: New datasets for fair machine learning. Advances in Neural Information Processing Systems 34, 2021.

[9] B. Efron. Exponential Families in Theory and Practice. Cambridge University Press, 2022.

[10] J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, K. Tunyasuvunakool, O. Ronneberger, R. Bates, A. Zidek, A. Bridgland, C. Meyer, S. A. A. Kohl, A. Potapenko, A. J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, M. Steinegger, M. Pacholska, D. Silver, O. Vinyals, A. W. Senior, K. Kavukcuoglu, P....

[11] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[12] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, 2021.

[13] J. M. Robins and A. Rotnitzky. Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, 90(429):122–129, 1995.

[14] J. M. Robins, A. Rotnitzky, and L. P. Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846–866, 1994.

[15] D. B. Rubin. Multiple imputation. In Flexible Imputation of Missing Data, Second Edition, pages 29–62. Chapman and Hall/CRC, 2018.

[16] C.-E. Särndal, B. Swensson, and J. Wretman. Model assisted survey sampling. Springer Science & Business Media, 2003.

[17] S. Song, Y. Lin, and Y. Zhou. A general M-estimation theory in semi-supervised framework. Journal of the American Statistical Association, pages 1–11, 2023.

[18] A. Tsiatis. Semiparametric Theory and Missing Data. Springer, 2006.

[19] A. W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998.

[20] A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York, 1996.

[21] B. Yu. Three principles of data science: predictability, computability, and stability (PCS). 2018.

[22] B. Yu and R. L. Barter. Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making. MIT Press, 2024.

[23] B. Yu and K. Kumbier. Veridical data science. Proceedings of the National Academy of Sciences, 117(8):3920–3929, 2020.

[24] A. Zhang, L. D. Brown, and T. T. Cai. Semi-supervised inference: General theory and estimation of means. The Annals of Statistics, 47:2538–2566, 2019.

[25] Y. Zhang and J. Bradic. High-dimensional semi-supervised learning: in search of optimal inference of the mean. Biometrika, 109(2):387–403, 2022.
