pith. machine review for the scientific record.

arxiv: 2604.21260 · v1 · submitted 2026-04-23 · 📊 stat.ML · cs.AI · cs.LG · econ.EM · q-bio.QM · stat.ME

Recognition: unknown

Calibeating Prediction-Powered Inference

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 20:02 UTC · model grok-4.3

classification 📊 stat.ML · cs.AI · cs.LG · econ.EM · q-bio.QM · stat.ME

keywords semisupervised estimation · prediction-powered inference · calibration · isotonic regression · mean estimation · augmented inverse probability weighting

The pith

Calibrating a black-box prediction score on a small labeled sample improves both its accuracy and the efficiency of semisupervised mean estimation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper focuses on estimating a population mean when only a few data points have known outcomes, a large number do not, and an existing prediction model may not be well aligned with the true outcome scale. It introduces a post-processing step that fits either a linear or isotonic calibration function to the labeled points and then applies the adjusted scores to the unlabeled points inside an augmented inverse-probability weighting estimator. If the calibration succeeds, the resulting estimator has lower variance than the unadjusted prediction-powered approach while remaining protected against model misspecification. The work shows that isotonic calibration achieves a first-order optimality property: it improves accuracy and efficiency over the raw score and simpler adjustments, yet no additional post-processing of the isotonic output yields further first-order gains. This matters because it lets practitioners refine existing predictors without retraining them, thereby extracting more precise estimates from scarce labels.
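
For orientation, the estimator at the center of this discussion can be written compactly. The following is a sketch in generic notation (ours, not necessarily the paper's): with n labeled pairs (X_i, Y_i), N unlabeled covariates X̃_j, and a score f, the difference-form PPI/AIPW mean estimator is

    \hat\psi(f) \;=\; \frac{1}{N}\sum_{j=1}^{N} f(\tilde X_j) \;+\; \frac{1}{n}\sum_{i=1}^{n} \bigl\{ Y_i - f(X_i) \bigr\},

and the paper's proposal replaces f with \hat g_n \circ m, where m is the black-box score and \hat g_n is a linear or isotonic calibration map fitted on \{(m(X_i), Y_i)\}_{i=1}^{n}.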

Core claim

Post-hoc calibration of the prediction score on the labeled sample produces a new score that serves as both a better predictor and a better regression adjustment. For the isotonic case the calibrated estimator is first-order optimal among post-processing rules: it improves predictive accuracy and estimator efficiency relative to the original score and to simpler rules, while further post-processing of the isotonic score yields no additional first-order improvement. Linear calibration is first-order equivalent to the PPI++ estimator. The original PPI estimator is recovered as a special case of AIPW and is inefficient when the prediction model is already accurate.

What carries the argument

Isotonic calibration, which fits a non-decreasing function on the labeled sample to align the black-box scores with observed outcomes before they are used as regression adjustments in the semisupervised estimator.
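
A minimal sketch of this step, assuming numpy and scikit-learn's IsotonicRegression; the function name and interface here are illustrative, not the paper's ppi_aipw API:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    def isotonic_calibrated_mean(y_labeled, score_labeled, score_unlabeled):
        """Fit a non-decreasing calibration map on the labeled sample, then
        plug the calibrated score into the difference-form mean estimator."""
        iso = IsotonicRegression(out_of_bounds="clip")
        iso.fit(score_labeled, y_labeled)   # align the score with the outcome scale
        f_labeled = iso.predict(score_labeled)      # calibrated scores, labeled sample
        f_unlabeled = iso.predict(score_unlabeled)  # calibrated scores, unlabeled sample
        # Unlabeled mean of the calibrated score, debiased by labeled residuals.
        return np.mean(f_unlabeled) + np.mean(y_labeled - f_labeled)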

If this is right

  • Isotonic post-processing improves both the predictive accuracy of the score and the efficiency of the resulting mean estimator relative to the raw score and simpler post-processing rules.
  • No further adjustment after isotonic calibration produces additional first-order efficiency gains.
  • Linear calibration is first-order equivalent to the existing PPI++ estimator.
  • The original PPI estimator is a special case of AIPW and can be inefficient when the prediction model is already accurate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same calibration step could be applied to other semisupervised functionals such as regression coefficients or quantile estimation by replacing the mean functional inside the estimator.
  • In practice the isotonic version may serve as a default post-processing choice whenever a black-box predictor is available, because its optimality property removes the need to compare multiple adjustment rules.
  • The approach suggests testing whether the calibrated score also reduces finite-sample bias in settings where the prediction model is trained on a different population than the target data.

Load-bearing premise

The small labeled sample is representative enough that the fitted calibration function generalizes to the unlabeled data without introducing bias or excess variance that offsets the efficiency gains.

What would settle it

An experiment or simulation in which the asymptotic variance of the isotonic-calibrated estimator exceeds the variance of the uncalibrated estimator, even though the calibration function fits the labeled sample well and the predictions are miscalibrated on the raw scale.
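
The check described above is cheap to run. Below is a simulation sketch under illustrative assumptions (the data-generating process, score, and sample sizes are our choices, not the paper's):

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(0)
    n, N, reps = 100, 10_000, 500        # labeled size, unlabeled size, replicates
    est_raw, est_cal = [], []
    for _ in range(reps):
        x_lab, x_unlab = rng.normal(size=n), rng.normal(size=N)
        y_lab = x_lab + rng.normal(scale=0.5, size=n)             # true mean is 0
        score_lab, score_unlab = np.exp(x_lab), np.exp(x_unlab)   # miscalibrated scale
        # Uncalibrated difference-form estimator.
        est_raw.append(score_unlab.mean() + (y_lab - score_lab).mean())
        # Isotonic-calibrated version.
        iso = IsotonicRegression(out_of_bounds="clip").fit(score_lab, y_lab)
        est_cal.append(iso.predict(score_unlab).mean()
                       + (y_lab - iso.predict(score_lab)).mean())
    print("Monte Carlo variance, raw score:", np.var(est_raw))
    print("Monte Carlo variance, isotonic: ", np.var(est_cal))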

Figures

Figures reproduced from arXiv:2604.21260 by Lars van der Laan and Mark van der Laan.

Figure 1: Toy example illustrating how calibration can improve prediction-powered mean estimation.
Figure 2: Synthetic simulation study with a poorly calibrated score. The top row shows the balanced case. …
Figure 3: Main-text benchmark summary for the reproduced …
Figure 4: PPE-centered LLM-evaluation benchmark. Panels report normalized MSE relative to PPI, relative …
Figure 5: Calibration-focused appendix comparison across the reproduced PPI benchmarks, restricted to PPI, …
Figure 6: Full diagnostic grid for the reproduced PPI benchmarks, showing bias, empirical variance, normalized …
Figure 7: Appendix evaluator-specific PPE summary. Each panel repeats the PPE Human and PPE …
Figure 8: Appendix PPE-ranking supplement from the LLM benchmark, macro-averaged across the public …
read the original abstract

We study semisupervised mean estimation with a small labeled sample, a large unlabeled sample, and a black-box prediction model whose output may be miscalibrated. A standard approach in this setting is augmented inverse-probability weighting (AIPW) [Robins et al., 1994], which protects against prediction-model misspecification but can be inefficient when the prediction score is poorly aligned with the outcome scale. We introduce Calibrated Prediction-Powered Inference, which post-hoc calibrates the prediction score on the labeled sample before using it for semisupervised estimation. This simple step requires no retraining and can improve the original score both as a predictor of the outcome and as a regression adjustment for semisupervised inference. We study both linear and isotonic calibration. For isotonic calibration, we establish first-order optimality guarantees: isotonic post-processing can improve predictive accuracy and estimator efficiency relative to the original score and simpler post-processing rules, while no further post-processing of the fitted isotonic score yields additional first-order gains. For linear calibration, we show first-order equivalence to PPI++. We also clarify the relationship among existing estimators, showing that the original PPI estimator is a special case of AIPW and can be inefficient when the prediction model is accurate, while PPI++ is AIPW with empirical efficiency maximization [Rubin et al., 2008]. In simulations and real-data experiments, our calibrated estimators often outperform PPI and are competitive with, or outperform, AIPW and PPI++. We provide an accompanying Python package, ppi_aipw, at https://larsvanderlaan.github.io/ppi-aipw/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces Calibrated Prediction-Powered Inference (Calibeating PPI), which post-hoc calibrates a black-box prediction model's output on a small labeled sample before using it in augmented inverse-probability weighting (AIPW) for semisupervised mean estimation with a large unlabeled sample. It examines both linear and isotonic calibration, establishing first-order optimality guarantees for isotonic post-processing (improved predictive accuracy and estimator efficiency, with no further post-processing yielding additional first-order gains) and first-order equivalence of linear calibration to PPI++. The work also clarifies that the original PPI estimator is a special case of AIPW (potentially inefficient when the model is accurate) while PPI++ corresponds to AIPW with empirical efficiency maximization. Claims are supported by simulations, real-data experiments, and an accompanying Python package ppi_aipw.

Significance. If the first-order optimality results hold under the stated regularity conditions, this provides a simple, no-retraining enhancement to existing PPI/AIPW methods that can improve both prediction and inference efficiency in the semisupervised regime. The isotonic calibration optimality result is a clear strength, as is the unification of PPI, AIPW, and PPI++ via semiparametric efficiency theory. Credit is due for the reproducible Python package and the empirical validation across simulations and real data, which directly address finite-sample behavior. The approach is a natural, low-overhead extension of standard methods.

major comments (2)
  1. Isotonic calibration section (near the first-order optimality claim): the guarantee that 'no further post-processing of the fitted isotonic score yields additional first-order gains' assumes the isotonic function is estimated from the labeled sample and generalizes without introducing bias or excess variance that offsets gains on the unlabeled data. The manuscript should add an explicit statement or bound on when the estimation error from the small labeled sample does not dominate the efficiency improvement, as this is load-bearing for the central claim of first-order superiority over simpler rules.
  2. Linear calibration equivalence result (section deriving relationship to PPI++): while the derivation follows from standard semiparametric efficiency theory, the manuscript should specify the exact equation or proposition showing that linear post-processing on the labeled sample exactly recovers the empirical efficiency maximization step of PPI++; without this, the claimed first-order equivalence risks being interpretive rather than algebraic.
minor comments (3)
  1. Abstract: the statement of first-order optimality for isotonic calibration would be clearer if it referenced the specific theorem or proposition number establishing the result.
  2. Experiments section: figure captions for the simulation and real-data results should include the exact sample sizes (n labeled, N unlabeled) and the form of the black-box predictor to improve reproducibility.
  3. Notation: the calibration function (linear or isotonic) is introduced with varying symbols across sections; a single consistent definition early in the methods would aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive and constructive review and the recommendation of minor revision. We address the two major comments point by point below and will update the manuscript to incorporate the suggested clarifications.

read point-by-point responses
  1. Referee: Isotonic calibration section (near the first-order optimality claim): the guarantee that 'no further post-processing of the fitted isotonic score yields additional first-order gains' assumes the isotonic function is estimated from the labeled sample and generalizes without introducing bias or excess variance that offsets gains on the unlabeled data. The manuscript should add an explicit statement or bound on when the estimation error from the small labeled sample does not dominate the efficiency improvement, as this is load-bearing for the central claim of first-order superiority over simpler rules.

    Authors: We agree that the finite-sample estimation error of the isotonic calibration function is an important consideration for the practical applicability of the first-order optimality result. Our theoretical analysis in the appendix establishes the asymptotic first-order gains under standard conditions where the calibration estimator converges at a sufficient rate (e.g., the isotonic regression error is o_p(N^{-1/2})). To address the referee's concern explicitly, we will add a new paragraph in the isotonic calibration section (Section 3.1) discussing the conditions under which the labeled sample size ensures that estimation error does not dominate the efficiency gains. This will include a reference to known convergence rates for isotonic regression and a note on the sample-size regime (e.g., n ≫ log N, or a similar condition) required to preserve the first-order improvements; a schematic version of this rate condition appears after these responses. revision: yes

  2. Referee: Linear calibration equivalence result (section deriving relationship to PPI++): while the derivation follows from standard semiparametric efficiency theory, the manuscript should specify the exact equation or proposition showing that linear post-processing on the labeled sample exactly recovers the empirical efficiency maximization step of PPI++; without this, the claimed first-order equivalence risks being interpretive rather than algebraic.

    Authors: We thank the referee for this suggestion to strengthen the algebraic clarity of the equivalence. The relationship is derived from the fact that linear calibration minimizes the same empirical risk as the efficiency maximization in PPI++, leading to identical asymptotic variance. In the revision, we will explicitly state this by referencing the specific optimization problem (currently in the appendix) and add a sentence in the main text of Section 3.2: 'The linear post-processing coefficients solve the same empirical minimization as the PPI++ adjustment, yielding the estimator equal to PPI++ up to o_p(N^{-1/2})'. This makes the equivalence algebraic rather than interpretive. revision: yes
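
Two schematic notes on these responses, in our notation rather than the paper's numbered results.

On point 1: the squared L2 risk of isotonic regression is known to decay at the n^{-2/3} rate under standard boundedness assumptions, which sits below the n^{-1/2} threshold relevant to first-order analysis:

    \| \hat g_n - g_0 \|_{2,P_0}^{2} \;=\; O_p\!\bigl(n^{-2/3}\bigr) \;=\; o_p\!\bigl(n^{-1/2}\bigr),

so the remainder from estimating the monotone calibration map is plausibly negligible at first order, as the authors indicate.

On point 2: the linear calibration coefficients solve the least-squares problem

    (\hat\alpha, \hat\beta) \;=\; \arg\min_{\alpha,\beta} \frac{1}{n}\sum_{i=1}^{n} \bigl\{ Y_i - \alpha - \beta\, m(X_i) \bigr\}^2,

and substituting \hat f(x) = \hat\alpha + \hat\beta\, m(x) into the difference-form estimator \hat\psi makes the intercept cancel:

    \hat\psi(\hat f) \;=\; \frac{1}{n}\sum_{i=1}^{n} Y_i \;+\; \hat\beta \left\{ \frac{1}{N}\sum_{j=1}^{N} m(\tilde X_j) - \frac{1}{n}\sum_{i=1}^{n} m(X_i) \right\},

which is the PPI++ power-tuned form with tuning parameter equal to \hat\beta; the paper's equivalence claim is that this coefficient matches the PPI++ tuning to first order.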

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives Calibeating PPI from standard AIPW (Robins et al. 1994) and semiparametric efficiency theory, with isotonic calibration optimality established via first-order asymptotic arguments under regularity conditions for isotonic regression. Linear calibration equivalence to PPI++ follows from empirical efficiency maximization (Rubin et al. 2008) without reducing any new estimator to a fitted quantity by construction. Relationships among PPI, AIPW, and PPI++ are clarified as special cases or equivalences based on external literature, not self-referential loops or ansatzes smuggled via self-citation. The post-hoc calibration step on the labeled sample is a direct, non-circular extension that preserves the independent content of the efficiency guarantees.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard semiparametric efficiency theory for AIPW and on the statistical properties of post-hoc calibration; the only data-dependent elements are the calibration parameters estimated from the labeled sample.

free parameters (1)
  • calibration function parameters
    Linear coefficients or isotonic mapping fitted on the labeled sample; these are estimated rather than chosen by hand but are central to the method.
axioms (1)
  • domain assumption
    Labeled and unlabeled samples are drawn i.i.d. from the same joint distribution.
    Invoked implicitly for consistency of calibration and for the validity of AIPW-style estimators.

pith-pipeline@v0.9.0 · 5606 in / 1402 out tokens · 48930 ms · 2026-05-09T20:02:59.503867+00:00 · methodology

