Optimized Labeling Resource Allocation for Prediction-Assisted Inference via OPAL

Emmanuel J. Candès; Virginia L. Ma

arxiv: 2606.03211 · v1 · pith:HYPWEQHHnew · submitted 2026-06-02 · 📊 stat.ME · stat.ML

Pith reviewed 2026-06-28 09:06 UTC · model grok-4.3

classification 📊 stat.ME stat.ML

keywords active statistical inference, labeling allocation, prediction-assisted inference, OPAL, finite-sample coverage, smooth policies, odds ratios, black-box models

The pith

OPAL optimizes labeling policies within smooth classes to deliver valid finite-sample inference with far fewer labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents OPAL as a way to strengthen active statistical inference, where a black-box machine learning model guides which data points to label. By learning an optimal labeling strategy inside a tractable family of smooth policies, OPAL produces estimators that keep nominal coverage while cutting variance. The method forms an end-to-end pipeline that converts uncertainty scores into adaptive label allocation and then computes confidence intervals on the resulting samples. Experiments on breast-cancer histopathology images, social-science data, and proteomics show the intervals achieve the accuracy expected from methods that use substantially more labels.

Core claim

OPAL learns a labeling strategy within a tractable class of smooth policies to yield estimators with the lowest variance; the resulting pipeline achieves nominal coverage in finite samples and the accuracy one expects from methods which have far more labeled samples.

What carries the argument

OPAL (Optimized Policy for Allocation of Labels), which converts black-box uncertainty scores into a data-adaptive labeling strategy by optimizing inside a class of smooth policies.
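To make the mechanism concrete, here is a minimal sketch of the kind of inverse-probability-weighted estimator that active statistical inference uses for a population mean. It assumes a simple proportional-to-uncertainty policy, not OPAL's optimized one. The function names, the synthetic data, and the 20% budget are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def uncertainty_policy(uncertainty, budget, floor=0.01):
    """Hypothetical baseline: label with probability proportional to uncertainty.

    Scaled so the average probability is `budget`, then clipped to [floor, 1]
    (clipping, when it binds, shifts the average slightly).
    """
    u = np.asarray(uncertainty, dtype=float)
    return np.clip(budget * u / u.mean(), floor, 1.0)

def active_mean_estimate(preds, labels, sampled, probs):
    """Inverse-probability-weighted active estimate of a population mean.

    preds   : model predictions f(X_i) for all n points
    labels  : labels Y_i (only entries where sampled is True are used)
    sampled : boolean mask, True where a label was collected
    probs   : labeling probabilities pi(X_i) in (0, 1]
    """
    preds = np.asarray(preds, dtype=float)
    probs = np.asarray(probs, dtype=float)
    sampled = np.asarray(sampled, dtype=bool)
    # Residual correction, applied only where a label was collected.
    resid = np.where(sampled, np.asarray(labels, dtype=float) - preds, 0.0)
    terms = preds + resid / probs
    est = terms.mean()
    se = terms.std(ddof=1) / np.sqrt(terms.size)
    return est, se

# Synthetic illustration (all values are made up).
rng = np.random.default_rng(0)
n = 20_000
p = rng.uniform(0.05, 0.95, size=n)        # true P(Y = 1 | X)
y = rng.binomial(1, p).astype(float)
preds = p                                   # assume a well-calibrated model
probs = uncertainty_policy(np.sqrt(p * (1 - p)), budget=0.2)
sampled = rng.random(n) < probs             # independent labeling decisions
est, se = active_mean_estimate(preds, y, sampled, probs)
print(f"estimate={est:.3f}, standard error={se:.4f}, labels used={sampled.sum()}")
```

The correction term is divided by the labeling probability, so points the policy rarely labels carry large weights when they are labeled. That is one reason a noisy uncertainty score can inflate variance.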

If this is right

Valid confidence intervals for odds ratios across demographic groups can be obtained from histopathology images with reduced labeling effort.
The same optimized allocation works on datasets from computational social science and proteomics while retaining coverage.
The pipeline removes brittleness caused by noisy uncertainty estimates without sacrificing statistical guarantees.
Estimators achieve accuracy comparable to far larger labeled sets while using only the labels selected by the learned policy.
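For readers less familiar with the target parameter, the odds ratio between two groups can be written as follows. The symbols p_1 and p_0 (the outcome rates in the two groups) and the numbers in the worked example are illustrative assumptions, not values from the paper.

```latex
\mathrm{OR} \;=\; \frac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)},
\qquad
\text{e.g. } p_1 = 0.30,\ p_0 = 0.15:\quad
\mathrm{OR} \;=\; \frac{0.30 / 0.70}{0.15 / 0.85} \;=\; \frac{0.255}{0.105} \;\approx\; 2.43 .
```

An odds ratio of 1 means the outcome is equally likely in both groups, so a confidence interval that excludes 1 indicates a group difference.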

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach could lower labeling costs in any domain where a predictive model already exists and labels are expensive to obtain.
Extensions might test whether the same smooth-policy optimization improves other active-inference tasks such as estimating means or regression coefficients.
If the smooth class is rich enough, similar gains may appear when the black-box model is replaced by newer architectures.

Load-bearing premise

Optimizing a labeling strategy inside a tractable class of smooth policies produces estimators with the lowest variance while preserving the provable guarantees of the active inference framework.

What would settle it

An experiment in which the coverage probability of OPAL confidence intervals falls materially below the nominal level on finite samples from the breast-cancer histopathology data would falsify the finite-sample guarantee.
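A coverage check of this kind is easiest to see on synthetic data where the truth is known. The sketch below is not the paper's experiment. It assumes a mean (not an odds ratio) as the target, a proportional-to-uncertainty policy, a normal-approximation interval, and made-up simulation settings. All names are hypothetical.

```python
import numpy as np

Z_90 = 1.6449  # two-sided 90% standard normal quantile

def one_trial(rng, n=2000, budget=0.2):
    """Draw one synthetic dataset and return a 90% interval for E[Y]."""
    p = rng.uniform(0.05, 0.95, size=n)      # P(Y = 1 | X); E[Y] = 0.5 by construction
    y = rng.binomial(1, p).astype(float)
    # Imperfect model predictions, kept away from 0 and 1.
    preds = np.clip(p + rng.normal(0.0, 0.05, size=n), 0.05, 0.95)
    unc = np.sqrt(preds * (1 - preds))        # uncertainty score from the model
    probs = np.clip(budget * unc / unc.mean(), 0.01, 1.0)
    sampled = rng.random(n) < probs
    # Inverse-probability-weighted terms; unlabeled points contribute preds only.
    terms = preds + np.where(sampled, (y - preds) / probs, 0.0)
    est = terms.mean()
    se = terms.std(ddof=1) / np.sqrt(n)
    return est - Z_90 * se, est + Z_90 * se

def coverage(truth=0.5, trials=500, seed=0):
    """Fraction of trials whose interval contains the known truth."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        lo, hi = one_trial(rng)
        hits += int(lo <= truth <= hi)
    return hits / trials

cov = coverage()
print(f"empirical coverage over 500 trials: {cov:.3f} (nominal 0.90)")
```

With 500 trials the Monte Carlo standard error of the coverage estimate is about 0.013, so only a shortfall well beyond that margin would count as material.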

Figures

Figures reproduced from arXiv: 2606.03211 by Emmanuel J. Candès, Virginia L. Ma.

**Figure 2.** Overview of OPAL incorporating (a) optimization modules for finding labeling policy…

**Figure 3.** Labeling policy generated via active vs. OPAL based on data generated from (a) balanced…

**Figure 4.** Coverage for odds ratio estimation of cardiomegaly in patients below vs. over 40 years of age: (a) usual Monte Carlo coverage; (b) coverage after finite-population calibration, detailed in Section 4.2, accounting for the fact that inference is evaluated against the fixed empirical population rather than an independent superpopulation draw. The dashed horizontal line indicates the nominal 90% target. Results…

**Figure 5.** Stability of odds ratio estimation of cardiomegaly in patients below vs. over 40 years of age. Variability of estimates, interval widths, left and right endpoints over 500 Monte Carlo trials. The budget given on the x-axis (denoted by the number of labels acquired, n_human) ranges from 10% to 20% of the total unlabeled observations.

**Figure 6.** Odds ratio estimation of triple negative breast cancer in Caucasian vs. African American women: (a) effective sample size of each method, where solid lines denote the baseline and dashed lines denote power tuning (for active proportional-to-uncertainty labeling and active spline-parametrized optimal labeling); (b) coverage of each method, with correction to adjust for finite-population effects. We perform 500 t…

**Figure 7.** Odds ratio estimation of global warming stance with affirming devices. Effective sample size of each method under (a) batch sampling and (b) sequential sampling. We perform 500 trials per method at each budget level (20-50%), and average over these trials in the reported results.

**Figure 8.** Odds ratio estimation of intrinsic disorder using AlphaFold-derived predictors: (a) effective sample size of each method, where solid lines denote the baseline and dashed lines denote power tuning (for active proportional-to-uncertainty labeling and active spline-parametrized optimal labeling); (b) coverage of each method, with correction to adjust for finite-population effects. We perform 500 trials per method…

**Figure 9.** Effective sample size in the unbalanced group size setting with (a) oracle uncertainties, (b) esti…

**Figure 10.** Effective sample size for Kendall’s Tau simulation. We perform 500 trials per method at each…

**Figure 11.** Full overview of OPAL incorporating (a) optimization modules for finding labeling policy…

**Figure 12.** Predictive performance of the CheXpert-pretrained model for cardiomegaly: (a) shows the overall…

**Figure 13.** The TNBC prediction task is substantially more challenging than the CheXpert cardiomegaly task because TNBC is rare (both in the broader population and in the data set), comprising approximately 17% of observations. As a result, overall accuracy is a misleading performance measure: a majority classifier that always predicts non-TNBC already achieves approximately 83% accuracy. The CNN’s thresholded predi…

**Figure 13.** Predictive performance of the CNN for TNBC classification: Panel (a) compares the true TNBC…

**Figure 14.** Odds ratio estimation of global warming stance with affirming devices. Coverage of each method under (a) batch sampling, uncorrected; (b) batch sampling, adjusted for finite population; (c) sequential sampling, uncorrected; and (d) sequential sampling, adjusted for finite population. We perform 500 trials per method at each budget level (20-50%), and average over these trials in the reported results.

**Figure 15.** Stability of odds ratio estimation of global warming stance in the media in the presence of affirming devices vs. no affirming devices. Variability of estimates, interval widths, left and right endpoints over 500 Monte Carlo trials. The budget given on the x-axis (denoted by the number of labels acquired, n_human) ranges from 10% to 20% of the total unlabeled observations.

**Figure 16.** Odds ratio estimation of global warming stance with affirming devices: sequential sampling. Distribution of effective sample size (x-axis) of each method across 500 iterations with probabilities renormalized to preserve the target expected number of labels. Under this convention, λ = 1 gives the original adaptive method and λ = 0 gives pure uniform sampling. For each budget, we selected λ using a labeled …

**Figure 17.** Optimal mixing weight with uniform sampling in the odds-ratio simulation. For each budget…

**Figure 18.** Unbalanced group sizes: group 1 comprises 95% of population (a) oracle uncertainties used for…
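The uniform-mixing convention described in the Figure 16 caption (λ = 1 is the adaptive method, λ = 0 is uniform sampling, with probabilities renormalized to the target expected number of labels) can be sketched as below. This is one plausible reading of that caption, not the authors' code. The function name, the example policy, and the choice to raise an error when a rescaled probability exceeds 1 are assumptions.

```python
import numpy as np

def mix_with_uniform(adaptive_probs, lam, budget):
    """Blend an adaptive labeling policy with uniform sampling.

    lam = 1 keeps the adaptive policy; lam = 0 gives uniform sampling at rate
    `budget`. The blend is rescaled so its average equals `budget`. This sketch
    assumes the rescaled probabilities stay at or below 1.
    """
    adaptive = np.asarray(adaptive_probs, dtype=float)
    mixed = lam * adaptive + (1.0 - lam) * budget
    mixed = mixed * budget / mixed.mean()      # preserve expected label count
    if np.any(mixed > 1.0):
        raise ValueError("rescaled probability exceeds 1; lower budget or lam")
    return mixed

adaptive = np.array([0.05, 0.10, 0.20, 0.45])  # made-up policy, averages to 0.20
for lam in (0.0, 0.5, 1.0):
    print(lam, mix_with_uniform(adaptive, lam, budget=0.20))
```

Intermediate values of λ pull the smallest labeling probabilities toward the budget, which bounds the inverse-probability weights and so limits the damage from a poorly estimated uncertainty score.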

Original abstract

Active Statistical Inference is a new framework to make precise claims about population parameters with provable statistical guarantees. It uses a predictive "black-box" machine learning (ML) model to strategically decide which data points to label, roughly prioritizing samples for which the ML model is unsure about their label values. A major issue is that the framework can be brittle when uncertainty estimates are noisy. This paper introduces OPAL (Optimized Policy for Allocation of Labels), which learns a labeling strategy within a tractable class of smooth policies to yield estimators with the lowest variance. In effect, OPAL is an end-to-end pipeline that turns a black-box model's uncertainty scores into a data-adaptive labeling strategy and then performs inference on the collected samples. We evaluate OPAL on real datasets spanning medical imaging data, computational social science, and proteomics. As a concrete example, we consider predicting breast cancer subtype from histopathology images and using OPAL to form valid confidence intervals for odds ratios for different demographic groups. We show that OPAL achieves nominal coverage in finite samples and has the accuracy one expects from methods which have far more labeled samples.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

OPAL adds an optimization layer for label allocation in active inference but the finite-sample coverage claim on real data is hard to verify without known truth.

The main takeaway is that OPAL learns a smooth labeling policy from a black-box model's uncertainty scores to cut variance in the active statistical inference framework while claiming to keep nominal coverage. It turns the earlier approach into an end-to-end pipeline that adapts which points get labeled.

The paper does a solid job of showing how this works on real applications. The breast cancer histopathology example for odds ratios by demographic group, plus the other medical imaging, social science, and proteomics cases, gives a practical sense of where the method could reduce labeling costs. That evaluation on actual datasets is useful and moves beyond toy examples.

The soft spot is the coverage statement. The abstract says OPAL achieves nominal coverage in finite samples, yet the evaluations are on real data where the true parameter is unknown, so direct coverage checks are impossible. Only interval width or point estimates can be compared. The stress-test note is on target here; without accompanying simulation results with known truth, that claim rests on unshown work. The abstract also gives no derivation or error analysis for how the policy optimization preserves the guarantees, so the central argument is difficult to assess from what is provided.

This is for people already working in active inference or label-efficient statistical estimation, especially those in applied areas like medical imaging or social science who need valid intervals with limited labels. A reader looking for a concrete way to improve on brittle uncertainty-based sampling would get value from the pipeline description and the real-data results.

The paper deserves peer review. The idea targets a clear limitation in the framework and includes relevant experiments, even though the coverage evidence needs more support in the full text.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces OPAL (Optimized Policy for Allocation of Labels), a method that optimizes labeling strategies within a class of smooth policies for active statistical inference. It uses ML uncertainty scores to adaptively select samples for labeling, aiming to produce estimators with minimal variance while maintaining the provable guarantees of the framework. The approach is evaluated on real-world datasets from medical imaging (breast cancer subtype prediction), computational social science, and proteomics, with a specific example on odds ratios by demographic groups. The central claim is that OPAL achieves nominal coverage in finite samples and accuracy comparable to methods using substantially more labeled data.

Significance. If the finite-sample coverage and variance reduction claims hold, this work could have significant impact on resource-efficient statistical inference in domains where labeling is costly, such as medical imaging and social science surveys. The end-to-end pipeline from black-box ML to adaptive labeling and inference represents a practical advancement in prediction-assisted inference methods.

major comments (2)

[Abstract and Evaluation] The claim that OPAL 'achieves nominal coverage in finite samples' is load-bearing for the paper's contribution, but the described experiments are conducted on real datasets (e.g., histopathology images, odds ratios) where the true parameter values are unknown. Coverage probability cannot be directly computed without ground truth, and the abstract provides no mention of accompanying simulation studies with known truth parameters that would allow verification of this claim. This issue must be addressed to support the central assertion.
[Methods/Optimization] The premise that optimizing a labeling strategy inside a tractable class of smooth policies produces estimators with the lowest variance while preserving the provable guarantees requires an explicit derivation, algorithm, or validation procedure (e.g., the objective function or optimization routine used to learn the policy). Without this, the connection between the optimization and the claimed variance reduction remains unclear.

minor comments (2)

The abstract could benefit from a brief mention of the specific optimization technique or loss function used for policy learning.
Ensure that all datasets, evaluation metrics, and any simulation setups are clearly defined in the main text for reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on the coverage claims and the optimization details. We address each major point below and will revise the manuscript accordingly to strengthen the presentation of our results.

Referee: [Abstract and Evaluation] The claim that OPAL 'achieves nominal coverage in finite samples' is load-bearing for the paper's contribution, but the described experiments are conducted on real datasets (e.g., histopathology images, odds ratios) where the true parameter values are unknown. Coverage probability cannot be directly computed without ground truth, and the abstract provides no mention of accompanying simulation studies with known truth parameters that would allow verification of this claim. This issue must be addressed to support the central assertion.

Authors: We agree that coverage cannot be directly verified on real datasets without known ground truth. To support the finite-sample coverage claim, we will add a new simulation study section with known truth parameters to the revised manuscript. These simulations will be referenced in an updated abstract to explicitly demonstrate nominal coverage under controlled conditions, complementing the real-data results.

revision: yes

Referee: [Methods/Optimization] The premise that optimizing a labeling strategy inside a tractable class of smooth policies produces estimators with the lowest variance while preserving the provable guarantees requires an explicit derivation, algorithm, or validation procedure (e.g., the objective function or optimization routine used to learn the policy). Without this, the connection between the optimization and the claimed variance reduction remains unclear.

Authors: The optimization is described in the Methods section, but we acknowledge the need for greater explicitness. In the revision we will add a detailed derivation of the variance objective function, the gradient-based optimization routine, and pseudocode for learning the smooth policy parameters while preserving the coverage guarantees.

revision: yes
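For orientation, the kind of variance objective at issue can be sketched in the simplest case, estimating a mean with the standard inverse-probability-weighted active estimator. This is an illustration under stated assumptions, not the paper's derivation: the symbols f (the black-box predictor), π (the labeling policy), ξ_i (the labeling indicator), and b (the budget) are introduced here, and the actual objective for odds ratios and the smooth policy class are not given in the text above.

```latex
\hat\theta_\pi \;=\; \frac{1}{n}\sum_{i=1}^{n}
  \Big[\, f(X_i) + \big(Y_i - f(X_i)\big)\,\frac{\xi_i}{\pi(X_i)} \Big],
\qquad \xi_i \mid X_i, Y_i \sim \mathrm{Bernoulli}\big(\pi(X_i)\big),

n\,\mathrm{Var}\big(\hat\theta_\pi\big) \;=\; \mathrm{Var}(Y)
  + \mathbb{E}\Big[\big(Y - f(X)\big)^2 \Big(\frac{1}{\pi(X)} - 1\Big)\Big],

\min_{\pi}\ \mathbb{E}\Big[\frac{\big(Y - f(X)\big)^2}{\pi(X)}\Big]
\quad \text{subject to} \quad \mathbb{E}\big[\pi(X)\big] \le b,
\quad 0 < \pi(X) \le 1 .
```

When the constraint π ≤ 1 is not binding, the unrestricted minimizer is proportional to the square root of E[(Y − f(X))² | X = x]. That quantity must itself be estimated, so errors in the uncertainty score feed directly into the policy. Restricting π to a smooth family is plausibly how an approach like OPAL tempers this.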

Circularity Check

0 steps flagged

No circularity detected; derivation chain self-contained with no reductions to inputs or self-citations

The provided abstract and context describe OPAL as optimizing a labeling policy within a class of smooth policies to minimize variance, followed by inference with claimed finite-sample coverage. No equations, fitted parameters renamed as predictions, or self-citation chains are present in the text. The central claims rest on the active inference framework and policy optimization without any quoted step that reduces by construction to its own inputs. Evaluation on real datasets is described but does not exhibit self-definitional or fitted-input circularity. This is the expected honest non-finding for a methods paper whose abstract contains no load-bearing derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations, derivations, or experimental sections from which free parameters, axioms, or invented entities can be extracted.

pith-pipeline@v0.9.1-grok · 5724 in / 1056 out tokens · 15990 ms · 2026-06-28T09:06:02.741962+00:00 · methodology

Reference graph

Works this paper leans on

70 extracted references · 24 canonical work pages

[1]

A rewriting system for convex optimization problems

Akshay Agrawal, Steven Diamond, and Stephen Boyd. “A rewriting system for convex optimization problems”. In:Journal of Control and Decision5.1 (2018), pp. 42–60.doi:10.1080/23307706.2017. 1282058

work page doi:10.1080/23307706.2017 2018
[2]

Disciplined Geometric Programming

Akshay Agrawal, Steven Diamond, and Stephen Boyd. “Disciplined Geometric Programming”. In: Optimization Letters13.5 (2019), pp. 961–976

2019
[3]

Two-Phase Sampling Designs for Data Validation in Settings with Covariate Measurement Error and Continuous Outcome

Gustavo G. C. Amorim et al. “Two-Phase Sampling Designs for Data Validation in Settings with Covariate Measurement Error and Continuous Outcome”. In:Journal of the Royal Statistical Society: Series A184.4 (2021), pp. 1368–1389.doi:10.1111/rssa.12689

work page doi:10.1111/rssa.12689 2021
[4]

Anastasios N Angelopoulos et al.Cost-Optimal Active AI Model Evaluation. 2025. arXiv:2506.07949 [cs.LG].url:https://arxiv.org/abs/2506.07949

arXiv 2025
[5]

Angelopoulos, John C

Anastasios N. Angelopoulos, John C. Duchi, and Tijana Zrnic.PPI++: Efficient Prediction-Powered Inference. 2024. arXiv:2311.01453 [stat.ML].url:https://arxiv.org/abs/2311.01453

Pith/arXiv arXiv 2024
[6]

Prediction-powered inference

Anastasios N. Angelopoulos et al. “Prediction-powered inference”. In:Science382.6671 (2023), pp. 669– 674

2023
[7]

MOSEK ApS

MOSEK ApS.The MOSEK Optimization Toolbox for Python 9.3.https://docs.mosek.com/latest/ python/. MOSEK ApS. Copenhagen, Denmark, 2022

2022
[8]

Policy Learning With Observational Data

S. Athey and S. Wager. “Policy Learning With Observational Data”. In:Econometrica89 (2021), pp. 133–161

2021
[9]

Bickel et al.Efficient and Adaptive Estimation for Semiparametric Models

Peter J. Bickel et al.Efficient and Adaptive Estimation for Semiparametric Models. Springer, 1993

1993
[10]

Patrick Billingsley.Probability and Measure. 3rd ed. Wiley, 1995

1995
[11]

The structural context of posttranslational modifications at a proteome-wide scale

I Bludau et al. “The structural context of posttranslational modifications at a proteome-wide scale”. In:PLoS Biology20.5 (2022), e3001636

2022
[12]

Carl de Boor.A Practical Guide to Splines. Vol. 27. Applied Mathematical Sciences. New York: Springer, 1978.isbn: 978-0387953663. 27

1978
[13]

A Tutorial on Geometric Programming

Stephen Boyd et al. “A Tutorial on Geometric Programming”. In:Optimization and Engineering8.1 (2007), pp. 67–127.doi:10.1007/s11081-007-9001-7

work page doi:10.1007/s11081-007-9001-7 2007
[14]

Improved Horvitz–Thompson Estimation of Model Parameters from Two- Phase Stratified Samples: Applications in Epidemiology

Norman E. Breslow et al. “Improved Horvitz–Thompson Estimation of Model Parameters from Two- Phase Stratified Samples: Applications in Epidemiology”. In:Statistics in Biosciences1.1 (2009), pp. 32–49.doi:10.1007/s12561-009-9001-6

work page doi:10.1007/s12561-009-9001-6 2009
[15]

Surrogate-Powered Inference: Regularization and Adaptivity

Jianmin Chen et al. “Surrogate-Powered Inference: Regularization and Adaptivity”. In:arXiv preprint arXiv:2512.21826(2025).doi:10.48550/arXiv.2512.21826.url:https://arxiv.org/abs/2512. 21826

work page doi:10.48550/arxiv.2512.21826.url:https://arxiv.org/abs/2512 2025
[16]

Double/Debiased Machine Learning for Treatment and Structural Param- eters

Victor Chernozhukov et al. “Double/Debiased Machine Learning for Treatment and Structural Param- eters”. In:The Econometrics Journal21.1 (2018), pp. C1–C68.doi:10.1111/ectj.12097

work page doi:10.1111/ectj.12097 2018
[18]

Cochran.Sampling Techniques

William G. Cochran.Sampling Techniques. 3rd ed. New York: John Wiley & Sons, 1977

1977
[19]

On the limits of cross-domain generalization in automated X-ray prediction

Joseph Paul Cohen et al. “On the limits of cross-domain generalization in automated X-ray prediction”. In:Medical Imaging with Deep Learning. 2020.url:https://arxiv.org/abs/2002.02497

arXiv 2020
[20]

TorchXRayVision: A library of chest X-ray datasets and models

Joseph Paul Cohen et al. “TorchXRayVision: A library of chest X-ray datasets and models”. In:Medical Imaging with Deep Learning. 2022.url:https://github.com/mlmed/torchxrayvision

2022
[21]

On P´ olya Frequency Functions. IV. The Fundamental Spline Functions and Their Limits

H. B. Curry and I. J. Schoenberg. “On P´ olya Frequency Functions. IV. The Fundamental Spline Functions and Their Limits”. In:Journal d’Analyse Math´ ematique17 (1966), pp. 71–107.doi:10. 1007/BF02788653

1966
[22]

CVXPY: A Python-Embedded Modeling Language for Convex Optimization

Steven Diamond and Stephen Boyd. “CVXPY: A Python-Embedded Modeling Language for Convex Optimization”. In:The Journal of Machine Learning Research17.83 (2016), pp. 1–5.url:http : //jmlr.org/papers/v17/15-291.html

2016
[23]

Can Unconfident LLM Annotations Be Used for Confident Conclusions?

Kristina Gligoric et al. “Can Unconfident LLM Annotations Be Used for Confident Conclusions?” In: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Com- putational Linguistics: Human Language Technologies (Volume 1: Long Papers). Ed. by Luis Chiruzzo, Alan Ritter, and Lu Wang. Albuquerque, New Mexico: Associa...

2025
[24]

Asymptotic Normality of Simple Linear Rank Statistics under Alternatives

Jaroslav H´ ajek. “Asymptotic Normality of Simple Linear Rank Statistics under Alternatives”. In:Pro- ceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability. Vol. 1. Berkeley: University of California Press, 1972, pp. 139–152

1972
[25]

A Generalization of Sampling Without Replacement from a Finite Universe

D. G. Horvitz and D. J. Thompson. “A Generalization of Sampling Without Replacement from a Finite Universe”. In:Journal of the American Statistical Association47.260 (1952), pp. 663–685.doi: 10.1080/01621459.1952.10483446

work page doi:10.1080/01621459.1952.10483446 1952
[26]

Mong, Safwan S

Jeremy Irvin et al. “CheXpert: a large chest radiograph dataset with uncertainty labels and expert com- parison”. In:Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty- First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence. AAAI’19/...

work page doi:10.1609/aaai.v33i01.3301590 2019
[27]

Stanford AIMI, 2019.doi:10.71718/ y7pj-4v93.url:https://doi.org/10.71718/y7pj-4v93

Jeremy Irvin et al.CheXpert: Chest X-rays Dataset, Version 1.0. Stanford AIMI, 2019.doi:10.71718/ y7pj-4v93.url:https://doi.org/10.71718/y7pj-4v93

work page doi:10.71718/y7pj-4v93 2019
[28]

Wenlong Ji, Lihua Lei, and Tijana Zrnic.Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI. 2025. arXiv:2501.09731 [stat.ML].url:https://arxiv.org/abs/2501.09731

Pith/arXiv arXiv 2025
[29]

Breast cancer histopathological image classification using convolutional neural networks with small SE-ResNet module

Y. Jiang et al. “Breast cancer histopathological image classification using convolutional neural networks with small SE-ResNet module.” In:PLoS One(2019)

2019
[30]

MIMIC-CXR, a de-identified publicly available database of chest radio- graphs with free-text reports

Alistair E. W. Johnson et al. “MIMIC-CXR, a de-identified publicly available database of chest radio- graphs with free-text reports”. In:Nature Scientific Data6 (2019).doi:10.1038/s41597-019-0322-0. url:https://doi.org/10.1038/s41597-019-0322-0. 28

work page doi:10.1038/s41597-019-0322-0 2019
[31]

Convolutional neural networks for histopathology image classification: Training vs. Using pre-trained networks

Brady Kieffer et al. “Convolutional neural networks for histopathology image classification: Training vs. Using pre-trained networks”. In:2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA). 2017, pp. 1–6.doi:10.1109/IPTA.2017.8310149

work page doi:10.1109/ipta.2017.8310149 2017
[32]

Who Should Be Treated? Empirical Welfare Maximization Methods for Treatment Choice

Toru Kitagawa and Aleksey Tetenov. “Who Should Be Treated? Empirical Welfare Maximization Methods for Treatment Choice”. In:Econometrica86.2 (Mar. 2018), pp. 591–616.doi:10 . 3982 / ECTA13288

2018
[33]

M-estimation under Two-Phase Multiwave Sampling with Appli- cations to Prediction-Powered Inference

Dan M. Kluger and Stephen Bates. “M-estimation under Two-Phase Multiwave Sampling with Appli- cations to Prediction-Powered Inference”. In:arXiv preprint arXiv:2602.16933(2026).doi:10.48550/ arXiv.2602.16933.url:https://arxiv.org/abs/2602.16933

arXiv 2026
[34]

Prediction-Powered Inference with Imputed Covariates and Nonuniform Sam- pling

Dan M. Kluger et al. “Prediction-Powered Inference with Imputed Covariates and Nonuniform Sam- pling”. In:arXiv preprint arXiv:2501.18577(2025).doi:10.48550/arXiv.2501.18577.url:https: //arxiv.org/abs/2501.18577

work page doi:10.48550/arxiv.2501.18577.url:https: 2025
[35]

Puheng Li, Tijana Zrnic, and Emmanuel Cand` es.Robust Sampling for Active Statistical Inference
[36]

arXiv:2511.08991 [stat.ML].url:https://arxiv.org/abs/2511.08991

arXiv
[37]

Detecting Stance in Media On Global Warming

Yiwei Luo, Dallas Card, and Dan Jurafsky. “Detecting Stance in Media On Global Warming”. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Ed. by Trevor Cohn, Yulan He, and Yang Liu. Online: Association for Computational Linguistics, Nov. 2020, pp. 3296–3315.doi: 10 . 18653 / v1 / 2020 . findings - emnlp . 296.url:https : / / acla...

2020
[38]

Accessed: 2025-11-05

Mayo Clinic Staff.Enlarged heart — Symptoms & causes. Accessed: 2025-11-05. May 2022.url: https://www.mayoclinic.org/diseases- conditions/enlarged- heart/symptoms- causes/syc- 20355436

2025
[39]

Task-Agnostic Machine-Learning-Assisted Inference

Jiacheng Miao and Qiongshi Lu. “Task-Agnostic Machine-Learning-Assisted Inference”. In:arXiv preprint arXiv:2405.20039(2024).doi:10.48550/arXiv.2405.20039.url:https://arxiv.org/ abs/2405.20039

work page doi:10.48550/arxiv.2405.20039.url:https://arxiv.org/ 2024
[40]

The knowledge-gradient algorithm for sequencing experiments in drug discovery

Diana M. Negoescu, Peter I. Frazier, and Warren B. Powell. “The knowledge-gradient algorithm for sequencing experiments in drug discovery”. In:INFORMS Journal on Computing23.3 (2011), pp. 346– 363.doi:10.1287/ijoc.1100.0417

work page doi:10.1287/ijoc.1100.0417 2011
[41]

On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection

Jerzy Neyman. “On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection”. In:Journal of the Royal Statistical Society97.4 (1934), pp. 558–625

1934
[42]

Monotone Regression Splines in Action

J. O. Ramsay. “Monotone Regression Splines in Action”. In:Statistical Science3.4 (1988), pp. 425– 441.doi:10.1214/ss/1177012761

work page doi:10.1214/ss/1177012761 1988
[43]

Practical considerations for active machine learning in drug discovery

Daniel Reker. “Practical considerations for active machine learning in drug discovery”. In:Drug Dis- covery Today: Technologies32–33 (2019), pp. 73–79.doi:10.1016/j.ddtec.2020.06.001

work page doi:10.1016/j.ddtec.2020.06.001 2019
[44]

Active-learning strategies in computer-assisted drug discovery

Daniel Reker and Gisbert Schneider. “Active-learning strategies in computer-assisted drug discovery”. In:Drug Discovery Today20.4 (2015), pp. 458–465.doi:10.1016/j.drudis.2014.12.004

work page doi:10.1016/j.drudis.2014.12.004 2015
[45]

July 2022.url: https://sites.stat.columbia.edu/bodhi/Talks/Emp-Proc-Lecture-Notes.pdf

Bodhisattva Sen.A Gentle Introduction to Empirical Process Theory and Applications. July 2022.url: https://sites.stat.columbia.edu/bodhi/Talks/Emp-Proc-Lecture-Notes.pdf

2022
[46]

Serfling.Approximation Theorems of Mathematical Statistics

Robert J. Serfling.Approximation Theorems of Mathematical Statistics. New York: John Wiley & Sons, 1980

1980
[47]

Turnbull, S

Shanshan Song, Xihong Lin, and Yong Zhou. “A General M-estimation Theory in Semi-Supervised Framework”. In:Journal of the American Statistical Association119.546 (2024), pp. 1065–1075.doi: 10.1080/01621459.2023.2169699

work page doi:10.1080/01621459.2023.2169699 2024
[48]

Breast cancer histopathological image classification using Convolu- tional Neural Networks

Fabio Alexandre Spanhol et al. “Breast cancer histopathological image classification using Convolu- tional Neural Networks”. In:2016 International Joint Conference on Neural Networks (IJCNN). 2016, pp. 2560–2567.doi:10.1109/IJCNN.2016.7727519. 29

work page doi:10.1109/ijcnn.2016.7727519 2016
[49]

Lorenzo Testa et al. “Semiparametric Semi-Supervised Learning for General Targets Under Distribution Shift and Decaying Overlap”. In: arXiv preprint arXiv:2505.06452 (2025). doi: 10.48550/arXiv.2505.06452. url: https://arxiv.org/abs/2505.06452
[50]

Ye Tian, Peng Wu, and Zhiqiang Tan. “Semi-Supervised Regression Analysis with Model Misspecification and High-Dimensional Data”. In: arXiv preprint arXiv:2406.13906 (2024). doi: 10.48550/arXiv.2406.13906. url: https://arxiv.org/abs/2406.13906
[51]

Long Tran-Thanh et al. “Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits”. In: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI-12). AAAI Press, 2012, pp. 1134–1140.
[52]

A. W. van der Vaart and Jon A. Wellner. Weak Convergence and Empirical Processes. 2nd ed. Springer, 2023.
[53]

A. W. van der Vaart. Asymptotic Statistics. 1st ed. Cambridge University Press, 1998.
[54]

Mark J. van der Laan and Sherri Rose. Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies. Springer, 2018.
[55]

Mark J. van der Laan and Daniel Rubin. “Targeted Maximum Likelihood Learning”. In: International Journal of Biostatistics 2.1 (2006), Article 11.
[56]

Grace Wahba. “Smoothing Noisy Data with Spline Functions”. In: Numerische Mathematik 24.5 (1975), pp. 383–393. doi: 10.1007/BF01437407.
[57]

Shanshan Wang et al. “Annotation-efficient deep learning for automatic medical image segmentation”. In: Nature Communications 12.1 (2021), p. 5915. doi: 10.1038/s41467-021-26216-9.
[58]

Manfred K. Warmuth et al. “Active learning in the drug discovery process”. In: Advances in Neural Information Processing Systems. Vol. 14. 2001, pp. 1449–1456.
[59]

Manfred K. Warmuth et al. “Active Learning with Support Vector Machines in the Drug Discovery Process”. In: Journal of Chemical Information and Computer Sciences 43.2 (2003), pp. 667–673. doi: 10.1021/ci025620t.
[60]

Jon A. Wellner. Notes on the Hájek projection and Hoeffding Decomposition. May 2011. url: https://sites.stat.washington.edu/jaw/COURSES/580s/581/HO/HajekProj-HoeffdingExp.pdf
[61]

Zichun Xu, Daniela Witten, and Ali Shojaie. A Unified Framework for Semiparametrically Efficient Semi-Supervised Learning. 2025. arXiv: 2502.17741 [math.ST]. url: https://arxiv.org/abs/2502.17741
[62]

Ziyan Yin et al. “A Cost-Effective Chart Review Sampling Design to Account for Phenotyping Error in Electronic Health Records (EHR) Data”. In: Journal of the American Medical Informatics Association 29.1 (2022), pp. 52–61. doi: 10.1093/jamia/ocab222.
[63]

Yuqian Zhang, Abhishek Chakrabortty, and Jelena Bradic. “Double Robust Semi-Supervised Inference for the Mean: Selection Bias under MAR Labeling with Decaying Overlap”. In: Information and Inference: A Journal of the IMA 12.3 (2023), pp. 2066–2159. doi: 10.1093/imaiai/iaad021.
[64]

Tijana Zrnic and Emmanuel J. Candès. “Active statistical inference”. In: Proceedings of the 41st International Conference on Machine Learning. ICML’24. Vienna, Austria: JMLR.org, 2024.
[65]

Tijana Zrnic and Emmanuel J. Candès. “Cross-prediction-powered inference”. In: Proceedings of the National Academy of Sciences 121.5 (2024), e2322083121.
Compute the point estimate $\hat\theta$ using the queried labels and the chosen augmentation $a_i$.

Compute the usual variance estimate $V_{\theta,\mathrm{int}}$ used in the superpopulation-style Wald interval.

For each queried unit, compute $R_i = Y_i - a_i$. Unqueried units do not contribute to $\hat V_{k,\mathrm{HT}}$, since their contribution is multiplied by $\xi_i = 0$.

Compute $\hat V_{1,\mathrm{HT}}$ and $\hat V_{0,\mathrm{HT}}$, then combine them through the Delta method to obtain $\hat V_{\theta,\mathrm{HT}}$.

Report the finite-population-calibrated interval $\hat\theta \pm z_{1-\alpha/2} \sqrt{\hat V_{\theta,\mathrm{HT}}}$.

Report $\hat\gamma_\theta = V_{\theta,\mathrm{int}} / \hat V_{\theta,\mathrm{HT}}$ as a diagnostic of the variance inflation of the usual Wald interval relative to the finite-population conditional variance. This calibration is used only for coverage evaluation against the fixed finite-population benchmark $\theta_N$. It does not replace the usual superpopulation interval when the inferential target is $\theta(P)$.
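The last two reporting steps can be illustrated with a minimal Python sketch. It assumes that the point estimate and both variance estimates have already been computed by the earlier steps, whose formulas are not given in this excerpt. The function names `calibrated_interval` and `variance_inflation` and the numeric inputs are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from math import sqrt
from statistics import NormalDist


def calibrated_interval(theta_hat, v_ht, alpha=0.05):
    """Return theta_hat +/- z_{1-alpha/2} * sqrt(v_ht).

    v_ht stands in for the Horvitz-Thompson-style variance estimate of
    theta_hat; it is assumed to be supplied by the caller.
    """
    if v_ht < 0:
        raise ValueError("variance estimate must be non-negative")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    half_width = z * sqrt(v_ht)
    return theta_hat - half_width, theta_hat + half_width


def variance_inflation(v_int, v_ht):
    """Return the diagnostic ratio gamma_hat = v_int / v_ht."""
    if v_ht <= 0:
        raise ValueError("v_ht must be positive to form the ratio")
    return v_int / v_ht


# Made-up inputs, for illustration only.
theta_hat, v_int, v_ht = 0.30, 4.0e-4, 2.5e-4
lo, hi = calibrated_interval(theta_hat, v_ht)
gamma_hat = variance_inflation(v_int, v_ht)
print(f"interval: [{lo:.4f}, {hi:.4f}], gamma_hat: {gamma_hat:.2f}")
```

A ratio above 1 in this sketch would indicate that the usual Wald variance is larger than the finite-population conditional variance, which is the inflation the diagnostic is meant to expose.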

