Prediction-Powered Causal Inference by Automatic Debiased Machine Learning and Semi-Supervised Riesz Regression
Pith reviewed 2026-06-27 06:01 UTC · model grok-4.3
The pith
Unlabeled auxiliary regressors yield causal estimators whose asymptotic variance falls below the labeled-data efficiency bound.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Both the EE-DML-PPCI and TMLE-DML-PPCI estimators have asymptotic variances equal to the efficiency bound derived from the semi-supervised data-generating process, a bound that can be smaller than the one obtained from labeled observations alone. They attain it because they are constructed from the efficient influence function, which is also a Neyman-orthogonal score, with its Riesz representer estimated by semi-supervised generalized Riesz regression.
What carries the argument
The efficient influence function (also a Neyman orthogonal score) that depends on the Riesz representer and the regression function, estimated via semi-supervised generalized Riesz regression.
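To make the shape of such a score concrete, the sketch below evaluates the standard labeled-only orthogonal score for the average treatment effect: a plug-in term plus a Riesz-weighted residual. It is not the paper's semi-supervised efficient influence function, whose explicit form is not given on this page. The simulated data, the oracle nuisance functions, and the names `ate_riesz_representer` and `orthogonal_score` are assumptions made purely for illustration.

```python
import numpy as np


def ate_riesz_representer(d, e):
    """Riesz representer of the ATE functional: d/e(x) - (1-d)/(1-e(x))."""
    return d / e - (1.0 - d) / (1.0 - e)


def orthogonal_score(y, d, g1, g0, alpha):
    """Plug-in term g(1,x) - g(0,x) plus the Riesz-weighted outcome residual."""
    g_obs = np.where(d == 1, g1, g0)
    return (g1 - g0) + alpha * (y - g_obs)


# Hypothetical data-generating process with a true ATE of 2.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-x))          # true propensity score
d = rng.binomial(1, e)                # treatment indicator
y = 2.0 * d + x + rng.normal(size=n)  # outcome

# Oracle nuisances stand in for machine-learning fits (an assumption).
g1, g0 = 2.0 + x, x
alpha = ate_riesz_representer(d, e)

psi = orthogonal_score(y, d, g1, g0, alpha)
ate_hat = psi.mean()
se_hat = psi.std(ddof=1) / np.sqrt(n)
print(f"ATE estimate: {ate_hat:.3f} (standard error {se_hat:.3f})")
```

In practice both nuisance functions would be estimated, typically with cross-fitting; the oracle versions are used here only to keep the sketch short.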
If this is right
- Both estimating-equation and targeted-maximum-likelihood versions of DML-PPCI attain the same semi-supervised efficiency bound.
- The Riesz representer can be estimated at rates that preserve the efficiency gain even when the regression function is estimated by machine learning.
- Prediction-powered causal inference improves upon standard debiased machine learning by systematically incorporating auxiliary unlabeled regressors.
- The efficiency bound is attainable without a correctly specified parametric outcome or propensity model, provided the regression function and the Riesz representer are both estimated consistently at rates fast enough for the debiased machine learning argument.
Where Pith is reading between the lines
- The same semi-supervised Riesz-regression technique could be applied to other semiparametric problems that admit a Riesz representer, such as average treatment effect on the treated or policy-value estimation.
- When labeled outcomes are costly, optimal data-collection budgets may shift toward acquiring more unlabeled regressors rather than more labeled pairs.
- High-dimensional or nonparametric extensions would follow once the convergence rates of the semi-supervised Riesz regression are verified under sparsity or smoothness assumptions.
Load-bearing premise
The semi-supervised data-generating process must satisfy regularity conditions that let the Riesz representer exist and let the semi-supervised generalized Riesz regression converge at the rates needed for the efficiency bound to be attained.
What would settle it
A simulation or real-data experiment in which the finite-sample variance of the proposed estimators fails to drop below the labeled-only bound, or in which the Riesz-regression estimator does not achieve the convergence rates stated under the semi-supervised sampling process.
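A toy version of such a check can be sketched as a Monte Carlo comparison. The example below estimates a simple mean rather than a causal parameter, and it uses a prediction-powered style correction (labeled residual mean plus the prediction mean over all regressors) rather than the paper's estimators. The data-generating process, the fixed prediction rule `predict`, and the sample sizes are all assumptions chosen for illustration.

```python
import numpy as np


def labeled_only_mean(y_lab):
    """Baseline: sample mean of the labeled outcomes."""
    return y_lab.mean()


def semi_supervised_mean(y_lab, f_lab, f_unlab):
    """Labeled residual mean plus the prediction mean over all regressors."""
    f_all = np.concatenate([f_lab, f_unlab])
    return (y_lab - f_lab).mean() + f_all.mean()


def predict(x):
    """Hypothetical prediction rule, assumed to be given in advance."""
    return 3.0 * x


def simulate(n_lab=200, n_unlab=5000, n_rep=1000, seed=0):
    """Return the Monte Carlo variances of the two estimators of E[Y]."""
    rng = np.random.default_rng(seed)
    est_lab = np.empty(n_rep)
    est_ss = np.empty(n_rep)
    for r in range(n_rep):
        x_lab = rng.normal(size=n_lab)
        x_unlab = rng.normal(size=n_unlab)
        y_lab = predict(x_lab) + rng.normal(size=n_lab)
        est_lab[r] = labeled_only_mean(y_lab)
        est_ss[r] = semi_supervised_mean(y_lab, predict(x_lab), predict(x_unlab))
    return est_lab.var(ddof=1), est_ss.var(ddof=1)


var_lab, var_ss = simulate()
print(f"labeled-only variance:    {var_lab:.5f}")
print(f"semi-supervised variance: {var_ss:.5f}")
```

A failure of the semi-supervised variance to fall below the labeled-only variance in an analogous experiment with the paper's estimators would be the kind of evidence described above.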
Original abstract
This study investigates semiparametric efficient estimation of causal and structural parameters in a semi-supervised setting. In our setting, unlabeled auxiliary regressors are available in addition to labeled observations consisting of outcomes and regressors. Our goal is to construct estimators of causal and structural parameters whose asymptotic variances are smaller than those of estimators constructed using only labeled data. We refer to this framework as prediction-powered causal inference (PPCI). We first derive the efficient influence function and the efficiency bound, which imply that the use of auxiliary regressors can attain a smaller asymptotic variance than the efficiency bound attainable from labeled observations alone. Then, by combining the efficient influence function with the debiased machine learning (DML) framework, we propose methods that we call DML-PPCI. If we construct an estimating-equation estimator, we refer to the method as EE-DML-PPCI; if we construct a targeted-learning estimator, we refer to the method as TMLE-DML-PPCI. The asymptotic variances of both estimators match our derived efficiency bound. In the construction of the estimators, estimation of the efficient influence function plays an important role. In our study, the efficient influence function is also a Neyman orthogonal score, which depends on the Riesz representer and the regression function. For Riesz representer estimation, we develop semi-supervised generalized Riesz regression with convergence rate guarantees.
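One way to see why unlabeled regressors can help Riesz representer estimation is that the squared-loss Riesz regression objective does not involve the outcome. The sketch below fits a linear-in-basis representer for the average treatment effect by minimizing the sample analogue of E[alpha(D,X)^2 - 2(alpha(1,X) - alpha(0,X))] over pooled labeled and unlabeled regressors. This is the squared-loss special case, not the paper's generalized procedure. It assumes that unlabeled records contain both the treatment and the covariate, and the binary design, the saturated basis, and the names `basis` and `fit_riesz_linear` are hypothetical.

```python
import numpy as np


def basis(d, x):
    """Saturated basis in (d, x) for binary d and binary x."""
    return np.column_stack([np.ones(len(x)), d, x, d * x])


def fit_riesz_linear(d, x, ridge=1e-8):
    """Minimize mean(alpha(d,x)^2 - 2*(alpha(1,x) - alpha(0,x))) over alpha = basis @ beta."""
    phi = basis(d, x)
    ones, zeros = np.ones_like(d), np.zeros_like(d)
    gram = phi.T @ phi / len(d)
    target = (basis(ones, x) - basis(zeros, x)).mean(axis=0)
    # A tiny ridge term keeps the linear solve numerically stable.
    return np.linalg.solve(gram + ridge * np.eye(gram.shape[1]), target)


def draw_regressors(rng, n):
    """Hypothetical design: binary covariate, propensity 0.3 + 0.4 * x."""
    x = rng.binomial(1, 0.5, size=n)
    d = rng.binomial(1, 0.3 + 0.4 * x)
    return d, x


rng = np.random.default_rng(0)
d_lab, x_lab = draw_regressors(rng, 500)         # regressors with outcomes
d_unlab, x_unlab = draw_regressors(rng, 50_000)  # regressors without outcomes

# The loss never touches the outcome, so both samples can be pooled.
d_all = np.concatenate([d_lab, d_unlab])
x_all = np.concatenate([x_lab, x_unlab])
beta_pooled = fit_riesz_linear(d_all, x_all)

# Compare fitted and true representers on the four (d, x) cells.
cells_d = np.array([0, 1, 0, 1])
cells_x = np.array([0, 0, 1, 1])
e_cells = 0.3 + 0.4 * cells_x
alpha_true = cells_d / e_cells - (1 - cells_d) / (1 - e_cells)
alpha_hat = basis(cells_d, cells_x) @ beta_pooled
print(np.round(alpha_hat, 2), np.round(alpha_true, 2))
```

Fitting on `d_lab` and `x_lab` alone uses the same function; the pooled fit simply averages the same moments over more observations.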
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a framework called prediction-powered causal inference (PPCI) for semiparametric efficient estimation of causal and structural parameters in a semi-supervised setting, where labeled data (outcomes and regressors) are supplemented by unlabeled auxiliary regressors. It derives the efficient influence function (EIF) and corresponding efficiency bound, which is claimed to be strictly smaller than the bound attainable from labeled data alone. It then proposes two debiased machine learning estimators (EE-DML-PPCI via estimating equations and TMLE-DML-PPCI via targeted learning) that achieve this bound by using a Neyman-orthogonal score depending on the Riesz representer and regression function, with the Riesz representer estimated via a new semi-supervised generalized Riesz regression that comes with convergence rate guarantees.
Significance. If the central claims hold, the work would provide a theoretically grounded method to improve efficiency in causal inference by leveraging abundant unlabeled data, with explicit efficiency bounds and matching estimators. This could be relevant for applications with semi-supervised structures, such as those in epidemiology or econometrics. The derivation of the EIF, the smaller efficiency bound, and the rate guarantees for the semi-supervised Riesz regression would be the key contributions if rigorously established.
Major comments (2)
- [Abstract] Abstract (efficiency bound paragraph): The claim that the efficiency bound is strictly smaller than the labeled-only bound, and that the EE-DML-PPCI and TMLE-DML-PPCI estimators match it, rests on the semi-supervised data-generating process satisfying regularity conditions so that the Riesz representer exists and the semi-supervised generalized Riesz regression attains the convergence rates needed for asymptotic efficiency. The manuscript must explicitly state these conditions (e.g., on the function classes, smoothness, or eigenvalue bounds) and show they do not reduce to the labeled-only case by construction.
- [Abstract] Abstract (EIF and Neyman orthogonality paragraph): The EIF is asserted to be Neyman orthogonal and to depend on the Riesz representer; however, without the explicit form of the EIF or the semi-supervised model assumptions in the derivation, it is unclear whether the orthogonality holds automatically or requires additional restrictions that could affect the strict improvement in the bound.
Simulated Author's Rebuttal
We thank the referee for the careful review and constructive comments on the abstract. We address each major comment below. The full manuscript derives the EIF, efficiency bound, and rate guarantees under explicit regularity conditions (detailed in Sections 3 and 4), which we will now highlight more prominently in the abstract to address the concerns raised.
Point-by-point responses
- Referee: [Abstract] Abstract (efficiency bound paragraph): The claim that the efficiency bound is strictly smaller than the labeled-only bound, and that the EE-DML-PPCI and TMLE-DML-PPCI estimators match it, rests on the semi-supervised data-generating process satisfying regularity conditions so that the Riesz representer exists and the semi-supervised generalized Riesz regression attains the convergence rates needed for asymptotic efficiency. The manuscript must explicitly state these conditions (e.g., on the function classes, smoothness, or eigenvalue bounds) and show they do not reduce to the labeled-only case by construction.
Authors: We agree the abstract should reference the conditions more explicitly. The manuscript derives the bound under standard assumptions including bounded function classes for the regression and Riesz representer, sufficient smoothness for the semi-supervised estimators, and eigenvalue bounds ensuring identifiability and faster convergence rates when auxiliary unlabeled data are used. These conditions are shown in the paper (via the semi-supervised generalized Riesz regression) to yield a strictly smaller bound than the labeled-only case, as the auxiliary data improve estimation of the representer without reducing to the supervised setting. We will revise the abstract to state these conditions briefly and note the non-reduction to the labeled-only bound.
Revision: yes
- Referee: [Abstract] Abstract (EIF and Neyman orthogonality paragraph): The EIF is asserted to be Neyman orthogonal and to depend on the Riesz representer; however, without the explicit form of the EIF or the semi-supervised model assumptions in the derivation, it is unclear whether the orthogonality holds automatically or requires additional restrictions that could affect the strict improvement in the bound.
Authors: The EIF is explicitly derived in Section 3 of the manuscript as the unique influence function for the semi-supervised model; it is Neyman orthogonal by the Riesz representation property under the semi-supervised data structure (labeled outcomes/regressors plus unlabeled auxiliary regressors). Orthogonality holds automatically from the model assumptions without further restrictions, and the paper shows this yields the strict efficiency improvement. We will add the explicit EIF form to the abstract (or a parenthetical) along with a concise statement of the semi-supervised assumptions ensuring orthogonality and the bound improvement.
Revision: yes
Circularity Check
No circularity: standard derivation of EIF and efficiency bound with independent estimator construction
Full rationale
The paper derives the efficient influence function and efficiency bound from the semi-supervised data-generating process under stated regularity conditions, then constructs DML-based estimators (EE-DML-PPCI and TMLE-DML-PPCI) whose asymptotic variances are shown to match that bound. This follows the conventional semiparametric efficiency theory workflow without any reduction of the bound or estimators to fitted inputs by construction, self-citation load-bearing premises, or ansatz smuggling. The Riesz representer estimation step is presented as a new semi-supervised procedure with convergence guarantees, not as a renaming or self-referential fit. No load-bearing step collapses to its own inputs.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: regularity conditions sufficient for the existence of the efficient influence function and for the convergence of semi-supervised Riesz regression.