arxiv: 2606.12892 · v1 · pith:ZXLFHMDPnew · submitted 2026-06-11 · stat.ML · cs.LG · econ.EM · math.ST · stat.ME · stat.TH

Prediction-Powered Causal Inference by Automatic Debiased Machine Learning and Semi-Supervised Riesz Regression

Pith reviewed 2026-06-27 06:01 UTC · model grok-4.3

classification stat.ML · cs.LG · econ.EM · math.ST · stat.ME · stat.TH
keywords semi-supervised learning · causal inference · debiased machine learning · Riesz regression · efficiency bound · Neyman orthogonality · prediction-powered inference

The pith

Unlabeled auxiliary regressors yield causal estimators whose asymptotic variance falls below the labeled-data efficiency bound.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies semiparametric estimation of causal parameters when unlabeled auxiliary regressors are available in addition to labeled observations of outcomes and regressors. It derives the efficient influence function for this semi-supervised setting and shows that the resulting efficiency bound is strictly smaller than the bound attainable from labeled data alone. The authors then combine the efficient influence function with debiased machine learning to produce two estimators, EE-DML-PPCI and TMLE-DML-PPCI, both of which attain the new bound. Estimation of the Riesz representer inside the influence function is performed by a semi-supervised generalized Riesz regression that carries explicit convergence-rate guarantees. A reader would care because the framework lets analysts exploit cheap unlabeled covariates to sharpen inference on parameters whose estimation normally requires expensive labeled outcomes.

Core claim

Both EE-DML-PPCI and TMLE-DML-PPCI have asymptotic variance equal to the efficiency bound of the semi-supervised data-generating process, and that bound is strictly smaller than the one obtained from labeled observations alone. The estimators reach it because they are built from the Neyman-orthogonal efficient influence function, whose Riesz representer is estimated by semi-supervised generalized Riesz regression.

What carries the argument

The efficient influence function (also a Neyman orthogonal score) that depends on the Riesz representer and the regression function, estimated via semi-supervised generalized Riesz regression.
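
For orientation, the standard fully labeled setup from the automatic debiased machine learning literature can be written as follows. The notation here (W for regressors, Y for the outcome, g_0 for the regression function, alpha_0 for the Riesz representer, m for the moment functional, psi for the score) is assumed for illustration and does not come from this page. The paper's semi-supervised efficient influence function is not reproduced on this page and may differ from this labeled-data form.

```latex
% Assumed notation: Z = (Y, W), g_0(W) = E[Y | W], and m is linear and continuous in g.
\begin{align*}
\theta_0 &= \mathbb{E}\left[ m(W, g_0) \right], \\
\mathbb{E}\left[ m(W, g) \right] &= \mathbb{E}\left[ \alpha_0(W)\, g(W) \right]
  \quad \text{for all square-integrable } g, \\
\psi(Z; \theta, g, \alpha) &= m(W, g) - \theta + \alpha(W)\,\bigl( Y - g(W) \bigr).
\end{align*}
% Example: average treatment effect with W = (D, X) and propensity e_0(X) = P(D = 1 | X).
\begin{align*}
m(W, g) &= g(1, X) - g(0, X), \\
\alpha_0(W) &= \frac{D}{e_0(X)} - \frac{1 - D}{1 - e_0(X)}.
\end{align*}
```

The second line is the Riesz representation property. The score in the third line is Neyman orthogonal because small errors in g and alpha enter its expectation only through their product.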

If this is right

  • Both estimating-equation and targeted-maximum-likelihood versions of DML-PPCI attain the same semi-supervised efficiency bound.
  • The Riesz representer can be estimated at rates that preserve the efficiency gain even when the regression function is estimated by machine learning.
  • Prediction-powered causal inference improves upon standard debiased machine learning by systematically incorporating auxiliary unlabeled regressors.
  • The efficiency bound is attainable without requiring the outcome model or propensity model to be correctly specified, provided the Riesz representer is estimated consistently.
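
One reason unlabeled regressors can help with the Riesz representer is that the least-squares Riesz loss, mean(alpha(W)^2 - 2 m(W, alpha)), never involves the outcome. The sketch below fits a linear-dictionary Riesz regression for the average treatment effect functional on labeled rows only and on pooled rows. It is a minimal illustration under assumed choices (the `basis` dictionary, the logistic propensity, the sample sizes), not the paper's semi-supervised generalized Riesz regression.

```python
import numpy as np


def basis(d, x):
    # Hypothetical dictionary b(d, x): arm-specific intercepts and slopes.
    return np.column_stack([d, d * x, 1.0 - d, (1.0 - d) * x])


def riesz_regression(d, x, ridge=1e-8):
    """Least-squares Riesz regression for the ATE functional.

    Minimizes mean(alpha(W)^2 - 2 * m(W, alpha)) over alpha = basis @ rho,
    with m(W, alpha) = alpha(1, X) - alpha(0, X). Outcomes are not needed.
    """
    b = basis(d, x)
    # m applied to each dictionary element: b(1, x) - b(0, x).
    m_b = basis(np.ones_like(d), x) - basis(np.zeros_like(d), x)
    gram = b.T @ b / len(d)
    moment = m_b.mean(axis=0)
    # First-order condition of the quadratic loss: gram @ rho = moment.
    return np.linalg.solve(gram + ridge * np.eye(b.shape[1]), moment)


# Toy data (assumed design): only the first n_labeled rows would carry outcomes.
rng = np.random.default_rng(0)
n_labeled, n_unlabeled = 200, 5000
x = rng.normal(size=n_labeled + n_unlabeled)
propensity = 1.0 / (1.0 + np.exp(-0.5 * x))
d = (rng.random(x.size) < propensity).astype(float)

rho_labeled = riesz_regression(d[:n_labeled], x[:n_labeled])
rho_pooled = riesz_regression(d, x)  # uses regressors from unlabeled rows too
print("labeled-only coefficients:", rho_labeled)
print("pooled coefficients:      ", rho_pooled)
```

The fitted weights satisfy an in-sample balancing condition: the average of alpha_hat(W) times each dictionary element matches the average of m applied to that element, up to the small ridge term.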

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same semi-supervised Riesz-regression technique could be applied to other semiparametric problems that admit a Riesz representer, such as average treatment effect on the treated or policy-value estimation.
  • When labeled outcomes are costly, optimal data-collection budgets may shift toward acquiring more unlabeled regressors rather than more labeled pairs.
  • High-dimensional or nonparametric extensions follow directly once the convergence rates of the semi-supervised Riesz regression are verified under sparsity or smoothness assumptions.

Load-bearing premise

The semi-supervised data-generating process must satisfy regularity conditions that let the Riesz representer exist and let the semi-supervised generalized Riesz regression converge at the rates needed for the efficiency bound to be attained.

What would settle it

A simulation or real-data experiment in which the finite-sample variance of the proposed estimators fails to drop below the labeled-only bound, or in which the Riesz-regression estimator does not achieve the convergence rates stated under the semi-supervised sampling process.
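
A toy version of such a simulation can be sketched as follows. Everything in it is an assumption made for illustration: a randomized experiment with known propensity 0.5, a correctly specified linear outcome model, two-fold cross-fitting, and a semi-supervised estimator that averages the plug-in term over all rows and the correction term over labeled rows. This is one plausible form of a prediction-powered estimator, not the paper's EE-DML-PPCI or TMLE-DML-PPCI.

```python
import numpy as np


def simulate(n_labeled, n_unlabeled, rng):
    """Toy randomized experiment; outcomes exist only for labeled rows."""
    n = n_labeled + n_unlabeled
    x = rng.normal(size=n)
    d = (rng.random(n) < 0.5).astype(float)
    y = x + d * (1.0 + 3.0 * x) + rng.normal(size=n)  # true ATE is 1
    labeled = np.arange(n) < n_labeled
    y[~labeled] = np.nan  # unlabeled outcomes are never available
    return x, d, y, labeled


def design(d, x):
    return np.column_stack([np.ones_like(x), x, d, d * x])


def estimate(x, d, y, labeled, n_folds=2):
    """Return (labeled-only, semi-supervised) estimates of the ATE."""
    fold = np.arange(len(x)) % n_folds
    m = np.empty_like(x)             # plug-in term g(1, x) - g(0, x)
    resid = np.full_like(x, np.nan)  # y - g(d, x), defined on labeled rows
    for k in range(n_folds):
        train = labeled & (fold != k)
        test = fold == k
        coef, *_ = np.linalg.lstsq(design(d[train], x[train]), y[train], rcond=None)
        ones, zeros = np.ones(test.sum()), np.zeros(test.sum())
        m[test] = (design(ones, x[test]) - design(zeros, x[test])) @ coef
        resid[test] = y[test] - design(d[test], x[test]) @ coef
    alpha = (2.0 * d - 1.0) / 0.5    # Riesz representer under propensity 0.5
    correction = np.mean(alpha[labeled] * resid[labeled])
    labeled_only = m[labeled].mean() + correction
    semi_supervised = m.mean() + correction  # plug-in averaged over all rows
    return labeled_only, semi_supervised


def monte_carlo(n_labeled=200, n_unlabeled=2000, reps=1000, seed=0):
    rng = np.random.default_rng(seed)
    draws = np.array(
        [estimate(*simulate(n_labeled, n_unlabeled, rng)) for _ in range(reps)]
    )
    return draws.mean(axis=0), draws.var(axis=0)


n_labeled = 200
means, variances = monte_carlo(n_labeled=n_labeled)
print("mean (labeled-only, semi-supervised):", means)
print("n_labeled * variance:", n_labeled * variances)
```

In this design the treatment effect varies strongly with the covariate, so most of the labeled-only variance comes from averaging the plug-in term over few rows, which is the part that extra unlabeled regressors can reduce. A check of the paper's claim would replace the toy estimator and nuisance fits with the paper's own procedures.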

Original abstract

This study investigates semiparametric efficient estimation of causal and structural parameters in a semi-supervised setting. In our setting, unlabeled auxiliary regressors are available in addition to labeled observations consisting of outcomes and regressors. Our goal is to construct estimators of causal and structural parameters whose asymptotic variances are smaller than those of estimators constructed using only labeled data. We refer to this framework as prediction-powered causal inference (PPCI). We first derive the efficient influence function and the efficiency bound, which imply that the use of auxiliary regressors can attain a smaller asymptotic variance than the efficiency bound attainable from labeled observations alone. Then, by combining the efficient influence function with the debiased machine learning (DML) framework, we propose methods that we call DML-PPCI. If we construct an estimating-equation estimator, we refer to the method as EE-DML-PPCI; if we construct a targeted-learning estimator, we refer to the method as TMLE-DML-PPCI. The asymptotic variances of both estimators match our derived efficiency bound. In the construction of the estimators, estimation of the efficient influence function plays an important role. In our study, the efficient influence function is also a Neyman orthogonal score, which depends on the Riesz representer and the regression function. For Riesz representer estimation, we develop semi-supervised generalized Riesz regression with convergence rate guarantees.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper develops a framework called prediction-powered causal inference (PPCI) for semiparametric efficient estimation of causal and structural parameters in a semi-supervised setting, where labeled data (outcomes and regressors) are supplemented by unlabeled auxiliary regressors. It derives the efficient influence function (EIF) and corresponding efficiency bound, which is claimed to be strictly smaller than the bound attainable from labeled data alone. It then proposes two debiased machine learning estimators (EE-DML-PPCI via estimating equations and TMLE-DML-PPCI via targeted learning) that achieve this bound by using a Neyman-orthogonal score depending on the Riesz representer and regression function, with the Riesz representer estimated via a new semi-supervised generalized Riesz regression that comes with convergence rate guarantees.

Significance. If the central claims hold, the work would provide a theoretically grounded method to improve efficiency in causal inference by leveraging abundant unlabeled data, with explicit efficiency bounds and matching estimators. This could be relevant for applications with semi-supervised structures, such as those in epidemiology or econometrics. The derivation of the EIF, the smaller efficiency bound, and the rate guarantees for the semi-supervised Riesz regression would be the key contributions if rigorously established.

major comments (2)
  1. [Abstract] Abstract (efficiency bound paragraph): The claim that the efficiency bound is strictly smaller than the labeled-only bound, and that the EE-DML-PPCI and TMLE-DML-PPCI estimators match it, rests on the semi-supervised data-generating process satisfying regularity conditions so that the Riesz representer exists and the semi-supervised generalized Riesz regression attains the convergence rates needed for asymptotic efficiency. The manuscript must explicitly state these conditions (e.g., on the function classes, smoothness, or eigenvalue bounds) and show they do not reduce to the labeled-only case by construction.
  2. [Abstract] Abstract (EIF and Neyman orthogonality paragraph): The EIF is asserted to be Neyman orthogonal and to depend on the Riesz representer; however, without the explicit form of the EIF or the semi-supervised model assumptions in the derivation, it is unclear whether the orthogonality holds automatically or requires additional restrictions that could affect the strict improvement in the bound.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive comments on the abstract. We address each major comment below. The full manuscript derives the EIF, efficiency bound, and rate guarantees under explicit regularity conditions (detailed in Sections 3 and 4), which we will now highlight more prominently in the abstract to address the concerns raised.

Point-by-point responses
  1. Referee: [Abstract] Abstract (efficiency bound paragraph): The claim that the efficiency bound is strictly smaller than the labeled-only bound, and that the EE-DML-PPCI and TMLE-DML-PPCI estimators match it, rests on the semi-supervised data-generating process satisfying regularity conditions so that the Riesz representer exists and the semi-supervised generalized Riesz regression attains the convergence rates needed for asymptotic efficiency. The manuscript must explicitly state these conditions (e.g., on the function classes, smoothness, or eigenvalue bounds) and show they do not reduce to the labeled-only case by construction.

    Authors: We agree the abstract should reference the conditions more explicitly. The manuscript derives the bound under standard assumptions including bounded function classes for the regression and Riesz representer, sufficient smoothness for the semi-supervised estimators, and eigenvalue bounds ensuring identifiability and faster convergence rates when auxiliary unlabeled data are used. These conditions are shown in the paper (via the semi-supervised generalized Riesz regression) to yield a strictly smaller bound than the labeled-only case, as the auxiliary data improve estimation of the representer without reducing to the supervised setting. We will revise the abstract to state these conditions briefly and note the non-reduction to the labeled-only bound.
    revision: yes

  2. Referee: [Abstract] Abstract (EIF and Neyman orthogonality paragraph): The EIF is asserted to be Neyman orthogonal and to depend on the Riesz representer; however, without the explicit form of the EIF or the semi-supervised model assumptions in the derivation, it is unclear whether the orthogonality holds automatically or requires additional restrictions that could affect the strict improvement in the bound.

    Authors: The EIF is explicitly derived in Section 3 of the manuscript as the unique influence function for the semi-supervised model; it is Neyman orthogonal by the Riesz representation property under the semi-supervised data structure (labeled outcomes/regressors plus unlabeled auxiliary regressors). Orthogonality holds automatically from the model assumptions without further restrictions, and the paper shows this yields the strict efficiency improvement. We will add the explicit EIF form to the abstract (or a parenthetical) along with a concise statement of the semi-supervised assumptions ensuring orthogonality and the bound improvement.
    revision: yes

Circularity Check

0 steps flagged

No circularity: standard derivation of EIF and efficiency bound with independent estimator construction

full rationale

The paper derives the efficient influence function and efficiency bound from the semi-supervised data-generating process under stated regularity conditions, then constructs DML-based estimators (EE-DML-PPCI and TMLE-DML-PPCI) whose asymptotic variances are shown to match that bound. This follows the conventional semiparametric efficiency theory workflow without any reduction of the bound or estimators to fitted inputs by construction, self-citation load-bearing premises, or ansatz smuggling. The Riesz representer estimation step is presented as a new semi-supervised procedure with convergence guarantees, not as a renaming or self-referential fit. No load-bearing step collapses to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; ledger entries are therefore limited to those explicitly referenced in the abstract. No free parameters or invented entities are named. Standard semi-supervised regularity conditions are implicitly required for the efficiency bound.

axioms (1)
  • domain assumption: Regularity conditions sufficient for existence of the efficient influence function and for convergence of semi-supervised Riesz regression
    Invoked to derive the efficiency bound and to guarantee rates for the Riesz representer estimator (abstract).

pith-pipeline@v0.9.1-grok · 5788 in / 1299 out tokens · 22827 ms · 2026-06-27T06:01:41.110337+00:00 · methodology

