Prediction-Powered Causal Inference by Automatic Debiased Machine Learning and Semi-Supervised Riesz Regression

Masahiro Kato

arXiv: 2606.12892 · v1 · pith:ZXLFHMDPnew · submitted 2026-06-11 · stat.ML · cs.LG · econ.EM · math.ST · stat.ME · stat.TH

Pith reviewed 2026-06-27 06:01 UTC · model grok-4.3

classification: stat.ML · cs.LG · econ.EM · math.ST · stat.ME · stat.TH

keywords: semi-supervised learning · causal inference · debiased machine learning · Riesz regression · efficiency bound · Neyman orthogonality · prediction-powered inference

The pith

Unlabeled auxiliary regressors yield causal estimators whose asymptotic variance falls below the labeled-data efficiency bound.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies semiparametric estimation of causal parameters when unlabeled auxiliary regressors are available in addition to labeled observations of outcomes and regressors. It derives the efficient influence function for this semi-supervised setting and shows that the resulting efficiency bound is strictly smaller than the bound attainable from labeled data alone. The author then combines the efficient influence function with debiased machine learning to produce two estimators, EE-DML-PPCI and TMLE-DML-PPCI, both of which attain the new bound. The Riesz representer inside the influence function is estimated by a semi-supervised generalized Riesz regression that carries explicit convergence-rate guarantees. The practical appeal is that analysts can exploit cheap unlabeled covariates to sharpen inference on parameters whose estimation normally requires expensive labeled outcomes.

Core claim

The asymptotic variances of both the EE-DML-PPCI and TMLE-DML-PPCI estimators equal the efficiency bound derived from the semi-supervised data-generating process, which is strictly smaller than the bound obtained from labeled observations alone, because the estimators are constructed from the Neyman-orthogonal efficient influence function whose Riesz representer is estimated by semi-supervised generalized Riesz regression.
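A worked special case may help make the strict improvement concrete. The sketch below is not the paper's general result: it assumes the target is the mean θ = E[Y], that n labeled pairs (X_i, Y_i) and N additional unlabeled regressors (indexed i = n+1, ..., n+N) are i.i.d. draws from the same distribution, and that the regression function μ(x) = E[Y | X = x] is known. The symbols n, N, and μ are notation introduced here for illustration only.

```latex
\begin{align*}
\hat\theta_{\mathrm{lab}} &= \frac{1}{n}\sum_{i=1}^{n} Y_i,
& \operatorname{Var}\bigl(\hat\theta_{\mathrm{lab}}\bigr)
  &= \frac{\mathbb{E}[\operatorname{Var}(Y \mid X)]}{n} + \frac{\operatorname{Var}(\mu(X))}{n}, \\
\hat\theta_{\mathrm{ss}} &= \frac{1}{n+N}\sum_{i=1}^{n+N} \mu(X_i)
  + \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - \mu(X_i)\bigr),
& \operatorname{Var}\bigl(\hat\theta_{\mathrm{ss}}\bigr)
  &= \frac{\mathbb{E}[\operatorname{Var}(Y \mid X)]}{n} + \frac{\operatorname{Var}(\mu(X))}{n+N}.
\end{align*}
```

The two variances differ only in the second term, because the residual Y - μ(X) is uncorrelated with any function of X. Under these assumptions the semi-supervised estimator is strictly more precise whenever Var(μ(X)) > 0 and N > 0, and the gain disappears when the regressors carry no signal about the outcome.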

What carries the argument

The efficient influence function (also a Neyman orthogonal score) that depends on the Riesz representer and the regression function, estimated via semi-supervised generalized Riesz regression.
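For orientation, the labeled-data-only form of such a score, as it appears in the automatic debiased machine learning literature, can be written as follows. This review does not reproduce the paper's semi-supervised efficient influence function, which may contain additional terms; the notation (W, m, γ, α, θ) is assumed here, with W = (Y, X), γ_0(X) = E[Y | X], and α_0 the Riesz representer of the functional γ ↦ E[m(W, γ)].

```latex
\begin{align*}
\theta_0 &= \mathbb{E}\bigl[m(W, \gamma_0)\bigr],
\qquad
\mathbb{E}\bigl[m(W, \gamma)\bigr] = \mathbb{E}\bigl[\alpha_0(X)\,\gamma(X)\bigr]
\quad \text{for all square-integrable } \gamma, \\
\psi(W; \theta, \gamma, \alpha) &= m(W, \gamma) - \theta + \alpha(X)\bigl(Y - \gamma(X)\bigr), \\
\text{ATE example: } X &= (D, Z), \quad
m(W, \gamma) = \gamma(1, Z) - \gamma(0, Z), \quad
\alpha_0(X) = \frac{D}{e(Z)} - \frac{1 - D}{1 - e(Z)}, \quad e(Z) = \Pr(D = 1 \mid Z).
\end{align*}
```

Neyman orthogonality can be checked directly from this form: perturbing γ in a direction h changes the expected score by E[m(W, h) - α_0(X) h(X)] = 0, and perturbing α in a direction a changes it by E[a(X)(Y - γ_0(X))] = 0.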

If this is right

Both estimating-equation and targeted-maximum-likelihood versions of DML-PPCI attain the same semi-supervised efficiency bound.
The Riesz representer can be estimated at rates that preserve the efficiency gain even when the regression function is estimated by machine learning.
Prediction-powered causal inference improves upon standard debiased machine learning by systematically incorporating auxiliary unlabeled regressors.
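The efficiency bound is attainable without a parametric specification of the outcome model or propensity model, provided the regression function and the Riesz representer are both estimated consistently at sufficiently fast rates. In standard DML arguments, consistency of the Riesz representer alone protects the point estimate but does not by itself deliver efficiency.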
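The Riesz-regression step can be illustrated with a minimal sketch. It is not the paper's semi-supervised generalized Riesz regression: it is the plain least-squares Riesz loss for the average treatment effect, fitted over a small linear dictionary. The names `basis`, `fit_riesz_ate`, and `riesz_weights` are hypothetical, and the sketch assumes that unlabeled rows contain both the treatment D and the covariate Z, so that labeled and unlabeled regressors can be pooled. Pooling is possible here because this loss never uses the outcome Y.

```python
import numpy as np


def basis(d, z):
    """Hypothetical dictionary b(d, z): arm-specific intercepts and slopes."""
    d = np.asarray(d, dtype=float)
    z = np.asarray(z, dtype=float)
    return np.column_stack([d, 1.0 - d, d * z, (1.0 - d) * z])


def fit_riesz_ate(d, z, ridge=1e-6):
    """Least-squares Riesz regression for the ATE functional.

    Minimizes the sample analogue of E[alpha(X)^2 - 2 m(W, alpha)] over
    alpha(x) = b(x)' rho, where m(W, alpha) = alpha(1, Z) - alpha(0, Z).
    The arrays d and z may stack labeled and unlabeled rows (an assumption
    of this sketch), since the loss does not involve the outcome.
    """
    d = np.asarray(d, dtype=float)
    z = np.asarray(z, dtype=float)
    b = basis(d, z)
    # m applied to each dictionary element: b(1, z) - b(0, z).
    m_b = basis(np.ones_like(z), z) - basis(np.zeros_like(z), z)
    gram = b.T @ b / len(z)
    # First-order condition of the quadratic loss: gram @ rho = mean(m_b).
    return np.linalg.solve(gram + ridge * np.eye(b.shape[1]), m_b.mean(axis=0))


def riesz_weights(rho, d, z):
    """Evaluate the fitted representer alpha(d, z) = b(d, z)' rho."""
    return basis(d, z) @ rho


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_total = 20_000  # labeled and unlabeled regressors pooled
    z = rng.integers(0, 2, size=n_total).astype(float)
    propensity = np.where(z == 1.0, 0.6, 0.3)
    d = (rng.random(n_total) < propensity).astype(float)
    rho = fit_riesz_ate(d, z)
    # With binary Z the dictionary is saturated, so the fitted weights
    # approximate 1 / e(Z) for treated and -1 / (1 - e(Z)) for control units.
    print(riesz_weights(rho, [1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0]))
```

With a binary covariate the dictionary spans every function of (D, Z), so the fit reproduces inverse-propensity weights without ever estimating a propensity model. That direct targeting of the weights is the sense in which Riesz regression is called automatic.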

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same semi-supervised Riesz-regression technique could be applied to other semiparametric problems that admit a Riesz representer, such as average treatment effect on the treated or policy-value estimation.
When labeled outcomes are costly, optimal data-collection budgets may shift toward acquiring more unlabeled regressors rather than more labeled pairs.
High-dimensional or nonparametric extensions follow directly once the convergence rates of the semi-supervised Riesz regression are verified under sparsity or smoothness assumptions.

Load-bearing premise

The semi-supervised data-generating process must satisfy regularity conditions that let the Riesz representer exist and let the semi-supervised generalized Riesz regression converge at the rates needed for the efficiency bound to be attained.

What would settle it

A simulation or real-data experiment in which the finite-sample variance of the proposed estimators fails to drop below the labeled-only bound, or in which the Riesz-regression estimator does not achieve the convergence rates stated under the semi-supervised sampling process.
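A minimal Monte Carlo of this kind can be sketched for the simplest possible target, the mean of Y, rather than for the paper's causal parameters. The design is entirely assumed for illustration: X is standard normal, Y = 2X + noise, and the prediction function is the true regression function 2x (an oracle that a real analysis would have to estimate, typically with cross-fitting). The function name `simulate_mean_estimators` and all sample sizes are hypothetical.

```python
import numpy as np


def simulate_mean_estimators(n_labeled=200, n_unlabeled=5000, reps=2000, seed=0):
    """Compare Monte Carlo variances of two estimators of E[Y].

    labeled-only:    the sample mean of Y over the labeled rows.
    semi-supervised: the mean prediction over all rows plus the mean
                     labeled residual (a prediction-powered correction).
    """
    rng = np.random.default_rng(seed)
    labeled_only = np.empty(reps)
    semi_supervised = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n_labeled + n_unlabeled)
        # Only the first n_labeled rows have an observed outcome.
        y = 2.0 * x[:n_labeled] + rng.normal(size=n_labeled)
        prediction = 2.0 * x  # oracle regression function (assumed known)
        labeled_only[r] = y.mean()
        semi_supervised[r] = prediction.mean() + (y - prediction[:n_labeled]).mean()
    return labeled_only.var(), semi_supervised.var()


if __name__ == "__main__":
    var_labeled, var_semi = simulate_mean_estimators()
    print(f"labeled-only variance:    {var_labeled:.5f}")
    print(f"semi-supervised variance: {var_semi:.5f}")
```

In this design the theoretical variances are (1 + 4)/200 = 0.025 for the labeled-only mean and 1/200 + 4/5200 ≈ 0.0058 for the semi-supervised estimator. A run whose second number is not clearly below the first would signal a problem with the code or the design. The analogous check for the paper's estimators would replace the oracle prediction with cross-fitted nuisance estimates.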

Original abstract

This study investigates semiparametric efficient estimation of causal and structural parameters in a semi-supervised setting. In our setting, unlabeled auxiliary regressors are available in addition to labeled observations consisting of outcomes and regressors. Our goal is to construct estimators of causal and structural parameters whose asymptotic variances are smaller than those of estimators constructed using only labeled data. We refer to this framework as prediction-powered causal inference (PPCI). We first derive the efficient influence function and the efficiency bound, which imply that the use of auxiliary regressors can attain a smaller asymptotic variance than the efficiency bound attainable from labeled observations alone. Then, by combining the efficient influence function with the debiased machine learning (DML) framework, we propose methods that we call DML-PPCI. If we construct an estimating-equation estimator, we refer to the method as EE-DML-PPCI; if we construct a targeted-learning estimator, we refer to the method as TMLE-DML-PPCI. The asymptotic variances of both estimators match our derived efficiency bound. In the construction of the estimators, estimation of the efficient influence function plays an important role. In our study, the efficient influence function is also a Neyman orthogonal score, which depends on the Riesz representer and the regression function. For Riesz representer estimation, we develop semi-supervised generalized Riesz regression with convergence rate guarantees.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives an efficiency bound for causal parameters that improves with unlabeled regressors and builds DML estimators to match it, but the gain hinges on unverified regularity conditions for the Riesz step.

The main takeaway is that this work shows a concrete way to reduce asymptotic variance in causal estimation by folding in unlabeled auxiliary regressors. The author derives the efficient influence function under the semi-supervised setup, proves a bound strictly tighter than the labeled-only case, and then constructs two estimators (EE-DML-PPCI and TMLE-DML-PPCI) whose variances attain that bound. The technical piece that enables this is the semi-supervised generalized Riesz regression, which comes with stated convergence rates and is used to estimate the representer inside the Neyman-orthogonal score.

What the paper does cleanly is keep the construction inside the existing DML and TMLE machinery while adding the semi-supervised component. The orthogonality is preserved, and the estimators are presented as automatic once the nuisance functions are estimated. This addresses a setting that shows up often in practice, where extra covariates are cheap to collect but outcomes are expensive to label.

The soft spot is the set of regularity conditions identified above as the load-bearing premise. The bound is smaller only if the Riesz representer exists and the semi-supervised regression hits the rates needed for the EIF-based estimators to be efficient. The abstract relies on these conditions, but without the proofs it is not clear how restrictive they are or whether the improvement survives when the unlabeled data carries limited signal. If those steps require strong smoothness or density assumptions, the practical gain could shrink or vanish, and the result would reduce to the labeled case. The citation pattern looks standard for this area, but the derivations themselves need checking.

This is for people working on efficient semi-supervised causal methods in econometrics or policy settings. A reader who already uses DML would see how to extend it without much extra machinery. It deserves a serious referee because the claim is specific, the framework is well-motivated, and the technical steps are falsifiable once the proofs are examined. I would send it to peer review.

Referee Report

2 major / 0 minor

Summary. The paper develops a framework called prediction-powered causal inference (PPCI) for semiparametric efficient estimation of causal and structural parameters in a semi-supervised setting, where labeled data (outcomes and regressors) are supplemented by unlabeled auxiliary regressors. It derives the efficient influence function (EIF) and corresponding efficiency bound, which is claimed to be strictly smaller than the bound attainable from labeled data alone. It then proposes two debiased machine learning estimators (EE-DML-PPCI via estimating equations and TMLE-DML-PPCI via targeted learning) that achieve this bound by using a Neyman-orthogonal score depending on the Riesz representer and regression function, with the Riesz representer estimated via a new semi-supervised generalized Riesz regression that comes with convergence rate guarantees.

Significance. If the central claims hold, the work would provide a theoretically grounded method to improve efficiency in causal inference by leveraging abundant unlabeled data, with explicit efficiency bounds and matching estimators. This could be relevant for applications with semi-supervised structures, such as those in epidemiology or econometrics. The derivation of the EIF, the smaller efficiency bound, and the rate guarantees for the semi-supervised Riesz regression would be the key contributions if rigorously established.

Major comments (2)

[Abstract, efficiency bound paragraph] The claim that the efficiency bound is strictly smaller than the labeled-only bound, and that the EE-DML-PPCI and TMLE-DML-PPCI estimators match it, rests on the semi-supervised data-generating process satisfying regularity conditions so that the Riesz representer exists and the semi-supervised generalized Riesz regression attains the convergence rates needed for asymptotic efficiency. The manuscript must explicitly state these conditions (e.g., on the function classes, smoothness, or eigenvalue bounds) and show they do not reduce to the labeled-only case by construction.
[Abstract, EIF and Neyman orthogonality paragraph] The EIF is asserted to be Neyman orthogonal and to depend on the Riesz representer; however, without the explicit form of the EIF or the semi-supervised model assumptions in the derivation, it is unclear whether the orthogonality holds automatically or requires additional restrictions that could affect the strict improvement in the bound.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive comments on the abstract. We address each major comment below. The full manuscript derives the EIF, efficiency bound, and rate guarantees under explicit regularity conditions (detailed in Sections 3 and 4), which we will now highlight more prominently in the abstract to address the concerns raised.

Referee: [Abstract, efficiency bound paragraph] The claim that the efficiency bound is strictly smaller than the labeled-only bound, and that the EE-DML-PPCI and TMLE-DML-PPCI estimators match it, rests on the semi-supervised data-generating process satisfying regularity conditions so that the Riesz representer exists and the semi-supervised generalized Riesz regression attains the convergence rates needed for asymptotic efficiency. The manuscript must explicitly state these conditions (e.g., on the function classes, smoothness, or eigenvalue bounds) and show they do not reduce to the labeled-only case by construction.

Authors: We agree the abstract should reference the conditions more explicitly. The manuscript derives the bound under standard assumptions including bounded function classes for the regression and Riesz representer, sufficient smoothness for the semi-supervised estimators, and eigenvalue bounds ensuring identifiability and faster convergence rates when auxiliary unlabeled data are used. These conditions are shown in the paper (via the semi-supervised generalized Riesz regression) to yield a strictly smaller bound than the labeled-only case, as the auxiliary data improve estimation of the representer without reducing to the supervised setting. We will revise the abstract to state these conditions briefly and note the non-reduction to the labeled-only bound.

Revision: yes

Referee: [Abstract, EIF and Neyman orthogonality paragraph] The EIF is asserted to be Neyman orthogonal and to depend on the Riesz representer; however, without the explicit form of the EIF or the semi-supervised model assumptions in the derivation, it is unclear whether the orthogonality holds automatically or requires additional restrictions that could affect the strict improvement in the bound.

Authors: The EIF is explicitly derived in Section 3 of the manuscript as the unique influence function for the semi-supervised model; it is Neyman orthogonal by the Riesz representation property under the semi-supervised data structure (labeled outcomes/regressors plus unlabeled auxiliary regressors). Orthogonality holds automatically from the model assumptions without further restrictions, and the paper shows this yields the strict efficiency improvement. We will add the explicit EIF form to the abstract (or a parenthetical) along with a concise statement of the semi-supervised assumptions ensuring orthogonality and the bound improvement.

Revision: yes

Circularity Check

0 steps flagged

No circularity: standard derivation of EIF and efficiency bound with independent estimator construction

The paper derives the efficient influence function and efficiency bound from the semi-supervised data-generating process under stated regularity conditions, then constructs DML-based estimators (EE-DML-PPCI and TMLE-DML-PPCI) whose asymptotic variances are shown to match that bound. This follows the conventional semiparametric efficiency theory workflow without any reduction of the bound or estimators to fitted inputs by construction, self-citation load-bearing premises, or ansatz smuggling. The Riesz representer estimation step is presented as a new semi-supervised procedure with convergence guarantees, not as a renaming or self-referential fit. No load-bearing step collapses to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; ledger entries are therefore limited to those explicitly referenced in the abstract. No free parameters or invented entities are named. Standard semi-supervised regularity conditions are implicitly required for the efficiency bound.

Axioms (1)

Domain assumption: regularity conditions sufficient for existence of the efficient influence function and for convergence of semi-supervised Riesz regression. Invoked to derive the efficiency bound and to guarantee rates for the Riesz representer estimator (abstract).

Reference graph

Works this paper leans on

51 extracted references · 1 linked inside Pith

[1] Anastasios N. Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I. Jordan, and Tijana Zrnic. Prediction-powered inference. Science, 382(6671):669-674, 2023.
[2] David Azriel, Lawrence D. Brown, Michael Sklar, Richard Berk, Andreas Buja, and Linda Zhao. Semi-supervised linear regression. Journal of the American Statistical Association, 117(540):2238-2251, 2022.
[3] Heejung Bang and James M. Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4):962-973, 2005.
[4] David Bruns-Smith, Oliver Dukes, Avi Feller, and Elizabeth L Ogburn. Augmented balancing weights as linear regression. Journal of the Royal Statistical Society Series B: Statistical Methodology, 04 2025.
[5] David A. Bruns-Smith and Avi Feller. Outcome assumptions and duality theory for balancing weights. In International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 11037-11055, 2022.
[6] Riccardo Cadei, Ilker Demirel, Piersilvio De Bartolomeis, Lukas Lindorfer, Sylvia Cremer, Cordelia Schmid, and Francesco Locatello. Prediction-powered causal inferences. In Annual Conference on Neural Information Processing Systems (NeurIPS), 2025.
[7] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. MIT Press, 2006.
[8] Xiaohong Chen and Zhipeng Liao. Sieve semiparametric two-step GMM under weak dependence. Journal of Econometrics, 189(1):163-186, 2015.
[9] Xiaohong Chen and Demian Pouzo. Sieve Wald and QLR inferences on semi/nonparametric conditional moment models. Econometrica, 83(3):1013-1079, 2015.
[10] Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 2018.
[11] Victor Chernozhukov, Whitney K. Newey, Victor Quintas-Martinez, and Vasilis Syrgkanis. Automatic debiased machine learning via Riesz regression, 2021. arXiv:2104.14737.
[12] Victor Chernozhukov, Whitney K. Newey, and Rahul Singh. Automatic debiased machine learning of causal and structural effects. Econometrica, 90(3):967-1027, 2022.
[13] Victor Chernozhukov, Michael Newey, Whitney K Newey, Rahul Singh, and Vasilis Syrgkanis. Automatic debiased machine learning for covariate shifts, 2025. arXiv:2307.04527.
[14] Kristy Choi, Chenlin Meng, Yang Song, and Stefano Ermon. Density ratio estimation via infinitesimal classification. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
[15] Ilker Demirel, Ahmed Alaa, Anthony Philippakis, and David Sontag. Prediction-powered generalization of causal inferences. In International Conference on Machine Learning (ICML), 2024.
[16] Jens Hainmueller. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1):25-46, 2012.
[17] Kosuke Imai and Marc Ratkovic. Covariate balancing propensity score. Journal of the Royal Statistical Society Series B: Statistical Methodology, 76(1):243-263, 07 2013. ISSN 1369-7412.
[18] Takafumi Kanamori, Shohei Hido, and Masashi Sugiyama. A least-squares approach to direct importance estimation. Journal of Machine Learning Research, 10(Jul.):1391-1445, 2009.
[19] Masahiro Kato. Direct bias-correction term estimation for propensity scores and average treatment effect estimation, 2025a. arXiv:2509.22122.
[20] Masahiro Kato. Nearest neighbor matching as least squares density ratio estimation and Riesz regression, 2025b. arXiv:2510.24433.
[21] Masahiro Kato. Riesz regression as direct density ratio estimation, 2025c. arXiv:2511.04568.
[22] Masahiro Kato. Semi-supervised treatment effect estimation with unlabeled covariates via generalized Riesz regression, 2025d. arXiv:2511.08303. (Linked inside Pith.)
[23] Masahiro Kato. A unified framework for debiased machine learning: Riesz representer fitting under Bregman divergence, 2026a. arXiv:2601.07752.
[24] Masahiro Kato. Scorematchingriesz: Score matching for debiased machine learning and policy path estimation. In International Conference on Machine Learning (ICML), 2026b.
[25] Masahiro Kato and Takeshi Teshima. Non-negative Bregman divergence minimization for deep direct density ratio estimation. In International Conference on Machine Learning (ICML), 2021.
[26] Masahiro Kato, Kota Matsui, and Ryo Inokuchi. Double debiased covariate shift adaptation robust to density-ratio estimation, 2024a. arXiv:2310.16638.
[27] Masahiro Kato, Akihiro Oga, Wataru Komatsubara, and Ryo Inokuchi. Active adaptive experimental design for treatment effect estimation with covariate choice. In International Conference on Machine Learning (ICML), 2024b.
[28] Masahiro Kato, Fumiaki Kozai, and Ryo Inokuchi. Puate: Semiparametric efficient average treatment effect estimation from treated (positive) and unlabeled units, 2025. arXiv:2501.19345.
[29] Edward H. Kennedy. Efficient nonparametric causal inference with missing exposure information. The International Journal of Biostatistics, 16(1), 2020.
[30] Chris A. J. Klaassen. Consistent estimation of the influence function of locally asymptotically linear estimators. Annals of Statistics, 15, 1987.
[31] Tony Lancaster and Guido Imbens. Case-control studies with contaminated controls. Journal of Econometrics, 71(1):145-160, 1996.
[32] Lucien Le Cam. Asymptotic Methods in Statistical Decision Theory (Springer Series in Statistics). Springer, 1986.
[33] Gang Niu, Marthinus Christoffel du Plessis, Tomoya Sakai, Yao Ma, and Masashi Sugiyama. Theoretical comparisons of positive-unlabeled learning against positive-negative learning. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
[34] B. Rhodes, K. Xu, and M.U. Gutmann. Telescoping density-ratio estimation. In NeurIPS, 2020.
[35] Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66:688-701, 1974.
[36] Alejandro Schuler and Mark van der Laan. Introduction to modern causal inference, 2024. URL https://alejandroschuler.github.io/mci/introduction-to-modern-causal-inference.html
[37] Hidetoshi Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227-244, 2000.
[38] Masashi Sugiyama, Taiji Suzuki, Shinichi Nakajima, Hisashi Kashima, Paul von Bünau, and Motoaki Kawanabe. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60(4):699-746, 2008.
[39] Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori. Density ratio matching under the Bregman divergence: A unified framework of density ratio estimation. Annals of the Institute of Statistical Mathematics, 64, 10 2011.
[40] Zhiqiang Tan. Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data. Biometrika, 107(1):137-158, 2019.
[41] Masatoshi Uehara, Masahiro Kato, and Shota Yasui. Off-policy evaluation and learning for external validity under a covariate shift. In Conference on Neural Information Processing Systems (NeurIPS), 2020.
[42] van der Laan. Targeted maximum likelihood learning, 2006. U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 213. https://biostats.bepress.com/ucbbiostat/paper213/
[43] M.J. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. Springer New York, 2011.
[44] Aad W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998.
[45] Aad W. van der Vaart. Semiparametric statistics, 2002. URL https://sites.stat.washington.edu/jaw/COURSES/EPWG/stflour.pdf
[46] Larry Wasserman and John Lafferty. Statistical analysis of semi-supervised regression. In Advances in Neural Information Processing Systems (NeurIPS), volume 20, 2007.
[47] Jeffrey M. Wooldridge. Asymptotic properties of weighted m-estimation for standard stratified samples. Econometric Theory, 2001.
[48] Qingyuan Zhao. Covariate balancing propensity score by tailored loss functions. The Annals of Statistics, 47(2):965-993, 2019.
[49] Siming Zheng, Guohao Shen, Yuanyuan Lin, and Jian Huang. Error analysis for deep ReLU feedforward density-ratio estimation with Bregman divergence. Journal of Machine Learning Research, 27(15):1-60, 2026.
[50] Xiaojin Zhu. Semi-supervised learning literature survey. Technical report, Computer Sciences, University of Wisconsin-Madison, 2005. URL http://pages.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
[51] José R. Zubizarreta. Stable weights that balance covariates for estimation with incomplete outcome data. Journal of the American Statistical Association, 110(511):910-922, 2015.

[1] [1]

Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I

Anastasios N. Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I. Jordan, and Tijana Zrnic. Prediction-powered inference. Science, 382 0 (6671): 0 669--674, 2023

2023

[2] [2]

Brown, Michael Sklar, Richard Berk, Andreas Buja, and Linda Zhao

David Azriel, Lawrence D. Brown, Michael Sklar, Richard Berk, Andreas Buja, and Linda Zhao. Semi-supervised linear regression. Journal of the American Statistical Association, 117 0 (540): 0 2238--2251, 2022

2022

[3] [3]

Heejung Bang and James M. Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61 0 (4): 0 962--973, 2005

2005

[4] [4]

Augmented balancing weights as linear regression

David Bruns-Smith, Oliver Dukes, Avi Feller, and Elizabeth L Ogburn. Augmented balancing weights as linear regression. Journal of the Royal Statistical Society Series B: Statistical Methodology, 04 2025

2025

[5] [5]

Bruns-Smith and Avi Feller

David A. Bruns-Smith and Avi Feller. Outcome assumptions and duality theory for balancing weights. In International Conference on Artificial Intelligence and Statistics (AISTATS), pp.\ 11037--11055, 2022

2022

[6] [6]

Prediction-powered causal inferences

Riccardo Cadei, Ilker Demirel, Piersilvio De Bartolomeis, Lukas Lindorfer, Sylvia Cremer, Cordelia Schmid, and Francesco Locatello. Prediction-powered causal inferences. In Annual Conference on Neural Information Processing Systems (NeurIPS), 2025

2025

[7] [7]

Semi-Supervised Learning

Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. MIT Press, 2006

2006

[8] [8]

Sieve semiparametric two-step gmm under weak dependence

Xiaohong Chen and Zhipeng Liao. Sieve semiparametric two-step gmm under weak dependence. Journal of Econometrics, 189 0 (1): 0 163--186, 2015

2015

[9] [9]

Sieve wald and qlr inferences on semi/nonparametric conditional moment models

Xiaohong Chen and Demian Pouzo. Sieve wald and qlr inferences on semi/nonparametric conditional moment models. Econometrica, 83 0 (3): 0 1013--1079, 2015

2015

[10] [10]

Double/debiased machine learning for treatment and structural parameters

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 2018

2018

[11] [11]

Newey, Victor Quintas-Martinez, and Vasilis Syrgkanis

Victor Chernozhukov, Whitney K. Newey, Victor Quintas-Martinez, and Vasilis Syrgkanis. Automatic debiased machine learning via riesz regression, 2021. a rXiv:2104.14737

arXiv 2021

[12] [12]

Newey, and Rahul Singh

Victor Chernozhukov, Whitney K. Newey, and Rahul Singh. Automatic debiased machine learning of causal and structural effects. Econometrica, 90 0 (3): 0 967--1027, 2022

2022

[13] [13]

Automatic debiased machine learning for covariate shifts, 2025

Victor Chernozhukov, Michael Newey, Whitney K Newey, Rahul Singh, and Vasilis Srygkanis. Automatic debiased machine learning for covariate shifts, 2025. a rXiv: 2307.04527

arXiv 2025

[14] [14]

Density ratio estimation via infinitesimal classification

Kristy Choi, Chenlin Meng, Yang Song, and Stefano Ermon. Density ratio estimation via infinitesimal classification. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2022

2022

[15] [15]

Prediction-powered generalization of causal inferences

Ilker Demirel, Ahmed Alaa, Anthony Philippakis, and David Sontag. Prediction-powered generalization of causal inferences. In International Conference on Machine Learning (ICML), 2024

2024

[16] [16]

Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies

Jens Hainmueller. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20 0 (1): 0 25--46, 2012

2012

[17] [17]

Covariate balancing propensity score

Kosuke Imai and Marc Ratkovic. Covariate balancing propensity score. Journal of the Royal Statistical Society Series B: Statistical Methodology, 76(1): 243--263, July 2013. ISSN 1369-7412

2013

[18] [18]

A least-squares approach to direct importance estimation

Takafumi Kanamori, Shohei Hido, and Masashi Sugiyama. A least-squares approach to direct importance estimation. Journal of Machine Learning Research, 10(Jul.): 1391--1445, 2009

2009

[19] [19]

Direct bias-correction term estimation for propensity scores and average treatment effect estimation, 2025a

Masahiro Kato. Direct bias-correction term estimation for propensity scores and average treatment effect estimation, 2025a. arXiv:2509.22122

arXiv 2025

[20] [20]

Nearest neighbor matching as least squares density ratio estimation and Riesz regression, 2025b

Masahiro Kato. Nearest neighbor matching as least squares density ratio estimation and Riesz regression, 2025b. arXiv:2510.24433

arXiv 2025

[21] [21]

Riesz regression as direct density ratio estimation, 2025c

Masahiro Kato. Riesz regression as direct density ratio estimation, 2025c. arXiv:2511.04568

arXiv 2025

[22] [22]

Semi-supervised treatment effect estimation with unlabeled covariates via generalized Riesz regression, 2025d

Masahiro Kato. Semi-supervised treatment effect estimation with unlabeled covariates via generalized Riesz regression, 2025d. arXiv:2511.08303

arXiv 2025

[23] [23]

A unified framework for debiased machine learning: Riesz representer fitting under Bregman divergence, 2026a

Masahiro Kato. A unified framework for debiased machine learning: Riesz representer fitting under Bregman divergence, 2026a. arXiv:2601.07752

arXiv 2026

[24] [24]

Scorematchingriesz: Score matching for debiased machine learning and policy path estimation

Masahiro Kato. Scorematchingriesz: Score matching for debiased machine learning and policy path estimation. In International Conference on Machine Learning (ICML), 2026b

2026

[25] [25]

Non-negative Bregman divergence minimization for deep direct density ratio estimation

Masahiro Kato and Takeshi Teshima. Non-negative Bregman divergence minimization for deep direct density ratio estimation. In International Conference on Machine Learning (ICML), 2021

2021

[26] [26]

Double debiased covariate shift adaptation robust to density-ratio estimation, 2024a

Masahiro Kato, Kota Matsui, and Ryo Inokuchi. Double debiased covariate shift adaptation robust to density-ratio estimation, 2024a. arXiv:2310.16638

arXiv 2024

[27] [27]

Active adaptive experimental design for treatment effect estimation with covariate choice

Masahiro Kato, Akihiro Oga, Wataru Komatsubara, and Ryo Inokuchi. Active adaptive experimental design for treatment effect estimation with covariate choice. In International Conference on Machine Learning (ICML), 2024b

2024

[28] [28]

PUATE: Semiparametric efficient average treatment effect estimation from treated (positive) and unlabeled units, 2025

Masahiro Kato, Fumiaki Kozai, and Ryo Inokuchi. PUATE: Semiparametric efficient average treatment effect estimation from treated (positive) and unlabeled units, 2025. arXiv:2501.19345

arXiv 2025

[29] [29]

Edward H. Kennedy. Efficient nonparametric causal inference with missing exposure information. The International Journal of Biostatistics, 16(1), 2020

2020

[30] [30]

Chris A. J. Klaassen. Consistent estimation of the influence function of locally asymptotically linear estimators. Annals of Statistics, 15, 1987

1987

[31] [31]

Case-control studies with contaminated controls

Tony Lancaster and Guido Imbens. Case-control studies with contaminated controls. Journal of Econometrics, 71(1): 145--160, 1996

1996

[32] [32]

Asymptotic Methods in Statistical Decision Theory (Springer Series in Statistics)

Lucien Le Cam. Asymptotic Methods in Statistical Decision Theory (Springer Series in Statistics). Springer, 1986

1986

[33] [33]

Theoretical comparisons of positive-unlabeled learning against positive-negative learning

Gang Niu, Marthinus Christoffel du Plessis, Tomoya Sakai, Yao Ma, and Masashi Sugiyama. Theoretical comparisons of positive-unlabeled learning against positive-negative learning. In Advances in Neural Information Processing Systems (NeurIPS), 2016

2016

[34] [34]

Telescoping density-ratio estimation

B. Rhodes, K. Xu, and M.U. Gutmann. Telescoping density-ratio estimation. In NeurIPS, 2020

2020

[35] [35]

Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66: 688--701, 1974

1974

[36] [36]

Introduction to modern causal inference, 2024

Alejandro Schuler and Mark van der Laan. Introduction to modern causal inference, 2024. URL https://alejandroschuler.github.io/mci/introduction-to-modern-causal-inference.html

2024

[37] [37]

Improving predictive inference under covariate shift by weighting the log-likelihood function

Hidetoshi Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2): 227--244, 2000

2000

[38] [38]

Direct importance estimation for covariate shift adaptation

Masashi Sugiyama, Taiji Suzuki, Shinichi Nakajima, Hisashi Kashima, Paul von Bünau, and Motoaki Kawanabe. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60(4): 699--746, 2008

2008

[39] [39]

Density ratio matching under the Bregman divergence: A unified framework of density ratio estimation

Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori. Density ratio matching under the Bregman divergence: A unified framework of density ratio estimation. Annals of the Institute of Statistical Mathematics, 64, October 2011

2011

[40] [40]

Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data

Zhiqiang Tan. Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data. Biometrika, 107(1): 137--158, 2019

2019

[41] [41]

Off-policy evaluation and learning for external validity under a covariate shift

Masatoshi Uehara, Masahiro Kato, and Shota Yasui. Off-policy evaluation and learning for external validity under a covariate shift. In Conference on Neural Information Processing Systems (NeurIPS), 2020

2020

[42] [42]

Targeted maximum likelihood learning, 2006

van der Laan. Targeted maximum likelihood learning, 2006. U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 213. https://biostats.bepress.com/ucbbiostat/paper213/

2006

[43] [43]

Targeted Learning: Causal Inference for Observational and Experimental Data

M.J. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. Springer New York, 2011

2011

[44] [44]

Asymptotic Statistics

Aad W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998

1998

[45] [45]

Semiparametric statistics, 2002

Aad W. van der Vaart. Semiparametric statistics, 2002. URL https://sites.stat.washington.edu/jaw/COURSES/EPWG/stflour.pdf

2002

[46] [46]

Statistical analysis of semi-supervised regression

Larry Wasserman and John Lafferty. Statistical analysis of semi-supervised regression. In Advances in Neural Information Processing Systems (NeurIPS), volume 20, 2007

2007

[47] [47]

Asymptotic properties of weighted M-estimation for standard stratified samples

Jeffrey M. Wooldridge. Asymptotic properties of weighted M-estimation for standard stratified samples. Econometric Theory, 2001

2001

[48] [48]

Covariate balancing propensity score by tailored loss functions

Qingyuan Zhao. Covariate balancing propensity score by tailored loss functions. The Annals of Statistics, 47(2): 965--993, 2019

2019

[49] [49]

Error analysis for deep ReLU feedforward density-ratio estimation with Bregman divergence

Siming Zheng, Guohao Shen, Yuanyuan Lin, and Jian Huang. Error analysis for deep ReLU feedforward density-ratio estimation with Bregman divergence. Journal of Machine Learning Research, 27(15): 1--60, 2026

2026

[50] [50]

Semi-supervised learning literature survey

Xiaojin Zhu. Semi-supervised learning literature survey. Technical report, Computer Sciences, University of Wisconsin-Madison, 2005. URL http://pages.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf

2005

[51] [51]

Stable weights that balance covariates for estimation with incomplete outcome data

José R. Zubizarreta. Stable weights that balance covariates for estimation with incomplete outcome data. Journal of the American Statistical Association, 110(511): 910--922, 2015

2015