Semi-Supervised Treatment Effect Estimation with Unlabeled Covariates for Prediction-Powered Causal Inference

Masahiro Kato

arXiv: 2511.08303 · v2 · submitted 2025-11-11 · stat.ML · cs.LG · econ.EM · math.ST · stat.ME · stat.TH

Pith reviewed 2026-05-17 23:44 UTC · model grok-4.3

keywords: semi-supervised learning, treatment effect estimation, causal inference, efficiency bounds, unlabeled covariates, prediction-powered inference, asymptotic variance

The pith

Incorporating auxiliary covariates lowers the efficiency bound for treatment effect estimation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines treatment effect estimation in a semi-supervised regime where only some observations include the treatment indicator and outcome, but additional unlabeled covariates are available for all units. It analyzes this under two data-generating processes: a one-sample setting in which labels appear on a subset of a single dataset, and a two-sample setting with separate labeled and unlabeled collections. Efficiency bounds are derived for both cases, and the central result is that the auxiliary covariates tighten the bound, so that efficient estimators attain strictly smaller asymptotic variance than estimators that ignore the extra covariates. The work frames the procedure as prediction-powered causal inference.

Core claim

In both the one-sample (censoring) and two-sample (case-control) settings, incorporating auxiliary unlabeled covariates lowers the efficiency bound and yields estimators whose asymptotic variance is smaller than that of estimators that use only the labeled triple of covariates, treatment, and outcome.
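The direction of this claim can be illustrated with a small Monte Carlo sketch. It is not the paper's estimator: it assumes labels are missing completely at random, a known propensity of 0.5, and oracle outcome regressions, and every name in it (`simulate_once`, `label_rate`, the outcome model) is hypothetical. Under those assumptions, averaging the conditional effect over all covariates rather than only the labeled ones should shrink the variance.

```python
import random
import statistics


def simulate_once(rng, n_total=1000, label_rate=0.2, propensity=0.5):
    """Return (labeled-only estimate, semi-supervised estimate) for one dataset.

    Assumptions (illustrative only): oracle outcome regressions, a known
    propensity score, and labels observed completely at random.
    """
    cate_all = []      # tau(X) = mu1(X) - mu0(X) for every unit
    cate_labeled = []  # tau(X) for labeled units only
    corr_labeled = []  # inverse-propensity-weighted residuals (labeled units)
    for _ in range(n_total):
        x = rng.gauss(0.0, 1.0)
        mu0, mu1 = x, 1.0 + 3.0 * x  # true ATE = E[mu1 - mu0] = 1
        cate = mu1 - mu0
        cate_all.append(cate)
        if rng.random() < label_rate:  # treatment and outcome are observed
            d = 1 if rng.random() < propensity else 0
            y = (mu1 if d == 1 else mu0) + rng.gauss(0.0, 1.0)
            if d == 1:
                corr = (y - mu1) / propensity
            else:
                corr = -(y - mu0) / (1.0 - propensity)
            cate_labeled.append(cate)
            corr_labeled.append(corr)
    mean_corr = statistics.fmean(corr_labeled)
    # Labeled-only: regression term averaged over labeled units.
    labeled_only = statistics.fmean(cate_labeled) + mean_corr
    # Semi-supervised: regression term averaged over all covariates.
    semi_supervised = statistics.fmean(cate_all) + mean_corr
    return labeled_only, semi_supervised


rng = random.Random(0)
draws = [simulate_once(rng) for _ in range(300)]
var_labeled_only = statistics.variance([a for a, _ in draws])
var_semi_supervised = statistics.variance([b for _, b in draws])
mean_semi_supervised = statistics.fmean([b for _, b in draws])
print("labeled-only variance:   ", var_labeled_only)
print("semi-supervised variance:", var_semi_supervised)
```

In this toy design the gain comes entirely from the heterogeneity of the conditional effect: if `mu1 - mu0` were constant in `x`, the two estimators would coincide, so the sketch says nothing about how large the gain is in the paper's general setting.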

What carries the argument

Efficiency bounds derived separately for the one-sample and two-sample semi-supervised data-generating processes, together with the corresponding efficient estimators that attain those bounds.
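For readers less familiar with the terminology, the link between an influence function, the efficiency bound, and an efficient estimator can be written out as below. The symbols V^OS, ψ_OS, τ̂, and τ0 follow the fragment quoted under "Lean theorems connected to this paper" on this page; the argument W of ψ_OS and the labeled-only bound V^lab are placeholders introduced here for illustration, not the paper's notation.

```latex
% Efficiency bound as the variance of the efficient influence function,
% and the asymptotic normality of an estimator that attains it.
V^{\mathrm{OS}} := \mathbb{E}\bigl[\psi_{\mathrm{OS}}(W)^{2}\bigr],
\qquad
\sqrt{n}\,\bigl(\hat{\tau} - \tau_{0}\bigr) \xrightarrow{\;d\;} \mathcal{N}\bigl(0,\, V^{\mathrm{OS}}\bigr).

% The page's central claim, with V^{lab} a placeholder for the bound that
% ignores the auxiliary covariates, both on the same sample-size scale:
V^{\mathrm{OS}} < V^{\mathrm{lab}}.
```

The inequality is only a restatement of the claim summarized above; the conditions under which it is strict are in the paper itself.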

If this is right

The efficiency bound is strictly lower once auxiliary covariates enter the problem.
Efficient estimators exist whose asymptotic variance matches the improved bound in each setting.
The variance reduction holds without requiring labels on the auxiliary covariates.
The same improvement appears in both the one-sample and two-sample designs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practitioners could collect inexpensive auxiliary covariates to tighten causal estimates whenever full labeling is costly.
The efficiency argument may carry over to other causal functionals beyond the average treatment effect.
Finite-sample behavior and robustness to model misspecification remain open questions suggested by the asymptotic results.

Load-bearing premise

The one-sample and two-sample data-generating processes permit derivation of achievable efficiency bounds under standard regularity conditions for asymptotic analysis of estimators.

What would settle it

The central claim would be falsified if, in either the one-sample or the two-sample regime, the efficient estimator that incorporates the auxiliary covariates had an asymptotic variance equal to or larger than that of the efficient estimator that ignores them.

Figures

Figures reproduced from arXiv: 2511.08303 by Masahiro Kato.

Original abstract

This study investigates treatment effect estimation in the semi-supervised setting, also can be interpreted as prediction-powered inference. In our setting, we can use not only the standard triple of covariates, treatment indicator, and outcome, but also unlabeled auxiliary covariates. For this problem, we develop efficiency bounds and efficient estimators whose asymptotic variance aligns with the efficiency bound. In the analysis, we introduce two different data-generating processes: the one-sample setting and the two-sample setting. The one-sample setting considers the case where we can observe treatment indicators and outcomes for a part of the dataset, which is also called the censoring setting. In contrast, the two-sample setting considers two independent datasets with labeled and unlabeled data, which is also called the case-control setting or the stratified setting. In both settings, we find that by incorporating auxiliary covariates, we can lower the efficiency bound and obtain an estimator with an asymptotic variance smaller than that without such auxiliary covariates. We frame our framework as prediction-powered causal inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives efficiency bounds showing that unlabeled auxiliary covariates tighten variance for treatment effect estimators in both one-sample censoring and two-sample settings, with matching estimators.

The key point is that adding unlabeled auxiliary covariates lowers the efficiency bound for average treatment effect estimation and yields estimators with smaller asymptotic variance in both the one-sample and two-sample regimes. The work frames this as prediction-powered causal inference and gives explicit bounds plus estimators that attain them under the stated data processes. It does a clean job separating the censoring-style one-sample case from the independent two-sample case and showing the variance improvement over the no-auxiliary baseline.

The derivations appear to rest on standard semiparametric efficiency arguments, which is the right toolkit here. Credit for spelling out how the auxiliary covariates enter the influence functions and for providing the corresponding efficient estimators.

The soft spot is whether the nuisance estimators (propensity and outcome regressions) estimated on the pooled labeled-plus-unlabeled data actually satisfy the product-rate condition needed for the estimators to hit the bound. The stress-test note flags this correctly: if the extra covariates inflate the entropy of the function classes or slow the nuisance convergence, the claimed variance reduction may not hold in finite samples or even asymptotically. The paper would be stronger with an explicit check of those rates or a simulation that isolates the semi-supervised nuisance step.

Overall the argument is coherent on its own terms and the assumptions look standard rather than heroic. This is for people working on semi-supervised or prediction-powered causal methods who care about efficiency bounds. A reader already comfortable with influence-function derivations will get the most out of it. It deserves a serious referee because the problem is well-posed, the claims are falsifiable, and the results would matter if the attainment of the bounds is verified.

Referee Report

3 major / 2 minor

Summary. The paper develops semiparametric efficiency bounds and matching estimators for the average treatment effect in two semi-supervised regimes (one-sample censoring and two-sample case-control) that incorporate unlabeled auxiliary covariates. It derives the bounds under standard regularity conditions, constructs estimators whose asymptotic variance matches the bound, and shows that the auxiliary covariates strictly lower the bound relative to the labeled-only case, framing the approach as prediction-powered causal inference.

Significance. If the bounds are correctly derived and the estimators attain them, the work supplies a rigorous efficiency theory for using abundant unlabeled covariates in causal estimation, which is practically relevant when labeled outcomes are expensive. The explicit comparison of one-sample versus two-sample settings and the demonstration of variance reduction constitute a clear theoretical contribution.

major comments (3)

[§4.1, Theorem 1] The efficiency bound for the one-sample setting is stated to be strictly smaller when auxiliary covariates are included, yet the proof sketch does not explicitly verify that the additional covariates enter the efficient influence function in a way that reduces the variance term without introducing new bias; a direct comparison of the two influence functions (with and without auxiliaries) is needed to confirm the reduction is not an artifact of the censoring mechanism.
[§5.2, Eq. (18)] The claim that the proposed estimator attains the efficiency bound relies on nuisance estimators (propensity and outcome regression) trained on the pooled labeled+unlabeled sample satisfying product-rate conditions o_p(n^{-1/2}). The manuscript provides no entropy or Donsker-class arguments for the function classes that now include the auxiliary covariates, leaving open whether the semi-supervised nuisance rates are sufficient for asymptotic efficiency.
[§6, simulation design] The reported variance reduction is shown only for correctly specified parametric nuisances; it is unclear whether the same reduction persists under nonparametric nuisance estimation with the auxiliary covariates, which is the regime where the efficiency-bound claim is most relevant.
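For reference, the product-rate condition invoked in the second comment is usually written as below in the double/debiased machine learning literature for AIPW-type estimators. This is the textbook form, not a condition quoted from the paper: the symbols for the estimated and true propensity score (\hat{e}, e_0) and outcome regressions (\hat{\mu}_d, \mu_{0,d}) are assumptions made here, and the paper's own requirement may instead be phrased in terms of a Riesz representer.

```latex
% Textbook product-rate condition for an AIPW-type estimator (assumed
% notation): the product of the L2 errors of the two nuisance estimators
% must vanish faster than n^{-1/2}.
\bigl\| \hat{e} - e_{0} \bigr\|_{L_2}
\cdot
\bigl\| \hat{\mu}_{d} - \mu_{0,d} \bigr\|_{L_2}
= o_p\bigl(n^{-1/2}\bigr),
\qquad d \in \{0, 1\}.

% A common sufficient case: each nuisance converges at rate o_p(n^{-1/4}).
\bigl\| \hat{e} - e_{0} \bigr\|_{L_2} = o_p\bigl(n^{-1/4}\bigr),
\qquad
\bigl\| \hat{\mu}_{d} - \mu_{0,d} \bigr\|_{L_2} = o_p\bigl(n^{-1/4}\bigr).
```

Written this way, the referee's concern is whether enlarging the function classes to include the auxiliary covariates still lets both factors reach rates whose product is o_p(n^{-1/2}).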

minor comments (2)

Notation for the auxiliary covariate vector is introduced inconsistently between the one-sample and two-sample sections; a single global definition would improve readability.
The abstract states that the estimators are 'efficient' but the main text should explicitly reference the theorem number that establishes asymptotic normality and efficiency.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment point by point below, indicating the revisions we will make to strengthen the manuscript.

Referee: [§4.1, Theorem 1] The efficiency bound for the one-sample setting is stated to be strictly smaller when auxiliary covariates are included, yet the proof sketch does not explicitly verify that the additional covariates enter the efficient influence function in a way that reduces the variance term without introducing new bias; a direct comparison of the two influence functions (with and without auxiliaries) is needed to confirm the reduction is not an artifact of the censoring mechanism.

Authors: We agree that an explicit side-by-side comparison of the efficient influence functions would clarify the source of the variance reduction. In the revised manuscript we will insert a direct comparison of the EIFs (with and without auxiliary covariates) for the one-sample setting. This comparison will show that the auxiliary covariates enter only through an additional variance-reduction term in the EIF while leaving the bias term unchanged, confirming that the efficiency gain is not an artifact of the censoring mechanism.
Revision: yes

Referee: [§5.2, Eq. (18)] The claim that the proposed estimator attains the efficiency bound relies on nuisance estimators (propensity and outcome regression) trained on the pooled labeled+unlabeled sample satisfying product-rate conditions o_p(n^{-1/2}). The manuscript provides no entropy or Donsker-class arguments for the function classes that now include the auxiliary covariates, leaving open whether the semi-supervised nuisance rates are sufficient for asymptotic efficiency.

Authors: The referee correctly identifies that the current text assumes the product-rate conditions without supplying supporting entropy or Donsker arguments for the enlarged function classes. We will revise Section 5.2 to include explicit entropy-integral bounds (or Donsker-class assumptions) that cover the semi-supervised nuisance estimators trained on the pooled sample, thereby rigorously justifying that the required o_p(n^{-1/2}) rates are attainable.
Revision: yes

Referee: [§6, simulation design] The reported variance reduction is shown only for correctly specified parametric nuisances; it is unclear whether the same reduction persists under nonparametric nuisance estimation with the auxiliary covariates, which is the regime where the efficiency-bound claim is most relevant.

Authors: We acknowledge that the present simulations are limited to correctly specified parametric nuisances. To address this gap we will expand the simulation study to include nonparametric nuisance estimators (e.g., random forests and neural networks) trained on the pooled labeled-plus-unlabeled data. The new experiments will report the realized variance reduction under these nonparametric regimes, directly supporting the efficiency-bound claims.
Revision: yes

Circularity Check

0 steps flagged

No circularity: efficiency bounds derived independently via semiparametric theory

The paper derives efficiency bounds for treatment effect estimation in one-sample (censoring) and two-sample settings by incorporating auxiliary covariates into the data-generating process, then constructs estimators whose asymptotic variance matches the bound under standard regularity conditions. This follows conventional influence-function and semiparametric efficiency arguments without reducing to self-definition, fitted parameters renamed as predictions, or load-bearing self-citations. The claim that auxiliary covariates lower the bound is a direct consequence of the expanded model class rather than an input-output equivalence by construction. The derivations remain self-contained against external benchmarks in semiparametric statistics.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the claims rest on unspecified standard statistical regularity conditions for efficiency bounds.

pith-pipeline@v0.9.0 · 5481 in / 1058 out tokens · 34769 ms · 2026-05-17T23:44:32.786338+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Paper passage: "We develop efficiency bounds and efficient estimators whose asymptotic variance aligns with the efficiency bound... using generalized Riesz regression... Neyman orthogonal scores"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
Paper passage: "efficiency bound... V^OS := E[ψ_OS(...)^2] ... asymptotic normality √n(τ̂ - τ0) → N(0, V^OS)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages

[1] David Azriel, Lawrence D. Brown, Michael Sklar, Richard Berk, Andreas Buja, and Linda Zhao. Semi-supervised linear regression. Journal of the American Statistical Association, 117(540): 2238-2251, 2022.
[2] Heejung Bang and James M. Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4): 962-973, 2005.
[3] David Bruns-Smith, Oliver Dukes, Avi Feller, and Elizabeth L. Ogburn. Augmented balancing weights as linear regression. Journal of the Royal Statistical Society Series B: Statistical Methodology, April 2025.
[4] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. MIT Press, 2006.
[5] Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 2018.
[6] Victor Chernozhukov, Whitney K. Newey, Victor Quintas-Martinez, and Vasilis Syrgkanis. Automatic debiased machine learning via Riesz regression, 2021. arXiv:2104.14737.
[7] Victor Chernozhukov, Whitney Newey, Víctor M. Quintas-Martínez, and Vasilis Syrgkanis. RieszNet and ForestRiesz: Automatic debiased machine learning with neural nets and random forests. In International Conference on Machine Learning (ICML), 2022a.
[8] Victor Chernozhukov, Whitney K. Newey, and Rahul Singh. Automatic debiased machine learning of causal and structural effects. Econometrica, 90(3): 967-1027, 2022b.
[9] Alicia Curth and Mihaela van der Schaar. Nonparametric estimation of heterogeneous treatment effects: From theory to learning algorithms. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.
[10] Marthinus Christoffel du Plessis, Gang Niu, and Masashi Sugiyama. Convex formulation for learning from positive and unlabeled data. In International Conference on Machine Learning (ICML), 2015.
[11] Charles Elkan and Keith Noto. Learning classifiers from only positive and unlabeled data. In International Conference on Knowledge Discovery and Data Mining (KDD), 2008.
[12] Vitor Hadad, David A. Hirshberg, Ruohan Zhan, Stefan Wager, and Susan Athey. Confidence intervals for policy evaluation in adaptive experiments. Proceedings of the National Academy of Sciences (PNAS), 118(15), 2021.
[13] Jinyong Hahn. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, 66(2): 315-331, 1998.
[14] Jens Hainmueller. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1): 25-46, 2012.
[15] James Heckman. Shadow prices, market wages, and labor supply. Econometrica, 42(4): 679-694, 1974.
[16] Daniel G. Horvitz and Donovan J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260): 663-685, 1952.
[17] Jiayuan Huang, Arthur Gretton, Karsten Borgwardt, Bernhard Schölkopf, and Alex J. Smola. Correcting sample selection bias by unlabeled data. In NeurIPS, pp. 601-608. MIT Press, 2007.
[18] Kosuke Imai and Aaron Strauss. Estimation of heterogeneous treatment effects from randomized experiments, with application to the optimal planning of the get-out-the-vote campaign. Political Analysis, 19(1): 1-19, 2011.
[19] Guido W. Imbens and Tony Lancaster. Efficient estimation and stratified sampling. Journal of Econometrics, 74(2): 289-318, 1996.
[20] Guido W. Imbens and Donald B. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, 2015.
[21] Takafumi Kanamori, Shohei Hido, and Masashi Sugiyama. A least-squares approach to direct importance estimation. Journal of Machine Learning Research, 10(Jul.): 1391-1445, 2009.
[22] Masahiro Kato. Direct bias-correction term estimation for propensity scores and average treatment effect estimation, 2025a. arXiv:2509.22122.
[23] Masahiro Kato. Direct debiased machine learning via Bregman divergence minimization, 2025b. arXiv:2510.23534.
[24] Masahiro Kato. Nearest neighbor matching as least squares density ratio estimation and Riesz regression, 2025c. arXiv:2510.24433.
[25] Masahiro Kato. A unified theory for causal inference: Direct debiased machine learning via Bregman-Riesz regression, 2025d.
[26] Masahiro Kato and Takeshi Teshima. Non-negative Bregman divergence minimization for deep direct density ratio estimation. In International Conference on Machine Learning (ICML), 2021.
[27] Masahiro Kato, Akihiro Oga, Wataru Komatsubara, and Ryo Inokuchi. Active adaptive experimental design for treatment effect estimation with covariate choice. In International Conference on Machine Learning (ICML), 2024.
[28] Masahiro Kato, Fumiaki Kozai, and Ryo Inokuchi. PUATE: Semiparametric efficient average treatment effect estimation from treated (positive) and unlabeled units, 2025. arXiv:2501.19345.
[29] Masanori Kawakita and Takafumi Kanamori. Semi-supervised learning with density-ratio estimation. Machine Learning, 91(2): 189-209, 2013.
[30] Edward H. Kennedy. Efficient nonparametric causal inference with missing exposure information. The International Journal of Biostatistics, 16(1), 2020.
[31] Edward H. Kennedy, Sivaraman Balakrishnan, James M. Robins, and Larry Wasserman. Minimax rates for heterogeneous causal effect estimation. The Annals of Statistics, 52(2): 793-816, 2024.
[32] Ryuichi Kiryo, Gang Niu, Marthinus Christoffel du Plessis, and Masashi Sugiyama. Positive-unlabeled learning with non-negative risk estimator. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
[33] Chris A. J. Klaassen. Consistent estimation of the influence function of locally asymptotically linear estimators. Annals of Statistics, 15, 1987.
[34] Kaitlyn J. Lee and Alejandro Schuler. Rieszboost: Gradient boosting for Riesz regression, 2025. arXiv:2501.04871.
[35] Zhexiao Lin, Peng Ding, and Fang Han. Estimation based on nearest neighbor matching: from density ratio to average treatment effect. Econometrica, 91(6): 2187-2217, 2023.
[36] Jerzy Neyman. Sur les applications de la theorie des probabilites aux experiences agricoles: Essai des principes. Statistical Science, 5: 463-472, 1923.
[37] Gang Niu, Marthinus Christoffel du Plessis, Tomoya Sakai, Yao Ma, and Masashi Sugiyama. Theoretical comparisons of positive-unlabeled learning against positive-negative learning. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
[38] Benjamin Rhodes, Kai Xu, and Michael U. Gutmann. Telescoping density-ratio estimation. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
[39] J. M. Robins, A. Rotnitzky, and L. P. Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89: 846-866, 1994.
[40] Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66: 688-701, 1974.
[41] Johannes Schmidt-Hieber. Nonparametric regression using deep neural networks with ReLU activation function. Annals of Statistics, 48(4): 1875-1897, 2020.
[42] Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori. Density ratio matching under the Bregman divergence: A unified framework of density ratio estimation. Annals of the Institute of Statistical Mathematics, 64, October 2011.
[43] Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori. Density Ratio Estimation in Machine Learning. Cambridge University Press, 2012.
[44] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Publishing Company, Incorporated, 1st edition, 2008.
[45] Masatoshi Uehara, Masahiro Kato, and Shota Yasui. Off-policy evaluation and learning for external validity under a covariate shift. In Conference on Neural Information Processing Systems (NeurIPS), 2020.
[46] M. J. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. Springer New York, 2011.
[47] Aad W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998.
[48] Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523): 1228-1242, 2018.
[49] Jeffrey M. Wooldridge. Asymptotic properties of weighted m-estimation for standard stratified samples. Econometric Theory, 2001.
[50] Makoto Yamada, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, and Masashi Sugiyama. Relative density-ratio estimation for robust distribution comparison. In Advances in Neural Information Processing Systems (NeurIPS), volume 24. Curran Associates, Inc., 2011.
[51] Ruohan Zhan, Zhimei Ren, Susan Athey, and Zhengyuan Zhou. Policy learning with adaptively collected data. Management Science, 70(8): 5270-5297, 2024.
[52] Qingyuan Zhao. Covariate balancing propensity score by tailored loss functions. The Annals of Statistics, 47(2): 965-993, 2019.
[53] José R. Zubizarreta. Stable weights that balance covariates for estimation with incomplete outcome data. Journal of the American Statistical Association, 110(511): 910-922, 2015.
[54]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page

[1] [1]

Brown, Michael Sklar, Richard Berk, Andreas Buja, and Linda Zhao

David Azriel, Lawrence D. Brown, Michael Sklar, Richard Berk, Andreas Buja, and Linda Zhao. Semi-supervised linear regression. Journal of the American Statistical Association, 117 0 (540): 0 2238--2251, 2022

work page 2022

[2] [2]

Heejung Bang and James M. Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61 0 (4): 0 962--973, 2005

work page 2005

[3] [3]

Augmented balancing weights as linear regression

David Bruns-Smith, Oliver Dukes, Avi Feller, and Elizabeth L Ogburn. Augmented balancing weights as linear regression. Journal of the Royal Statistical Society Series B: Statistical Methodology, 04 2025

work page 2025

[4] [4]

Semi-Supervised Learning

Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. MIT Press, 2006

work page 2006

[5] [5]

Double/debiased machine learning for treatment and structural parameters

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 2018

work page 2018

[6] [6]

arXiv preprint arXiv:2104.14737 , year=

Victor Chernozhukov, Whitney K. Newey, Victor Quintas-Martinez, and Vasilis Syrgkanis. Automatic debiased machine learning via riesz regression, 2021. a rXiv:2104.14737

work page arXiv 2021

[7] [7]

R iesz N et and F orest R iesz: Automatic debiased machine learning with neural nets and random forests

Victor Chernozhukov, Whitney Newey, V\' ctor M Quintas-Mart\' nez, and Vasilis Syrgkanis. R iesz N et and F orest R iesz: Automatic debiased machine learning with neural nets and random forests. In International Conference on Machine Learning (ICML), 2022 a

work page 2022

[8] [8]

Newey, and Rahul Singh

Victor Chernozhukov, Whitney K. Newey, and Rahul Singh. Automatic debiased machine learning of causal and structural effects. Econometrica, 90 0 (3): 0 967--1027, 2022 b

work page 2022

[9] [9]

Nonparametric estimation of heterogeneous treatment effects: From theory to learning algorithms

Alicia Curth and Mihaela van der Schaar. Nonparametric estimation of heterogeneous treatment effects: From theory to learning algorithms. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021

work page 2021

[10] [10]

Niu, and Masashi Sugiyama

Marthinus Christoffel du Plessis, Gang. Niu, and Masashi Sugiyama. Convex formulation for learning from positive and unlabeled data. In International Conference on Machine Learning (ICML), 2015

work page 2015

[11] [11]

Learning classifiers from only positive and unlabeled data

Charles Elkan and Keith Noto. Learning classifiers from only positive and unlabeled data. In International Conference on Knowledge Discovery and Data Mining (KDD), 2008

work page 2008

[12] [12]

Confidence intervals for policy evaluation in adaptive experiments

Vitor Hadad, David A. Hirshberg, Ruohan Zhan, Stefan Wager, and Susan Athey. Confidence intervals for policy evaluation in adaptive experiments. Proceedings of the National Academy of Sciences (PNAS), 118(15), 2021

work page 2021

[13] [13]

On the role of the propensity score in efficient semiparametric estimation of average treatment effects

Jinyong Hahn. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, 66(2):315--331, 1998

work page 1998

[14] [14]

Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies

Jens Hainmueller. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1):25--46, 2012

work page 2012

[15] [15]

Shadow prices, market wages, and labor supply

James Heckman. Shadow prices, market wages, and labor supply. Econometrica, 42(4):679--694, 1974

work page 1974

[16] [16]

A generalization of sampling without replacement from a finite universe

Daniel G. Horvitz and Donovan J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663--685, 1952

work page 1952

[17] [17]

Jiayuan Huang, Arthur Gretton, Karsten Borgwardt, Bernhard Schölkopf, and Alex J. Smola. Correcting sample selection bias by unlabeled data. In NeurIPS, pp. 601--608. MIT Press, 2007

work page 2007

[18] [18]

Estimation of heterogeneous treatment effects from randomized experiments, with application to the optimal planning of the get-out-the-vote campaign

Kosuke Imai and Aaron Strauss. Estimation of heterogeneous treatment effects from randomized experiments, with application to the optimal planning of the get-out-the-vote campaign. Political Analysis, 19(1):1--19, 2011

work page 2011

[19] [19]

Efficient estimation and stratified sampling

Guido W. Imbens and Tony Lancaster. Efficient estimation and stratified sampling. Journal of Econometrics, 74(2):289--318, 1996

work page 1996

[20] [20]

Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction

Guido W. Imbens and Donald B. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, 2015

work page 2015

[21] [21]

A least-squares approach to direct importance estimation

Takafumi Kanamori, Shohei Hido, and Masashi Sugiyama. A least-squares approach to direct importance estimation. Journal of Machine Learning Research, 10(Jul.):1391--1445, 2009

work page 2009

[22] [22]

Direct bias-correction term estimation for propensity scores and average treatment effect estimation

Masahiro Kato. Direct bias-correction term estimation for propensity scores and average treatment effect estimation, 2025a. arXiv:2509.22122

work page arXiv 2025

[23] [23]

Direct debiased machine learning via Bregman divergence minimization

Masahiro Kato. Direct debiased machine learning via Bregman divergence minimization, 2025b. arXiv:2510.23534

work page arXiv 2025

[24] [24]

Nearest neighbor matching as least squares density ratio estimation and Riesz regression

Masahiro Kato. Nearest neighbor matching as least squares density ratio estimation and Riesz regression, 2025c. arXiv:2510.24433

work page arXiv 2025

[25] [25]

A unified theory for causal inference: Direct debiased machine learning via Bregman-Riesz regression

Masahiro Kato. A unified theory for causal inference: Direct debiased machine learning via Bregman-Riesz regression, 2025d

work page 2025

[26] [26]

Non-negative Bregman divergence minimization for deep direct density ratio estimation

Masahiro Kato and Takeshi Teshima. Non-negative Bregman divergence minimization for deep direct density ratio estimation. In International Conference on Machine Learning (ICML), 2021

work page 2021

[27] [27]

Active adaptive experimental design for treatment effect estimation with covariate choice

Masahiro Kato, Akihiro Oga, Wataru Komatsubara, and Ryo Inokuchi. Active adaptive experimental design for treatment effect estimation with covariate choice. In International Conference on Machine Learning (ICML), 2024

work page 2024

[28] [28]

PUATE: Semiparametric efficient average treatment effect estimation from treated (positive) and unlabeled units

Masahiro Kato, Fumiaki Kozai, and Ryo Inokuchi. PUATE: Semiparametric efficient average treatment effect estimation from treated (positive) and unlabeled units, 2025. arXiv:2501.19345

work page arXiv 2025

[29] [29]

Semi-supervised learning with density-ratio estimation

Masanori Kawakita and Takafumi Kanamori. Semi-supervised learning with density-ratio estimation. Machine Learning, 91(2):189--209, 2013

work page 2013

[30] [30]

Edward H. Kennedy. Efficient nonparametric causal inference with missing exposure information. The International Journal of Biostatistics, 16(1), 2020

work page 2020

[31] [31]

Minimax rates for heterogeneous causal effect estimation

Edward H. Kennedy, Sivaraman Balakrishnan, James M. Robins, and Larry Wasserman. Minimax rates for heterogeneous causal effect estimation. The Annals of Statistics, 52(2):793--816, 2024

work page 2024

[32] [32]

Positive-unlabeled learning with non-negative risk estimator

Ryuichi Kiryo, Gang Niu, Marthinus Christoffel du Plessis, and Masashi Sugiyama. Positive-unlabeled learning with non-negative risk estimator. In Advances in Neural Information Processing Systems (NeurIPS), 2017

work page 2017

[33] [33]

Chris A. J. Klaassen. Consistent estimation of the influence function of locally asymptotically linear estimators. Annals of Statistics, 15, 1987

work page 1987

[34] [34]

Rieszboost: Gradient boosting for Riesz regression

Kaitlyn J. Lee and Alejandro Schuler. Rieszboost: Gradient boosting for Riesz regression, 2025. arXiv:2501.04871

work page arXiv 2025

[35] [35]

Estimation based on nearest neighbor matching: from density ratio to average treatment effect

Zhexiao Lin, Peng Ding, and Fang Han. Estimation based on nearest neighbor matching: from density ratio to average treatment effect. Econometrica, 91(6):2187--2217, 2023

work page 2023

[36] [36]

Sur les applications de la theorie des probabilites aux experiences agricoles: Essai des principes

Jerzy Neyman. Sur les applications de la theorie des probabilites aux experiences agricoles: Essai des principes. Statistical Science, 5:463--472, 1923

work page 1923

[37] [37]

Theoretical comparisons of positive-unlabeled learning against positive-negative learning

Gang Niu, Marthinus Christoffel du Plessis, Tomoya Sakai, Yao Ma, and Masashi Sugiyama. Theoretical comparisons of positive-unlabeled learning against positive-negative learning. In Advances in Neural Information Processing Systems (NeurIPS), 2016

work page 2016

[38] [38]

Benjamin Rhodes, Kai Xu, and Michael U. Gutmann. Telescoping density-ratio estimation. In Advances in Neural Information Processing Systems (NeurIPS), 2020

work page 2020

[39] [39]

J. M. Robins, A. Rotnitzky, and L. P. Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89:846--866, 1994

work page 1994

[40] [40]

Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66:688--701, 1974

work page 1974

[41] [41]

Nonparametric regression using deep neural networks with ReLU activation function

Johannes Schmidt-Hieber. Nonparametric regression using deep neural networks with ReLU activation function. Annals of Statistics, 48(4):1875--1897, 2020

work page 2020

[42] [42]

Density ratio matching under the Bregman divergence: A unified framework of density ratio estimation

Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori. Density ratio matching under the Bregman divergence: A unified framework of density ratio estimation. Annals of the Institute of Statistical Mathematics, 64, October 2011

work page 2011

[43] [43]

Density Ratio Estimation in Machine Learning

Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori. Density Ratio Estimation in Machine Learning. Cambridge University Press, 2012

work page 2012

[44] [44]

Introduction to Nonparametric Estimation

Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Publishing Company, Incorporated, 1st edition, 2008

work page 2008

[45] [45]

Off-policy evaluation and learning for external validity under a covariate shift

Masatoshi Uehara, Masahiro Kato, and Shota Yasui. Off-policy evaluation and learning for external validity under a covariate shift. In Conference on Neural Information Processing Systems (NeurIPS), 2020

work page 2020

[46] [46]

Targeted Learning: Causal Inference for Observational and Experimental Data

M.J. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. Springer New York, 2011

work page 2011

[47] [47]

Asymptotic Statistics

Aad W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998

work page 1998

[48] [48]

Estimation and inference of heterogeneous treatment effects using random forests

Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228--1242, 2018

work page 2018

[49] [49]

Asymptotic properties of weighted m-estimation for standard stratified samples

Jeffrey M. Wooldridge. Asymptotic properties of weighted m-estimation for standard stratified samples. Econometric Theory, 2001

work page 2001

[50] [50]

Relative density-ratio estimation for robust distribution comparison

Makoto Yamada, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, and Masashi Sugiyama. Relative density-ratio estimation for robust distribution comparison. In Advances in Neural Information Processing Systems (NeurIPS), volume 24. Curran Associates, Inc., 2011

work page 2011

[51] [51]

Policy learning with adaptively collected data

Ruohan Zhan, Zhimei Ren, Susan Athey, and Zhengyuan Zhou. Policy learning with adaptively collected data. Management Science, 70(8):5270--5297, 2024

work page 2024

[52] [52]

Covariate balancing propensity score by tailored loss functions

Qingyuan Zhao. Covariate balancing propensity score by tailored loss functions. The Annals of Statistics, 47(2):965--993, 2019

work page 2019

[53] [53]

Stable weights that balance covariates for estimation with incomplete outcome data

José R. Zubizarreta. Stable weights that balance covariates for estimation with incomplete outcome data. Journal of the American Statistical Association, 110(511):910--922, 2015

work page 2015
