Transporting treatment effects by calibrating large-scale observational outcomes

Harrison H Li

arxiv: 2605.07285 · v2 · pith:R6WWXKTUnew · submitted 2026-05-08 · 📊 stat.ME

Pith reviewed 2026-05-20 23:21 UTC · model grok-4.3

keywords: transported treatment effect, observational calibration, OLS adjustment, causal inference, semiparametric efficiency, average treatment effect, crop rotation

The pith

Calibrating an observational treatment-control contrast to a small experiment, then averaging over the large observational sample, yields a valid estimate of a weighted transported average treatment effect even if the calibration model is wrong.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a two-step procedure: first calibrate a treatment-control contrast estimated from the observational outcomes to the experimental dataset using ordinary least squares (OLS), then average the resulting estimated conditional average treatment effect (CATE) over the observational sample. The limiting value of this estimator is a weighted transported average treatment effect, and the accompanying inference is asymptotically valid and semiparametrically efficient whenever the experimental sample grows more slowly than the observational sample. These properties hold without requiring overlap between the two datasets and without correct specification of the linear calibration model. The approach therefore lets researchers combine a modest number of high-quality experimental measurements with abundant but possibly biased observational records to recover a well-defined causal quantity at the scale of the observational population.
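The two steps can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions that are not taken from the paper: the observational contrast is fitted with a separate linear model in each arm, the experiment has a known treatment probability `p_treat`, and the experimental CATE signal is the transformed outcome `D = Y (Z - p) / (p (1 - p))`. The function names and the toy data are hypothetical.

```python
import numpy as np

def _design(features):
    """Prepend an intercept column to a 1-D or 2-D feature array."""
    return np.column_stack([np.ones(len(features)), features])

def fit_observational_contrast(X, Z, Y):
    """Assumed nuisance step: linear outcome model per arm, returns x -> contrast."""
    b1, *_ = np.linalg.lstsq(_design(X[Z == 1]), Y[Z == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(_design(X[Z == 0]), Y[Z == 0], rcond=None)
    return lambda X_new: _design(X_new) @ (b1 - b0)

def calibrated_transported_ate(X_exp, Z_exp, Y_exp, X_obs, Z_obs, Y_obs, p_treat=0.5):
    contrast = fit_observational_contrast(X_obs, Z_obs, Y_obs)
    # Transformed outcome: its conditional mean is the CATE when p_treat is known.
    D = Y_exp * (Z_exp - p_treat) / (p_treat * (1.0 - p_treat))
    # Step 1: OLS calibration of the observational contrast to the experiment.
    gamma, *_ = np.linalg.lstsq(_design(contrast(X_exp)), D, rcond=None)
    # Step 2: average the calibrated CATE over the observational sample.
    cate_obs = _design(contrast(X_obs)) @ gamma
    return float(cate_obs.mean()), gamma

# Toy data (hypothetical): the true CATE is 1 + x1, the observational outcome
# is measured on a scale inflated by a factor of 2, and x1 is shifted by 1 in
# the observational sample, so the average CATE over that sample is 2 by design.
rng = np.random.default_rng(0)
n_exp, n_obs = 5_000, 50_000
X_exp = rng.normal(size=(n_exp, 2))
Z_exp = rng.integers(0, 2, size=n_exp)
Y_exp = X_exp[:, 0] + Z_exp * (1.0 + X_exp[:, 0]) + rng.normal(size=n_exp)

X_obs = rng.normal(size=(n_obs, 2)) + np.array([1.0, 0.0])
Z_obs = (rng.uniform(size=n_obs) < 1.0 / (1.0 + np.exp(-X_obs[:, 1]))).astype(int)
Y_obs = (2.0 * (X_obs[:, 0] + Z_obs * (1.0 + X_obs[:, 0]))
         + X_obs[:, 1] + rng.normal(size=n_obs))

tau_hat, gamma_hat = calibrated_transported_ate(X_exp, Z_exp, Y_exp, X_obs, Z_obs, Y_obs)
print(tau_hat, gamma_hat)
```

In this toy design the calibration slope should land near 0.5, since it has to undo the factor-of-2 inflation in the observational outcome. The paper's own estimator and its variance formula may differ in detail from this sketch.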

Core claim

The central claim is that the OLS calibration step produces a limiting estimand equal to a weighted transported average treatment effect, and that inference for this estimand is asymptotically valid and semiparametrically efficient when the experimental dataset grows more slowly than the observational dataset, regardless of positivity or correct specification of the OLS model.

What carries the argument

OLS calibration of the observational treatment-control contrast to the experimental contrast, which maps the large-sample estimator to a weighted transported average treatment effect even under misspecification.

If this is right

The estimator targets a well-defined transported effect without needing common support between the experimental and observational populations.
Asymptotic validity and semiparametric efficiency hold under the stated sample-size ordering even when the calibration model is misspecified.
The procedure can be applied directly to combine field-experiment data with satellite-based outcome measurements over large geographic regions.
Inference remains reliable when the experimental sample is much smaller than the observational one, which is the regime the asymptotic theory covers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same calibration logic might be applied with other adjustment methods, such as nonparametric regression or machine-learning models, in place of OLS.
Extensions could transport effects across time periods or geographic regions when experimental data are available only in limited settings.
The method suggests a general template for using small high-quality experiments to anchor inferences drawn from much larger observational sources in policy evaluation.

Load-bearing premise

The experimental dataset supplies an unbiased estimate of the treatment-control contrast that serves as the calibration target.

What would settle it

A simulation in which the OLS calibration is deliberately misspecified yet the estimator converges to a quantity other than the claimed weighted transported average treatment effect would falsify the central result.
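A skeleton for such a check might look like the following. It is only a sketch under simplifying assumptions of our own: the observational contrast is taken to be the known feature `x` itself, the true CATE is `1 + x^2` so that a linear calibration is misspecified, and the reference value is the population OLS projection averaged over the observational covariate distribution. It does not test the paper's weight formula, which is not reproduced on this page.

```python
import numpy as np

def tau(x):
    """True CATE in this toy design; deliberately nonlinear in x."""
    return 1.0 + x**2

def one_run(n_exp, n_obs, rng):
    # Experiment: X ~ U(0, 2), treatment randomized with probability 0.5.
    x_exp = rng.uniform(0.0, 2.0, size=n_exp)
    z_exp = rng.integers(0, 2, size=n_exp)
    y_exp = z_exp * tau(x_exp) + rng.normal(size=n_exp)
    d = y_exp * (z_exp - 0.5) / 0.25          # transformed outcome, E[d | x] = tau(x)
    # Misspecified linear calibration on (1, x).
    design = np.column_stack([np.ones(n_exp), x_exp])
    gamma, *_ = np.linalg.lstsq(design, d, rcond=None)
    # Observational covariates: X ~ U(1, 4), only partly overlapping the experiment.
    x_obs = rng.uniform(1.0, 4.0, size=n_obs)
    return gamma[0] + gamma[1] * x_obs.mean()

# Population OLS of tau(X) on (1, X) for X ~ U(0, 2) gives intercept 1/3 and
# slope 2; averaging over U(1, 4), whose mean is 2.5, gives the projection limit.
projection_limit = 1.0 / 3.0 + 2.0 * 2.5
# Unweighted transported ATE E[1 + X^2] for X ~ U(1, 4): variance 0.75, mean 2.5.
unweighted_ate = 1.0 + (0.75 + 2.5**2)

rng = np.random.default_rng(1)
averages = {}
for n_exp in (200, 2_000, 20_000):
    runs = [one_run(n_exp, 20 * n_exp, rng) for _ in range(20)]
    averages[n_exp] = float(np.mean(runs))
    print(n_exp, averages[n_exp], projection_limit, unweighted_ate)
```

Under these assumptions the estimator should settle near the projection limit rather than the unweighted transported ATE. A falsification test in the sense described above would replace `projection_limit` with the weighted estimand defined in the paper and look for a persistent gap.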

Figures

Figures reproduced from arXiv: 2605.07285 by Harrison H Li.

**Figure 2.** The height of each bar corresponds to the estimated mean squared error (MSE) of …

**Figure 3.** For each value of θ studied in the simulations from Section 5.1, a plot of the weight function w(·) in (4) with γ given by (22). … and observational propensity score $\Pr_{\mathrm{obs}}(Z = 1 \mid X = x) = \Phi\left(\frac{2x_2 - x_1}{5}\right)$, where $\mathrm{expit}(x) = \exp(x)\,(1 + \exp(x))^{-1}$ and $\Phi(\cdot)$ is the cumulative distribution function corresponding to the standard normal distribution. We set $\mu(x) = 0.5 + 0.5\,\Delta(x) + \eta\,(x_1 + 1)(x_2 + 1)$ and have i.i.d. n…

**Figure 4.** Same as Fig. 2, but for the multivariate covariate simulations in Section 5.2.

**Figure 5.** Histograms of the three estimators across the 100 simulations from the crop rotation …

Original abstract

A high-quality experimental dataset is often much smaller than a corresponding observational dataset. When this holds with possibly biased measurements of the outcome of interest in the latter, we propose an estimation and inference procedure for a transported treatment effect. Our point estimator can be computed as follows. First, we estimate the conditional average treatment effect (CATE) by calibrating a treatment-control contrast estimated using the observational outcomes to the experimental dataset using ordinary least squares (OLS). Then, we compute the sample average of this estimated CATE over the observational dataset. We show that the limiting estimand is a weighted transported average treatment effect even when the OLS calibration is misspecified. Furthermore, our inference for this estimand is asymptotically valid and semiparametrically efficient when the size of the experimental dataset grows more slowly than the size of the observational dataset, regardless of the existence of positivity (overlap) between the two datasets. We illustrate the stable empirical performance of our method under varying degrees of positivity using numerical simulations and a data example using field experiments and satellite-based yield estimates to estimate the average effect of crop rotation on maize (corn) yields over a large area of the Midwestern United States.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper gives a calibration trick that lets you transport effects from a small experiment to a big observational sample even without overlap, and claims the OLS step still targets a weighted ATE under misspecification.

The core move is to fit an OLS calibration that lines up the observational treatment-control contrast with the experimental one, then average the resulting CATE over the observational sample. The claim is that this limiting target remains a well-defined weighted transported ATE even if the linear calibration is wrong, and that you still get asymptotic validity and semiparametric efficiency as long as the experiment grows slower than the observational data, without needing positivity between the two sources. That combination is the main novelty relative to standard transport or data-fusion results, which usually lean on overlap or correct specification for identification and efficiency.

The simulations that vary the degree of positivity and the Midwestern crop-rotation example with satellite yields show the procedure stays stable in practice, which is useful for applied settings where randomization is expensive but secondary data are plentiful.

The weakest spot is the no-positivity result. When covariate supports are disjoint the OLS step has to extrapolate, and it is not obvious from the abstract alone whether the extrapolated coefficients still deliver an identifiable causal quantity from the experimental contrast. The paper asserts the limiting estimand stays valid anyway, but the argument would need to be checked carefully for hidden regularity conditions that rule out the worst extrapolation cases. If those conditions are mild or explicitly stated, the result holds up; if they are strong, the practical scope narrows.

Overall this is aimed at applied causal-inference users who routinely face small RCTs paired with large but biased observational records. Readers working on transportability or efficient estimation with auxiliary data will get something concrete to try. It is worth sending to referees because the claims are sharp, the application is real, and the potential payoff for practice is clear even if the no-positivity part needs tightening.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a procedure to estimate a transported treatment effect by first estimating a treatment-control contrast from large-scale observational data (possibly with biased outcomes), calibrating this contrast to a smaller experimental dataset via OLS, and then averaging the resulting estimated CATE over the observational sample. The central claims are that the limiting estimand equals a weighted transported average treatment effect even under OLS misspecification, and that inference for this estimand is asymptotically valid and semiparametrically efficient when the experimental sample size grows slower than the observational sample size, without requiring positivity or overlap between the two datasets. The approach is illustrated via simulations varying positivity levels and an empirical example using field experiments and satellite-based yield data to study crop rotation effects on maize yields.

Significance. If the asymptotic results hold, the method would offer a practical way to leverage abundant observational data for transporting effects from limited experimental studies, particularly useful in domains like agriculture and policy evaluation where covariate supports often fail to overlap. The robustness to calibration misspecification and the efficiency claim under n_exp = o(n_obs) are potentially valuable contributions, as is the explicit handling of biased observational outcomes via calibration.

major comments (3)

[Abstract and §3] Limiting estimand derivation: the claim that the limiting estimand remains a well-defined weighted transported ATE under OLS misspecification and disjoint covariate supports requires an explicit argument showing that the extrapolated OLS projection preserves identifiability from the experimental contrast alone; without this, consistency of the point estimator is not guaranteed when supports are disjoint.
[§4] Theorem on asymptotic normality (likely §4): the semiparametric efficiency and validity result when n_exp grows slower than n_obs appears to treat the calibration coefficients as fixed in the limiting argument, but under disjoint supports the OLS fit necessarily extrapolates; this needs a separate verification that the influence function remains valid and that the efficiency bound is attained without additional overlap conditions.
[§5] Simulation design: while varying degrees of positivity are considered, the reported coverage and bias results do not include a fully disjoint-support case; adding this would directly test whether the claimed asymptotic validity survives the extrapolation required by the calibration step.

minor comments (2)

[§2] The weighting function implicit in the transported ATE should be defined explicitly (perhaps in §2) so readers can see how it arises from the OLS calibration coefficients.
[Notation] Notation for the observational contrast estimator and the calibration target could be unified across the abstract and main text to avoid minor ambiguity in the two-step procedure.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments, which have prompted us to clarify key aspects of the theoretical results and strengthen the empirical section. We respond to each major comment below.

Referee: [Abstract and §3] Limiting estimand derivation: the claim that the limiting estimand remains a well-defined weighted transported ATE under OLS misspecification and disjoint covariate supports requires an explicit argument showing that the extrapolated OLS projection preserves identifiability from the experimental contrast alone; without this, consistency of the point estimator is not guaranteed when supports are disjoint.

Authors: We agree that an explicit derivation would improve clarity. In the revised manuscript we will expand §3 with a step-by-step argument showing that the population OLS coefficients are identified solely by matching the experimental contrast; the resulting projection, when averaged over the observational distribution, yields a well-defined weighted transported ATE even under misspecification. Because the weighting measure is supplied by the observational sample and the contrast is supplied by the experiment, identifiability holds without overlap or correct specification. We will insert this derivation immediately after the current limiting-estimand statement. revision: yes
Referee: [§4] Theorem on asymptotic normality (likely §4): the semiparametric efficiency and validity result when n_exp grows slower than n_obs appears to treat the calibration coefficients as fixed in the limiting argument, but under disjoint supports the OLS fit necessarily extrapolates; this needs a separate verification that the influence function remains valid and that the efficiency bound is attained without additional overlap conditions.

Authors: We thank the referee for highlighting this point. Under the regime n_exp = o(n_obs), the calibration coefficients converge to a fixed limit at the n_exp^{-1/2} rate, while the error from averaging over the observational sample is of order n_obs^{-1/2} and is therefore asymptotically negligible by comparison; the influence function we derive already incorporates this limit. To make the argument fully transparent under disjoint supports, we will add a remark in §4 that explicitly verifies the influence function continues to hold when the OLS projection extrapolates, without invoking overlap. This verification confirms that the semiparametric efficiency bound for the weighted transported effect is attained under the stated conditions alone. revision: partial
Referee: [§5] Simulation design: while varying degrees of positivity are considered, the reported coverage and bias results do not include a fully disjoint-support case; adding this would directly test whether the claimed asymptotic validity survives the extrapolation required by the calibration step.

Authors: We concur that a fully disjoint-support simulation would provide a direct and informative check. In the revised §5 we will add a new simulation setting in which the covariate supports of the experimental and observational samples have empty intersection. We will report bias, root-mean-squared error, and coverage probabilities for this case alongside the existing positivity-variation results, thereby demonstrating that asymptotic validity is preserved under the extrapolation required by calibration. revision: yes
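The rate comparison behind the second response can be made concrete with a rough decomposition. This is a sketch under simplifying assumptions that are not taken from the paper: the calibration features $f(x)$ (for instance an intercept and the observational contrast) are treated as fixed and known, $\hat\gamma$ denotes the OLS coefficients fitted on the experiment with population limit $\gamma^{*}$, $\bar f_{\mathrm{obs}}$ is the observational sample mean of $f(X)$, and $\theta = \mathbb{E}_{\mathrm{obs}}[f(X)]^{\top}\gamma^{*}$ is the limiting estimand.

```latex
\begin{align*}
\hat\theta - \theta
  &= \bar f_{\mathrm{obs}}^{\top}\hat\gamma
     - \mathbb{E}_{\mathrm{obs}}[f(X)]^{\top}\gamma^{*} \\
  &= \underbrace{\bar f_{\mathrm{obs}}^{\top}\bigl(\hat\gamma - \gamma^{*}\bigr)}_{O_p(n_{\mathrm{exp}}^{-1/2})}
   + \underbrace{\bigl(\bar f_{\mathrm{obs}} - \mathbb{E}_{\mathrm{obs}}[f(X)]\bigr)^{\top}\gamma^{*}}_{O_p(n_{\mathrm{obs}}^{-1/2})}, \\[4pt]
\sqrt{n_{\mathrm{exp}}}\,\bigl(\hat\theta - \theta\bigr)
  &= \bar f_{\mathrm{obs}}^{\top}\,\sqrt{n_{\mathrm{exp}}}\,\bigl(\hat\gamma - \gamma^{*}\bigr)
   + O_p\!\Bigl(\sqrt{n_{\mathrm{exp}}/n_{\mathrm{obs}}}\Bigr).
\end{align*}
```

When n_exp = o(n_obs) the last term vanishes, so under these assumptions the experimental sampling error in $\hat\gamma$ drives the limiting distribution and the observational averaging error drops out. The paper's actual argument must also handle estimation of the observational contrast, which this sketch ignores.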

Circularity Check

0 steps flagged

No circularity: limiting estimand derived via independent asymptotic analysis

The procedure first calibrates an observational contrast to experimental data via OLS and then averages the resulting CATE over the observational sample. The paper then derives (rather than defines) that the probability limit of this estimator equals a weighted transported ATE, even under OLS misspecification. This equality is obtained through limiting arguments whose validity is shown separately from the fitted coefficients themselves. Asymptotic validity and semiparametric efficiency when n_exp = o(n_obs) are likewise established by standard empirical-process arguments that do not presuppose the target estimand. No load-bearing self-citation, self-definitional step, or fitted-input-renamed-as-prediction appears in the derivation chain. The analysis therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The procedure rests on standard regularity conditions for OLS and semiparametric efficiency plus the implicit assumption that the experimental contrast is unbiased; no new entities are introduced.

free parameters (1)

OLS calibration coefficients
Fitted by OLS when calibrating the observational treatment-control contrast to the experimental dataset; these are data-dependent and central to the estimator.

axioms (1)

[standard math] Standard asymptotic regularity conditions for OLS and semiparametric estimators
Invoked to obtain the limiting distribution and efficiency claim when experimental sample size grows slower than observational.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "limiting estimand is a weighted transported average treatment effect even when the OLS calibration is misspecified... regardless of the existence of positivity (overlap)"

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

Paper passage: $\bar{\mu}(\cdot) = \arg\min_{f \in \mathcal{F}} E_{\mathrm{rct}}\bigl[(D - f(X))^2\bigr]$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
