Transfer Learning for Meta-analysis Under Covariate Shift

Ali Abdeen; Turgay Ayer; Zilong Wang

arxiv: 2604.02656 · v2 · submitted 2026-04-03 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-13 18:55 UTC · model grok-4.3

keywords transfer learning · covariate shift · meta-analysis · heterogeneous treatment effects · doubly robust estimation · transportability · clinical trials · CATE estimation

The pith

A placebo-anchored transport framework yields Neyman-orthogonal doubly robust estimators for patient-level heterogeneous treatment effects under covariate shift.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a method that treats outcomes from source randomized trials as abundant proxy signals and uses scarce target-trial placebo outcomes as high-fidelity labels to calibrate baseline risk. A low-complexity sparse correction aligns the proxy outcome models to the target population, and the anchored models are placed inside a cross-fitted doubly robust learner. This construction produces a Neyman-orthogonal estimator that recovers target-site heterogeneous treatment effects when target treated outcomes are observed. In connected targets the estimator is identified; in disconnected placebo-only targets it reduces to a screen-then-transport procedure under explicit working-model assumptions. Experiments on synthetic data and the IHDP benchmark show gains in CATE accuracy, ATE error, ranking quality, and policy regret especially at small target sample sizes.

Core claim

The central claim is that a placebo-anchored transport framework, which anchors proxy outcome models from source trials to the target population via a sparse correction and embeds them in a cross-fitted doubly robust learner, produces Neyman-orthogonal target-site doubly robust estimators for heterogeneous treatment effects; the framework distinguishes connected targets where effects are identified from disconnected targets where it operates under working-model transport assumptions.

What carries the argument

The placebo-anchored transport framework that uses source-trial outcomes as proxy signals, target placebo outcomes as gold labels, a low-complexity sparse correction to anchor the models, and a cross-fitted doubly robust learner to achieve Neyman orthogonality.
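
One way to picture the anchoring step is an L1-penalised fit to the proxy model's residuals on target placebo data. The sketch below is an assumption-laden illustration, not the paper's implementation: the linear form of the correction, the coordinate-descent solver, and the names `lasso_cd`, `anchor_proxy`, `simulate_placebo`, and `proxy_predict` are all hypothetical, and the simulated shift is sparse by construction.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator for L1-penalised coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # running residual y - X @ beta
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes coordinate j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            step = new_bj - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                beta[j] = new_bj
    return beta


def anchor_proxy(proxy_predict, X_placebo, y_placebo, lam):
    """Fit a sparse linear correction to the proxy's residuals on target placebo data."""
    resid = [y - proxy_predict(x) for x, y in zip(X_placebo, y_placebo)]
    delta = lasso_cd(X_placebo, resid, lam)

    def anchored_predict(x):
        return proxy_predict(x) + sum(d * xj for d, xj in zip(delta, x))

    return anchored_predict, delta


def simulate_placebo(n=60, p=5, seed=0):
    """Hypothetical target placebo arm whose baseline differs from the proxy in one coordinate."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [x[0] + 2.0 * x[1] + 1.5 * x[2] + rng.gauss(0.0, 0.1) for x in X]
    return X, y


def proxy_predict(x):
    """Stand-in for a source-trained control-arm outcome model."""
    return x[0] + 2.0 * x[1]
```

Calling `anchor_proxy(proxy_predict, *simulate_placebo(), lam=0.05)` returns the anchored predictor and the fitted correction. In this toy setup the correction should concentrate on the third coordinate, which is the only place the simulated target baseline departs from the proxy.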

If this is right

In connected targets the estimator identifies target-specific heterogeneous treatment effects.
At small target sample sizes the method improves pointwise CATE accuracy, ATE error, and decision regret over proxy-only, target-only, and standard transport baselines.
In disconnected targets the procedure retains strong ranking performance for treatment targeting decisions.
The cross-fitted doubly robust construction provides robustness to misspecification of the anchored outcome models.
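
The distinction between ranking quality and pointwise accuracy can be made concrete with a simple regret calculation. The sketch below uses a common textbook definition (treat when the estimated effect is positive, compare against the oracle rule); the paper's exact regret definition is not given in the text above, so this form, the function names, and the assumption that larger outcomes are better are all illustrative.

```python
def policy_value(tau_true, treat):
    """Mean benefit over the population when unit i is treated iff treat[i] is True."""
    return sum(t for t, d in zip(tau_true, treat) if d) / len(tau_true)


def policy_regret(tau_true, tau_hat):
    """Value lost by treating where tau_hat > 0 instead of where the true effect > 0."""
    oracle = [t > 0.0 for t in tau_true]
    learned = [t > 0.0 for t in tau_hat]
    return policy_value(tau_true, oracle) - policy_value(tau_true, learned)


# Hypothetical true effects for four patients.
tau_true = [1.0, -0.5, 0.2, -1.0]

# Estimates with two sign errors: regret is positive.
regret_sign_errors = policy_regret(tau_true, [0.8, 0.1, -0.3, -0.9])

# Estimates with badly wrong magnitudes but correct signs: regret is zero.
regret_right_signs = policy_regret(tau_true, [10.0, -1.0, 5.0, -3.0])
```

The second call shows why a method can keep strong targeting performance while its pointwise CATE error is large: the decision rule only depends on which side of the threshold each estimate falls.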

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The anchoring idea could be tested in other transfer settings that combine abundant unlabeled or proxy data with limited high-fidelity target labels.
Real electronic health record data with known population shifts would reveal whether the working transport conditions are realistic enough for clinical use.
Extending the sparse correction to multiple source studies might further reduce variance when several disconnected trials are available.

Load-bearing premise

The low-complexity sparse correction successfully anchors the proxy outcome models to the target population and the explicit working-model transport assumptions hold in disconnected targets.

What would settle it

A simulation in which the working transport assumptions are deliberately violated shows that the method's pointwise CATE accuracy falls below that of standard transport baselines while ranking quality remains comparable.

Original abstract

Randomized controlled trials often do not represent the populations where decisions are made, and covariate shift across studies can invalidate standard IPD meta-analysis and transport estimators. We propose a placebo-anchored transport framework that treats source-trial outcomes as abundant proxy signals and target-trial placebo outcomes as scarce, high-fidelity gold labels to calibrate baseline risk. A low-complexity (sparse) correction anchors proxy outcome models to the target population, and the anchored models are embedded in a cross-fitted doubly robust learner, yielding a Neyman-orthogonal, target-site doubly robust estimator for patient-level heterogeneous treatment effects when target treated outcomes are available. We distinguish two regimes: in connected targets (with a treated arm), the method yields target-identified effect estimates; in disconnected targets (placebo-only), it reduces to a principled screen--then--transport procedure under explicit working-model transport assumptions. Experiments on synthetic data and a semi-synthetic IHDP benchmark evaluate pointwise CATE accuracy, ATE error, ranking quality for targeting, decision-theoretic policy regret, and calibration. Across connected settings, the proposed method is best or near-best and improves substantially over proxy-only, target-only, and transport baselines at small target sample sizes; in disconnected settings, it retains strong ranking performance for targeting while pointwise accuracy depends on the strength of the working transport condition.
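
The screen-then-transport procedure for placebo-only targets can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: the mean-squared-error screen stands in for whatever source-detection rule the paper uses, and the `sources` dictionaries, the `tol` cutoff, and all function names are hypothetical.

```python
def placebo_mse(mu0_fn, placebo_rows):
    """Mean squared error of a source control-arm model on target placebo outcomes."""
    return sum((y - mu0_fn(x)) ** 2 for x, y in placebo_rows) / len(placebo_rows)


def screen_sources(sources, placebo_rows, tol):
    """Keep sources whose control-arm model fits the target placebo outcomes within tol."""
    return [s for s in sources if placebo_mse(s["mu0"], placebo_rows) <= tol]


def transport_ate(selected, placebo_rows):
    """Average the selected sources' CATE functions over the target placebo covariates."""
    if not selected:
        raise ValueError("no source passed the placebo screen")
    per_unit = [sum(s["tau"](x) for s in selected) / len(selected)
                for x, _ in placebo_rows]
    return sum(per_unit) / len(per_unit)


# Hypothetical target placebo arm: (covariate, outcome) pairs.
placebo_rows = [(-1.0, -0.9), (0.0, -0.1), (1.0, 1.1), (2.0, 1.9)]

# Hypothetical source trials, each with a control-arm model and a CATE function.
sources = [
    {"name": "S1", "mu0": lambda x: x, "tau": lambda x: 2.0 + x},
    {"name": "S2", "mu0": lambda x: x + 3.0, "tau": lambda x: 5.0},
]

selected = screen_sources(sources, placebo_rows, tol=0.5)
transported = transport_ate(selected, placebo_rows)
```

Only the placebo arm is checked against target data here. Nothing in the target sample can validate the transported effect itself, which is why pointwise accuracy in this regime rests on the working transport assumption.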

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical placebo-anchored framework for meta-analysis under covariate shift that calibrates source models with scarce target placebos via sparse correction before a cross-fitted DR learner.

The core contribution is a transport setup that treats source outcomes as proxies and target placebo data as anchors for baseline calibration. It adds a low-complexity sparse correction to adjust the proxy models, then plugs the result into a cross-fitted doubly robust learner for patient-level CATE when target treated outcomes exist. The paper splits the problem into connected targets (treated arm available) and disconnected ones (placebo only), with the latter falling back to a screen-then-transport step under explicit working assumptions. Experiments on synthetic data and the IHDP benchmark show gains in pointwise accuracy, ATE error, and policy regret over proxy-only, target-only, and standard transport baselines, especially at small target sizes. Ranking quality holds up reasonably in the disconnected case too. This combination of pieces is not standard in the cited transport or IPD meta-analysis literature, so the framing is new enough to notice. The experiments are reported at a level that lets you see where the method helps most. The main soft spot is the reliance on the sparse correction actually capturing the baseline shift. If the true difference between source and target risk functions sits outside the low-dimensional subspace the correction uses, the anchored model stays misspecified and double robustness only protects against estimation error, not that structural mismatch. The abstract already flags the working-model transport assumption for disconnected targets, which is honest but means the method's reliability depends on how well that assumption holds in practice. No obvious circularity or invented quantities appear. This is for applied causal-inference researchers who run meta-analyses on clinical trials where populations differ. Readers who need to produce target-specific effect estimates or targeting rules from mismatched sources will get direct value. 
The problem is common and the approach is coherent enough that it deserves a serious referee, even if the sparse-correction assumption will need close scrutiny in review.

Referee Report

2 major / 2 minor

Summary. The paper proposes a placebo-anchored transport framework for IPD meta-analysis under covariate shift. Abundant source-trial outcomes serve as proxy signals, while scarce target-trial placebo outcomes calibrate baseline risk via a low-complexity sparse correction. The anchored models are embedded in a cross-fitted doubly robust learner to produce a Neyman-orthogonal, target-site doubly robust estimator for patient-level CATE when target treated outcomes are available. The work distinguishes connected targets (yielding target-identified estimates) from disconnected targets (reducing to a screen-then-transport procedure under explicit working-model transport assumptions). Experiments on synthetic data and the semi-synthetic IHDP benchmark report improvements in pointwise CATE accuracy, ATE error, ranking quality, policy regret, and calibration, particularly at small target sample sizes.

Significance. If the Neyman-orthogonality and double-robustness properties hold under the stated assumptions, the framework offers a practical method for transporting heterogeneous effect estimates across studies with covariate shift, leveraging limited target placebo data to anchor proxies. This addresses a common limitation in clinical meta-analysis where RCTs do not match target populations. The empirical evaluation on decision-theoretic metrics such as policy regret and the explicit handling of connected versus disconnected regimes add applied value for targeting and policy learning in statistics and machine learning.

major comments (2)

[Abstract / Estimator construction] The abstract asserts that the method yields a Neyman-orthogonal, target-site doubly robust estimator, but supplies no derivation steps, influence-function derivation, or explicit error-bound analysis. These steps are load-bearing for the central claim that the cross-fitted DR learner remains valid after the sparse correction; the full manuscript must include them (e.g., in the section defining the estimator and its influence function) to substantiate the properties.
[Sparse correction / Transport assumptions] The low-complexity (sparse) correction is load-bearing for anchoring proxy models to the target population. If the true baseline-risk difference lies outside the sparse subspace (dense high-dimensional shift or uncaptured interactions), the anchored model remains misspecified; double robustness corrects only for nuisance estimation error, not this structural transport misspecification. Consequently the estimator may converge to a biased functional of the source rather than the target CATE. This assumption is flagged for the disconnected regime but requires explicit robustness checks or relaxation for the connected regime as well.

minor comments (2)

[Abstract / Experiments] The abstract and experiments section describe results at a high level; adding a brief statement on the concrete form of the sparse correction (e.g., L1 penalty on which coefficients or basis) and the precise simulation protocol would aid reproducibility.
[Notation / Methods] Notation for the proxy outcome models, sparse correction parameters, and the final DR functional should be introduced consistently early in the paper to improve readability for readers unfamiliar with the specific transport setup.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We have carefully considered the major comments and provide point-by-point responses below. We plan to make revisions to strengthen the manuscript as outlined.

Referee: [Abstract / Estimator construction] The abstract asserts that the method yields a Neyman-orthogonal, target-site doubly robust estimator, but supplies no derivation steps, influence-function derivation, or explicit error-bound analysis. These steps are load-bearing for the central claim that the cross-fitted DR learner remains valid after the sparse correction; the full manuscript must include them (e.g., in the section defining the estimator and its influence function) to substantiate the properties.

Authors: We agree that the derivation of Neyman-orthogonality and double robustness, including the influence function after the sparse correction, should be presented explicitly. In the revised manuscript we will add a dedicated subsection deriving the influence function for the cross-fitted doubly robust learner and showing that the estimator remains Neyman-orthogonal under the maintained conditions. This will include the relevant error-bound analysis to substantiate the central claims.
revision: yes

Referee: [Sparse correction / Transport assumptions] The low-complexity (sparse) correction is load-bearing for anchoring proxy models to the target population. If the true baseline-risk difference lies outside the sparse subspace (dense high-dimensional shift or uncaptured interactions), the anchored model remains misspecified; double robustness corrects only for nuisance estimation error, not this structural transport misspecification. Consequently the estimator may converge to a biased functional of the source rather than the target CATE. This assumption is flagged for the disconnected regime but requires explicit robustness checks or relaxation for the connected regime as well.

Authors: We thank the referee for this observation. In the connected regime the estimator is target-identified: the doubly robust construction uses the available target treated outcomes to identify the target CATE directly, so that consistency holds even if the sparse correction is misspecified (the correction improves finite-sample efficiency by borrowing strength from the source but is not required for asymptotic validity). Double robustness protects against estimation error in the anchored nuisances. The disconnected regime does rely on the explicit working-model transport assumption, as already noted. In the revision we will add a formal statement of the identification conditions distinguishing the two regimes, clarify the role of the sparse correction, and include additional simulation experiments that assess sensitivity to violations of the sparsity assumption in the connected setting.
revision: yes
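
The claim that a misspecified correction does not break consistency in the connected regime can be checked against the generic doubly robust pseudo-outcome. The calculation below is the standard product-of-errors argument, not a derivation taken from the paper; the notation ($\mu_a$ for arm-specific outcome regressions, $e$ for the propensity, hats for fitted versions held fixed by cross-fitting) is assumed for illustration.

```latex
\begin{align*}
\tilde{\tau} &= \hat{\mu}_1(X) - \hat{\mu}_0(X)
  + \frac{A\,\{Y - \hat{\mu}_1(X)\}}{\hat{e}(X)}
  - \frac{(1-A)\,\{Y - \hat{\mu}_0(X)\}}{1 - \hat{e}(X)}, \\[4pt]
\mathbb{E}[\tilde{\tau} \mid X] - \tau(X)
  &= \{e(X) - \hat{e}(X)\}
     \left[ \frac{\mu_1(X) - \hat{\mu}_1(X)}{\hat{e}(X)}
          + \frac{\mu_0(X) - \hat{\mu}_0(X)}{1 - \hat{e}(X)} \right],
\end{align*}
% where \mu_a(X) = E[Y | X, A = a], e(X) = P(A = 1 | X), and \tau(X) = \mu_1(X) - \mu_0(X).
```

When the propensity is known by randomization, so that $\hat{e} = e$, the right-hand side vanishes for any fixed outcome models. Under this reading, errors in the anchored models affect the variance of the pseudo-outcomes but not their conditional mean, which is consistent with the authors' description of the correction as an efficiency device in the connected regime.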

Circularity Check

0 steps flagged

No circularity; standard DR and transport machinery invoked independently of fitted inputs

The derivation chain invokes Neyman orthogonality and double robustness as pre-existing causal-inference results rather than deriving them from the paper's own sparse correction or proxy fits. The abstract and description present the sparse correction as an explicit modeling choice under stated working-model transport assumptions, without any equation that reduces the target CATE to a fitted quantity by construction. No self-citation is shown to be load-bearing for the central claim, and the two regimes (connected vs. disconnected) are distinguished by explicit assumptions rather than by renaming or self-definition. The estimator remains a standard cross-fitted DR learner once the anchored nuisance functions are supplied; nothing in the provided text forces the final functional to equal its inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review; free parameters and axioms inferred from high-level description only.

free parameters (1)

sparse correction parameters
Low-complexity correction term used to anchor proxy models; exact dimension or regularization strength not specified.

axioms (1)

domain assumption: working-model transport assumptions hold in disconnected targets
Explicitly invoked to reduce to a screen-then-transport procedure when target treated outcomes are unavailable.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

A low-complexity (sparse) correction anchors proxy outcome models to the target population, and the anchored models are embedded in a cross-fitted doubly robust learner, yielding a Neyman-orthogonal, target-site doubly robust estimator

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

[1] L. M. Friedman, C. D. Furberg, D. L. DeMets, D. M. Reboussin, and C. B. Granger, Fundamentals of clinical trials. Springer, 2015.

[2] U.S. Food and Drug Administration, “Providing clinical evidence of effectiveness for human drug and biological products,” May 1998, guidance document, Docket No. FDA-1997-D-0027. [Online]. Available: https://www.fda.gov/media/71655/download

[3] N. Cartwright, “Are RCTs the gold standard?” BioSocieties, vol. 2, no. 1, pp. 11–20, 2007.

[4] G. Lu and A. Ades, “Combination of direct and indirect evidence in mixed treatment comparisons,” Statistics in Medicine, vol. 23, no. 20, pp. 3105–3124, 2004.

[5] G. Salanti, “Indirect and mixed-treatment comparison, network, or multiple-treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool,” Research Synthesis Methods, vol. 3, no. 2, pp. 80–97, 2012.

[6] R. D. Riley, L. A. Stewart, and J. F. Tierney, “Individual participant data meta-analysis for healthcare research,” Individual Participant Data Meta-Analysis: a handbook for healthcare research, pp. 1–6, 2021.

[7] R. D. Riley, S. Dias, S. Donegan, J. F. Tierney, L. A. Stewart, O. Efthimiou, and D. M. Phillippo, “Using individual participant data to improve network meta-analysis projects,” BMJ Evidence-Based Medicine, vol. 28, no. 3, pp. 197–203, 2023.

[8] I. J. Dahabreh, S. E. Robertson, E. J. Tchetgen Tchetgen, E. A. Stuart, and M. A. Hernán, “Generalizing causal inferences from randomized trials: counterfactual and graphical identification,” Biometrics, 2019.

[9] J. Pearl and E. Bareinboim, “External validity: From do-calculus to transportability across populations,” in Probabilistic and causal inference: The works of Judea Pearl, 2022, pp. 451–482.

[10] D. G. Horvitz and D. J. Thompson, “A generalization of sampling without replacement from a finite universe,” Journal of the American Statistical Association, vol. 47, no. 260, pp. 663–685, 1952.

[11] P. R. Rosenbaum and D. B. Rubin, “The central role of the propensity score in observational studies for causal effects,” Biometrika, vol. 70, no. 1, pp. 41–55, 1983.

[12] J. M. Robins and A. Rotnitzky, “Semiparametric efficiency in multivariate regression models with missing data,” Journal of the American Statistical Association, vol. 90, no. 429, pp. 122–129, 1995.

[13] J. Hainmueller, “Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies,” Political Analysis, vol. 20, no. 1, pp. 25–46, 2012.

[14] H. Bastani, “Predicting with proxies: Transfer learning in high dimension,” Management Science, vol. 67, no. 5, pp. 2964–2984, 2021.

[15] Y. Tian and Y. Feng, “Transfer learning under high-dimensional generalized linear models,” Journal of the American Statistical Association, vol. 118, no. 544, pp. 2684–2697, 2023.

[16] V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, and J. Robins, “Double/debiased machine learning for treatment and structural parameters,” 2018.

[17] E. H. Kennedy, “Semiparametric doubly robust targeted double machine learning: a review,” Handbook of statistical methods for precision medicine, pp. 207–236, 2024.

[18] S. Athey, J. Tibshirani, and S. Wager, “Generalized random forests,” The Annals of Statistics, vol. 47, no. 2, pp. 1148–1178, 2019. [Online]. Available: https://doi.org/10.1214/18-AOS1709

[19] H. A. Chipman, E. I. George, and R. E. McCulloch, “BART: Bayesian additive regression trees,” The Annals of Applied Statistics, vol. 4, no. 1, Mar. 2010. [Online]. Available: http://dx.doi.org/10.1214/09-AOAS285

[20] M. J. Van der Laan, S. Rose et al., Targeted learning: causal inference for observational and experimental data. Springer, 2011, vol. 4.

[21] E. H. Kennedy, “Towards optimal doubly robust estimation of heterogeneous causal effects,” Electronic Journal of Statistics, vol. 17, no. 2, pp. 3008–3049, 2023. [Online]. Available: https://doi.org/10.1214/23-EJS2157

[22] J. L. Hill, “Bayesian nonparametric modeling for causal inference,” Journal of Computational and Graphical Statistics, vol. 20, no. 1, pp. 217–240, 2011.

[23] D. Westreich, J. K. Edwards, C. R. Lesko, E. Stuart, and S. R. Cole, “Transportability of trial results using inverse odds of sampling weights,” American Journal of Epidemiology, vol. 186, no. 8, pp. 1010–1014, 2017.
APPENDIX A. Asymptotics and Error Decompositions. There are two regimes: 1) Connected target (Option A / Proposed-CF). The target site has both a...

[24] Split target data into $K = 2$ folds.

[25] For each fold $k$: fit $\hat{\mu}_0^{(-k)}$, $\hat{\mu}_1^{(-k)}$ on remaining folds (propensity $e(X)$ is known by randomization design, not estimated).

[26] Compute DR pseudo-outcomes $\tilde{\tau}_i$ for fold $k$ samples.

[27] Run glmtrans on pseudo-outcomes.
Proposed-B. For disconnected targets ($m_1 = 0$):

[28] Use target placebo outcomes to run glmtrans source detection on the control arm, identifying transferable sources A.

[29] Fit source-side DR CATE using only selected source data.

[30] Transport the source CATE estimate to the target covariate distribution by averaging over target placebo covariates.
J. Hyperparameters and Tuning. a) Regularization:
• LASSO/Ridge: 5-fold cross-validation with LassoCV/RidgeCV
• glmtrans: $\lambda$ selected by 5-fold cross-validation minimizing MSE
• Random Forest (proxy outcome models): 100 trees, max depth 8, min samp...
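
Entries [24] to [26] read as the cross-fitting steps for the connected regime, and they can be sketched as follows. This is a toy illustration under loud assumptions: a single covariate, per-arm ordinary least squares standing in for the anchored outcome models, a known propensity of 0.5, and hypothetical helper names throughout. The glmtrans step in entry [27] is replaced by a plain least-squares fit on the pseudo-outcomes purely for demonstration.

```python
import random


def ols_1d(xs, ys):
    """Least-squares intercept and slope for a single covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def fit_arm(rows, arm):
    """Outcome model for one arm, fitted on the supplied rows only."""
    xs = [x for x, a, _ in rows if a == arm]
    ys = [y for x, a, y in rows if a == arm]
    b0, b1 = ols_1d(xs, ys)
    return lambda x: b0 + b1 * x


def cross_fit_pseudo_outcomes(rows, e=0.5, n_folds=2, seed=0):
    """Return (x, pseudo_outcome) pairs; nuisance models never see their own fold."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    out = []
    for k, held_out in enumerate(folds):
        # Fit both arm models on every fold except fold k.
        train = [rows[i] for j, f in enumerate(folds) if j != k for i in f]
        mu0, mu1 = fit_arm(train, 0), fit_arm(train, 1)
        for i in held_out:
            x, a, y = rows[i]
            m0, m1 = mu0(x), mu1(x)
            # Doubly robust pseudo-outcome with known propensity e.
            tau = m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1.0 - e)
            out.append((x, tau))
    return out


def simulate_target(n, seed=1, e=0.5):
    """Hypothetical connected target trial with true CATE tau(x) = 2 + x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        a = 1 if rng.random() < e else 0
        rows.append((x, a, x + a * (2.0 + x) + rng.gauss(0.0, 1.0)))
    return rows
```

Regressing the pseudo-outcomes on the covariate, for example with `ols_1d` applied to the pairs returned by `cross_fit_pseudo_outcomes(simulate_target(4000))`, should give an intercept near 2 and a slope near 1 in this simulation, matching the CATE used to generate the data.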
