Causal Effect Estimation with TMLE: Handling Missing Data and Near-Violations of Positivity

arxiv: 2510.22202 · v1 · submitted 2025-10-25 · 📊 stat.ME · stat.ML


Christoph Wiederkehr (1), Christian Heumann (2), Michael Schomaker (1, 2, 3, 4)

(1) Department of Statistics, Ludwig-Maximilians University Munich; (2) Centre for Integrated Data and Epidemiological Research, Cape Town; (3) Institute of Public Health, Medical Decision Making and Health Technology Assessment, UMIT - University for Health Sciences, Medical Informatics and Technology, Hall in Tirol; (4) Munich Center for Machine Learning (MCML), Ludwig-Maximilians University Munich


Pith reviewed 2026-05-18 04:31 UTC · model grok-4.3

classification 📊 stat.ME stat.ML

keywords: causal inference · missing data · TMLE · positivity violation · multiple imputation · average treatment effect · epidemiological studies · complete case analysis


The pith

Complete-case TMLE that models outcome missingness reduces bias more than multiple imputation when estimating causal effects under missing data and near-positivity violations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests targeted maximum likelihood estimation (TMLE) of the average treatment effect when observations are missing under five common mechanisms and when the positivity assumption is nearly violated. It compares eight missing-data strategies: some discard incomplete records, while others fill in values by multiple imputation using either parametric or tree-based models. The simulations include both abstract data-generating processes and a design-based setup that fits an undersmoothed highly adaptive lasso to the WASH Benefits Bangladesh dataset. The central result is that keeping only complete cases while embedding an explicit model for outcome missingness inside TMLE produces smaller bias and remains more stable when positivity is strained. This comparison matters for applied researchers, who routinely face incomplete exposure, outcome, and covariate records in one-point exposure epidemiological studies.

Core claim

When targeted maximum likelihood estimation is paired with complete-case analysis that incorporates an outcome-missingness model, the resulting average treatment effect estimates exhibit lower bias and greater robustness to near-positivity violations than estimates obtained from any of the multiple-imputation strategies examined, whether those imputations rely on parametric regressions or on classification and regression trees.

What carries the argument

Complete-case TMLE augmented by an explicit model for the probability that the outcome is observed, which uses the observed data to correct for selection while retaining the full sample structure for treatment and covariate relations.
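To make this mechanism concrete, here is a minimal Python sketch of TMLE for the ATE when some outcomes are missing and a model for P(Δ=1 | A, W) enters the clever covariate. It is not the authors' code. It assumes a binary exposure A, a binary outcome Y, fully observed A and W (rows with a missing exposure or covariate already dropped), and plain main-terms logistic regressions in place of the Super Learner libraries such analyses usually use. The names `tmle_ate_missing_y`, `bound`, and the toy data-generating process in `simulate` are hypothetical.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    return np.log(p / (1.0 - p))


def fit_logistic(X, y, offset=None, n_iter=100, tol=1e-10):
    """Newton-Raphson logistic regression; X must already include an intercept column if wanted."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(offset + X @ beta)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def tmle_ate_missing_y(W, A, Y, delta, bound=0.025):
    """TMLE of E[Y(1)] - E[Y(0)] when some outcomes are missing (delta = 1 if Y observed).

    Sketch assumptions: binary A and Y, A and W fully observed, main-terms logistic models.
    """
    n = len(A)
    one, zero = np.ones(n), np.zeros(n)
    obs = delta == 1

    # Propensity score g(1|W) = P(A=1 | W), truncated at `bound`
    Xg = np.column_stack([one, W])
    g1 = np.clip(expit(Xg @ fit_logistic(Xg, A)), bound, 1.0 - bound)

    # Outcome-missingness model pi(a, W) = P(delta=1 | A=a, W), fitted on all rows
    b_pi = fit_logistic(np.column_stack([one, A, W]), delta)
    pi1 = expit(np.column_stack([one, one, W]) @ b_pi)
    pi0 = expit(np.column_stack([one, zero, W]) @ b_pi)

    # Initial outcome model Q(a, W) = E[Y | A=a, W, delta=1], fitted on complete cases
    b_Q = fit_logistic(np.column_stack([one, A, W])[obs], Y[obs])
    Q1 = np.clip(expit(np.column_stack([one, one, W]) @ b_Q), 1e-6, 1 - 1e-6)
    Q0 = np.clip(expit(np.column_stack([one, zero, W]) @ b_Q), 1e-6, 1 - 1e-6)
    QA = np.where(A == 1, Q1, Q0)

    # Clever covariate: inverse of P(exposure received) * P(outcome observed)
    H1 = 1.0 / (g1 * pi1)
    H0 = -1.0 / ((1.0 - g1) * pi0)
    HA = np.where(A == 1, H1, H0)

    # Targeting step: logistic fluctuation with offset logit(QA), complete cases only
    eps = fit_logistic(HA[obs][:, None], Y[obs], offset=logit(QA[obs]))[0]
    Q1s = expit(logit(Q1) + eps * H1)
    Q0s = expit(logit(Q0) + eps * H0)
    ate = np.mean(Q1s - Q0s)

    # Influence-curve-based standard error and Wald 95% interval
    resid = np.where(obs, Y - np.where(A == 1, Q1s, Q0s), 0.0)
    ic = delta * HA * resid + (Q1s - Q0s) - ate
    se = np.std(ic, ddof=1) / np.sqrt(n)
    return ate, se, (ate - 1.96 * se, ate + 1.96 * se)


def simulate(n, rng):
    """Hypothetical DGP in which outcome missingness depends on A and W only (MAR)."""
    W = rng.normal(size=n)
    A = rng.binomial(1, expit(0.5 * W))
    Y = rng.binomial(1, expit(-0.5 + 1.0 * A + 0.8 * W)).astype(float)
    delta = rng.binomial(1, expit(1.0 - 0.8 * A + 0.7 * W))
    Y[delta == 0] = np.nan  # outcome unobserved for these rows
    return W, A, Y, delta
```

Calling `tmle_ate_missing_y(*simulate(20000, np.random.default_rng(2025)))` should land close to this toy DGP's true risk difference. Rows with a missing outcome still contribute to the propensity and missingness fits and to the final average. The `tmle` R package offers a comparable option through its `Delta` argument, typically with Super Learner fits.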

If this is right

Non-multiple-imputation approaches, especially the outcome-missingness-adjusted complete-case version, are preferred when bias minimization is the primary goal.
Multiple imputation using classification and regression trees (CART) yields lower root mean squared error and often maintains nominal coverage.
Trade-offs between bias and interval coverage should guide method choice depending on whether point estimation or uncertainty quantification is prioritized.
The relative robustness of the recommended non-MI strategy persists across both model-based and design-based simulation settings that include not-at-random missingness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same complete-case adjustment could be tested inside other doubly robust estimators such as augmented inverse-probability weighting to see whether the bias reduction generalizes beyond TMLE.
In studies with time-varying exposures, embedding missingness models for each time point might preserve the robustness property observed here.
When positivity violations are suspected, a preliminary check of the estimated propensity-score distribution could flag whether the outcome-missingness adjustment is likely to deliver the reported stability.
Applied analysts could embed the recommended procedure in sensitivity analyses that vary the assumed missingness mechanism to quantify how much the point estimate moves.
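A preliminary propensity-score check of the kind suggested above could be as simple as the Python sketch below. It assumes the estimated scores g(1|W) and the exposure A are already available. The helper name `positivity_summary`, the 0.025 cutoff, and the quantile grid are illustrative assumptions, not values from the paper.

```python
import numpy as np


def positivity_summary(g1, A, threshold=0.025):
    """Summarise estimated propensity scores g(1|W) for a binary exposure A.

    Reports tail quantiles of g(1|W), the share of units whose score lies within
    `threshold` of 0 or 1, and the largest inverse-probability weight 1/g(A|W).
    """
    g1 = np.asarray(g1, dtype=float)
    A = np.asarray(A)
    gA = np.where(A == 1, g1, 1.0 - g1)  # probability of the exposure actually received
    probs = [0.0, 0.01, 0.05, 0.5, 0.95, 0.99, 1.0]
    extreme = (g1 < threshold) | (g1 > 1.0 - threshold)
    return {
        "quantiles": dict(zip(probs, np.quantile(g1, probs))),
        "share_extreme": float(extreme.mean()),
        "max_weight": float(np.max(1.0 / gA)),
    }


summary = positivity_summary([0.01, 0.2, 0.5, 0.8, 0.995], [1, 0, 1, 1, 0])
print(summary["share_extreme"], round(summary["max_weight"], 6))  # prints: 0.4 200.0
```

A large share of extreme scores, or a single weight in the hundreds as in this toy input, signals the near-violation regime in which the paper reports the outcome-missingness-adjusted complete-case TMLE to be most stable.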

Load-bearing premise

The five missingness-directed acyclic graphs together with the undersmoothed highly adaptive lasso design-based simulation on the WASH Benefits Bangladesh data set accurately capture the missing-data patterns and positivity challenges that arise in typical one-point exposure epidemiological studies.

What would settle it

Apply the same eight methods to a randomized trial with known true average treatment effect, artificially introduce missingness according to one of the paper's DAGs, and check whether the complete-case TMLE with outcome-missingness model still shows the smallest bias.
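The falsification check proposed above can be prototyped in a few lines. The setup below is hypothetical and is not one of the paper's m-DAGs. It uses a randomized binary exposure with a known risk difference of 0.2, a binary covariate W, and outcome missingness that depends on A and W. The sketch compares the naive complete-case contrast with an inverse-probability-of-observation estimate that models P(Δ=1 | A, W) by cell proportions. The function names are illustrative.

```python
import numpy as np


def simulate_trial(n, rng):
    """Hypothetical randomized trial with a true risk difference of exactly 0.2."""
    W = rng.binomial(1, 0.5, n)
    A = rng.binomial(1, 0.5, n)                  # randomized: P(A=1 | W) = 0.5
    Y = rng.binomial(1, 0.2 + 0.2 * A + 0.4 * W).astype(float)
    # Outcome observed with prob. 0.9 if A == W, else 0.3 (MAR given A and W)
    delta = rng.binomial(1, np.where(A == W, 0.9, 0.3))
    Y[delta == 0] = np.nan
    return W, A, Y, delta


def complete_case_diff(A, Y, delta):
    """Naive difference in observed outcome means between exposure arms."""
    obs = delta == 1
    return np.mean(Y[obs & (A == 1)]) - np.mean(Y[obs & (A == 0)])


def ipw_missingness(W, A, Y, delta, g1=0.5):
    """Reweight complete cases by 1 / (P(A | W) * P(delta=1 | A, W)); cell-wise missingness model."""
    pi_hat = np.empty(len(A))
    for a in (0, 1):
        for w in (0, 1):
            cell = (A == a) & (W == w)
            pi_hat[cell] = delta[cell].mean()
    y = np.where(delta == 1, Y, 0.0)             # missing outcomes never enter the sum
    treated = delta * A * y / (g1 * pi_hat)
    control = delta * (1 - A) * y / ((1.0 - g1) * pi_hat)
    return np.mean(treated) - np.mean(control)


W, A, Y, delta = simulate_trial(50_000, np.random.default_rng(42))
naive = complete_case_diff(A, Y, delta)   # population value 0.4: twice the true effect
ipw = ipw_missingness(W, A, Y, delta)     # population value 0.2: the true effect
```

Under this design the observed treated rows over-represent W = 1 and the observed control rows under-represent it, which is why the complete-case contrast converges to 0.4 rather than 0.2. The actual check would replace `ipw_missingness` with each of the eight methods combined with TMLE.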

Figures

Figures reproduced from arXiv: 2510.22202 by Christoph Wiederkehr, Christian Heumann, and Michael Schomaker.

**Figure 2.** Workflow illustration: The study starts with different DGPs (e.g., DGP 1), explores three positivity levels for each DGP, and incorporates m-DAGs to represent different missingness mechanisms. Finally, various missing data handling methods are applied, alongside TMLE, to estimate causal effects. 3.3 Evaluation Criteria: We assessed performance of the approaches for handling missing data, following Morris et …

**Figure 3.** Model-based simulation: Relative bias (%) in ATE estimation using different missing data methods

**Figure 4.** Model-based simulation: Coverage in ATE estimation using different missing data methods

**Figure 5.** Model-based simulation: RMSE in ATE estimation using different missing data methods

**Figure 6.** Results for the design-based simulation: Performance evaluation in ATE estimation using …
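The figures report relative bias, coverage, and RMSE. The evaluation criteria cited in the Figure 2 caption follow Morris et al. (2019). A minimal sketch of the usual definitions, together with their Monte Carlo standard errors, might look like this. The function name `performance` and the fixed 1.96 multiplier for Wald intervals are assumptions for illustration, not code from the paper.

```python
import numpy as np


def performance(estimates, ses, theta, z=1.96):
    """Simulation performance measures for one method across replications.

    Relative bias is undefined when the true value theta is 0.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(ses, dtype=float)
    n_sim = est.size
    bias = est.mean() - theta
    covered = (est - z * se <= theta) & (theta <= est + z * se)
    coverage = covered.mean()
    return {
        "bias": bias,
        "rel_bias_pct": 100.0 * bias / theta,
        "rmse": np.sqrt(np.mean((est - theta) ** 2)),
        "coverage": coverage,
        "mcse_bias": est.std(ddof=1) / np.sqrt(n_sim),        # Monte Carlo SE of the bias
        "mcse_coverage": np.sqrt(coverage * (1.0 - coverage) / n_sim),
    }


res = performance([1.0, 1.2, 0.8, 1.1], [0.1] * 4, theta=1.0)
```

For this four-replication toy input, the bias is 0.025, the RMSE is 0.15, and two of the four Wald intervals cover the true value. In a real study, the Monte Carlo standard errors show whether differences between methods exceed simulation noise.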

Original abstract

We evaluate the performance of targeted maximum likelihood estimation (TMLE) for estimating the average treatment effect in missing data scenarios under varying levels of positivity violations. We employ model- and design-based simulations, with the latter using undersmoothed highly adaptive lasso on the 'WASH Benefits Bangladesh' dataset to mimic real-world complexities. Five missingness-directed acyclic graphs are considered, capturing common missing data mechanisms in epidemiological research, particularly in one-point exposure studies. These mechanisms also include not-at-random missingness in the exposure, outcome, and confounders. We compare eight missing data methods in conjunction with TMLE as the analysis method, distinguishing between non-multiple imputation (non-MI) and multiple imputation (MI) approaches. The MI approaches use both parametric and machine-learning models. Results show that non-MI methods, particularly complete cases with TMLE incorporating an outcome-missingness model, exhibit lower bias compared to all other evaluated missing data methods and greater robustness against positivity violations across. In comparison, MI with classification and regression trees (CART) achieves lower root mean squared error, while often maintaining nominal coverage rates. Our findings highlight the trade-offs between bias and coverage, and we recommend using complete cases with TMLE incorporating an outcome-missingness model for bias reduction and MI CART when accurate confidence intervals are the priority.
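For the MI strategies, the per-imputation TMLE estimates must be combined into a single estimate and variance before coverage can be assessed. The abstract does not state how the paper pools them. Rubin's rules are the standard choice, and a minimal sketch follows. The helper name `pool_rubin` is hypothetical. It uses Rubin's original degrees of freedom rather than the Barnard-Rubin small-sample adjustment.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Combine m point estimates and their squared standard errors with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                     # pooled point estimate
    w_bar = u.mean()                     # within-imputation variance
    b = q.var(ddof=1)                    # between-imputation variance
    total = w_bar + (1.0 + 1.0 / m) * b  # total variance
    if b == 0:
        return q_bar, total, np.inf      # no between-imputation variability
    r = (1.0 + 1.0 / m) * b / w_bar      # relative increase in variance due to missingness
    df = (m - 1) * (1.0 + 1.0 / r) ** 2  # Rubin's original degrees of freedom
    return q_bar, total, df


est, var, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term `b` is what lets MI intervals widen to reflect uncertainty about the missing values. This mechanism is relevant to the abstract's finding that MI with CART often maintains nominal coverage.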

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This simulation study finds complete-case TMLE that models outcome missingness gives the lowest bias under near-positivity violations, while MI-CART performs better on RMSE and coverage, but the ordering depends on the chosen missingness mechanisms.


The paper compares eight ways to handle missing data when using TMLE to estimate the average treatment effect. It runs model-based simulations with five explicit missingness DAGs (including MNAR missingness in the exposure A, the outcome Y, and the confounders W). It also runs a design-based simulation that fits an undersmoothed highly adaptive lasso to the WASH Benefits Bangladesh data to create a realistic observed distribution before imposing missingness. The headline result is that non-MI complete-case analysis paired with TMLE that includes an outcome-missingness model shows lower bias and more robustness to near-positivity violations than the other approaches. By contrast, MI using CART achieves lower RMSE and better coverage rates in many settings. The authors report clear trade-offs between bias and interval performance across the scenarios.

Referee Report

2 major / 3 minor

Summary. The manuscript evaluates targeted maximum likelihood estimation (TMLE) for average treatment effect estimation under missing data and near-positivity violations. It employs both model-based and design-based simulations, the latter using undersmoothed highly adaptive lasso on the WASH Benefits Bangladesh dataset, along with five missingness DAGs that include MNAR mechanisms for exposure, outcome, and confounders. Eight missing-data methods (non-MI and MI, with parametric and ML variants) are compared when paired with TMLE; the central claim is that non-MI complete-case analysis with an explicit outcome-missingness model yields the lowest bias and greatest robustness to positivity violations, while MI with CART achieves lower RMSE and maintains nominal coverage.

Significance. If the simulation results hold under broader conditions, the work supplies concrete, actionable guidance on bias-coverage trade-offs when applying TMLE to incomplete epidemiological data with positivity concerns, a setting where practitioners routinely face these issues.

major comments (2)

[Design-based simulation] Design-based simulation (described in the methods for the WASH Benefits analysis): the decision to undersmooth HAL and then impose the five fixed missingness mechanisms does not include a sensitivity check on the tail quantiles of g(A|W); because near-positivity violations are driven precisely by those tails, the reported ordering of bias and robustness may be an artifact of the chosen DGP rather than a general property of the estimators.
[Results] Results tables comparing bias across the eight methods: the claim that complete-case TMLE with outcome-missingness model exhibits 'lower bias' and 'greater robustness' is presented as the headline finding, yet no formal comparison (e.g., paired t-tests or bootstrap intervals on the Monte Carlo bias differences) is reported; without this, it is impossible to judge whether the observed advantage exceeds simulation noise.
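One way to address the second major comment is a paired bootstrap over simulation replications. Resampling the same replication indices for both methods lets the Monte Carlo noise they share cancel out. The sketch below compares absolute bias with a percentile interval and 2,000 resamples. The helper name `abs_bias_diff_ci` and those settings are assumptions, and the toy estimates in the usage lines are simulated, not the paper's results.

```python
import numpy as np


def abs_bias_diff_ci(est_a, est_b, theta, n_boot=2000, level=0.95, seed=0):
    """Paired percentile-bootstrap CI for |bias(A)| - |bias(B)| across replications."""
    est_a = np.asarray(est_a, dtype=float)
    est_b = np.asarray(est_b, dtype=float)
    rng = np.random.default_rng(seed)
    n = est_a.size

    def diff(idx):
        # Same replication indices for both methods: the pairing step
        return abs(est_a[idx].mean() - theta) - abs(est_b[idx].mean() - theta)

    point = diff(np.arange(n))
    boots = np.array([diff(rng.integers(0, n, n)) for _ in range(n_boot)])
    alpha = 1.0 - level
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return point, (lo, hi)


# Toy replications: method A unbiased, method B biased by +0.2 (made-up numbers)
rng = np.random.default_rng(1)
theta = 1.0
est_a = theta + rng.normal(0.0, 0.1, 1000)
est_b = theta + 0.2 + rng.normal(0.0, 0.1, 1000)
point, (lo, hi) = abs_bias_diff_ci(est_a, est_b, theta)
```

An interval lying entirely below zero would indicate that method A's smaller absolute bias is not an artifact of simulation noise.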

minor comments (3)

[Abstract] Abstract: the final sentence is truncated ('across.'); it should read 'across scenarios' or equivalent.
[Methods] Notation: the manuscript introduces 'outcome-missingness model' without an explicit equation or diagram showing how this model enters the TMLE targeting step; a short display equation would remove ambiguity.
[Figures] Figure captions: the DAGs in the five missingness scenarios would benefit from explicit node labels (A, Y, W, R) and a legend distinguishing observed versus missing arrows.
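The display equation requested in the notation comment could take the following form. It shows how an outcome-missingness model typically enters TMLE for the ATE (for example, through the `Delta` argument of the `tmle` R package). It is offered as a hedged illustration, not as the manuscript's own notation. Here Δ = 1 marks an observed outcome, g(a | W) = P(A = a | W) is the propensity score, and Π(a, W) = P(Δ = 1 | A = a, W) is the outcome-missingness model. The target ψ is identified only if outcome missingness is ignorable given A and W.

```latex
\begin{aligned}
\bar{Q}(a, W) &= E[\,Y \mid A = a, \Delta = 1, W\,], \qquad
\psi = E_W\big[\bar{Q}(1, W) - \bar{Q}(0, W)\big], \\
H(A, W, \Delta) &= \Delta \left( \frac{A}{g(1 \mid W)\,\Pi(1, W)}
  - \frac{1 - A}{g(0 \mid W)\,\Pi(0, W)} \right), \\
\operatorname{logit} \bar{Q}^{*}(A, W) &= \operatorname{logit} \bar{Q}(A, W) + \hat{\varepsilon}\, H(A, W, 1),
  \quad \hat{\varepsilon} \text{ fitted by logistic regression among units with } \Delta = 1, \\
\hat{\psi}^{*} &= \frac{1}{n} \sum_{i=1}^{n} \big[\bar{Q}^{*}(1, W_i) - \bar{Q}^{*}(0, W_i)\big].
\end{aligned}
```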

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and describe the changes we will make to the manuscript.


Referee: [Design-based simulation] Design-based simulation (described in the methods for the WASH Benefits analysis): the decision to undersmooth HAL and then impose the five fixed missingness mechanisms does not include a sensitivity check on the tail quantiles of g(A|W); because near-positivity violations are driven precisely by those tails, the reported ordering of bias and robustness may be an artifact of the chosen DGP rather than a general property of the estimators.

Authors: We appreciate the referee's point that near-positivity violations are driven by the tails of g(A|W) and that a sensitivity check would strengthen the design-based results. Our choice of undersmoothed HAL was intended to retain the empirical distribution and tail behavior observed in the WASH Benefits data rather than impose an artificial DGP. The five missingness mechanisms are then applied to this fixed empirical structure. Nevertheless, we agree that additional checks are warranted. In the revision we will add a sensitivity analysis that varies the undersmoothing parameter of HAL and reports the resulting changes in the tail quantiles of the estimated propensity scores, together with the corresponding performance metrics for the leading methods. revision: yes
Referee: [Results] Results tables comparing bias across the eight methods: the claim that complete-case TMLE with outcome-missingness model exhibits 'lower bias' and 'greater robustness' is presented as the headline finding, yet no formal comparison (e.g., paired t-tests or bootstrap intervals on the Monte Carlo bias differences) is reported; without this, it is impossible to judge whether the observed advantage exceeds simulation noise.

Authors: We agree that formal assessment of whether the observed bias differences exceed Monte Carlo error would improve the credibility of the headline claim. In the revised manuscript we will report Monte Carlo standard errors for all bias estimates and add bootstrap intervals (or paired t-tests) on the differences in bias between the complete-case TMLE with outcome-missingness model and the other seven methods. These additions will allow readers to evaluate whether the reported advantages are statistically distinguishable from simulation noise. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical simulation results are independent of fitted inputs


The paper evaluates TMLE performance under missing data and positivity violations exclusively through model- and design-based simulations on specified DAGs and the WASH Benefits dataset. Performance metrics (bias, RMSE, coverage) are computed directly from applying the estimators to generated data; these quantities do not reduce by any equation or self-citation to previously fitted parameters. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the reported chain. The simulation design is externally specified and falsifiable, making the findings self-contained against the chosen benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The evaluation rests on standard causal-inference and missing-data assumptions plus the claim that the chosen DAGs and the real-data simulation design are representative; no new free parameters or invented entities are introduced.

axioms (1)

domain assumption: Five missingness-directed acyclic graphs capture common missing data mechanisms in epidemiological research, particularly in one-point exposure studies.
Stated directly in the abstract as the basis for the simulation scenarios.

pith-pipeline@v0.9.0 · 5868 in / 1330 out tokens · 48348 ms · 2026-05-18T04:31:15.209302+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We compare eight missing data methods in conjunction with TMLE... non-MI methods, particularly complete cases with TMLE incorporating an outcome-missingness model, exhibit lower bias... MI with CART achieve lower root mean squared error
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

Five missingness-directed acyclic graphs... recoverability of the ATE

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages

[1] Roderick Little and Donald Rubin. Statistical Analysis with Missing Data, Third Edition. Wiley, 2019. doi:10.1002/9781119482260
[2] Roderick J. Little, James R. Carpenter, and Katherine J. Lee. “A Comparison of Three Popular Methods for Handling Missing Data: Complete-Case Analysis, Inverse Probability Weighting, and Multiple Imputation”. In: Sociological Methods & Research 53.3 (2024), pp. 1105–1135. doi:10.1177/00491241221113873
[3] Karthika Mohan, Judea Pearl, and Jin Tian. “Graphical models for inference with missing data”. In: Advances in Neural Information Processing Systems 26 (2013), pp. 1277–1285
[4] Karthika Mohan and Judea Pearl. “Graphical models for processing missing data”. In: Journal of the American Statistical Association 116.534 (2021), pp. 1023–1037. doi:10.1080/01621459.2021.1874961. url: https://doi.org/10.1080/01621459.2021.1874961
[5] Margarita Moreno-Betancur et al. “Canonical Causal Diagrams to Guide the Treatment of Missing Data in Epidemiologic Studies”. In: American Journal of Epidemiology 187.12 (2018), pp. 2705–2715. doi:10.1093/aje/kwy173
[6] Anastasiia Holovchak et al. “Recoverability of Causal Effects under Presence of Missing Data: A Longitudinal Case Study”. In: Biostatistics 26.1 (2025), kxae044. doi:10.1093/biostatistics/kxae044
[7] Katherine J Lee et al. “Assumptions and analysis planning in studies with missing data in multiple variables: moving beyond the MCAR/MAR/MNAR classification”. In: International Journal of Epidemiology 52.4 (2023), pp. 1268–1275. url: https://doi.org/10.1093/ije/dyad008
[8] Miguel A. Hernán and James M. Robins. Causal Inference: What If. Chapman and Hall/CRC, 2020
[9] Megan S. Schuler and Sherri Rose. “Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies”. In: American Journal of Epidemiology 185 (2017), pp. 65–73. doi:10.1093/aje/kww165
[10] Miguel A. Luque-Fernandez et al. “Targeted maximum likelihood estimation for a binary treatment: A tutorial”. In: Statistics in Medicine 37.16 (2018), pp. 2530–2546
[11] Matthew J. Smith et al. “Application of targeted maximum likelihood estimation in public health and epidemiological studies: a systematic review”. In: Annals of Epidemiology 86 (2023), pp. 34–48.e28. issn: 1047-2797. doi:10.1016/j.annepidem.2023.06.004. url: https://www.sciencedirect.com/science/article/pii/S1047279723001151
[12] Mark J. van der Laan and Sherri Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, 2011
[13] Mark J. van der Laan, Eric C. Polley, and Alan E. Hubbard. “Super Learner”. In: Statistical Applications in Genetics and Molecular Biology 6.1 (2007). doi:10.2202/1544-6115.1309
[14] S. G. Dashti et al. “Handling missing data when estimating causal effects with targeted maximum likelihood estimation”. In: arXiv 2 (2021). doi:10.48550/arXiv.2112.05274
[15] Mark L Petersen et al. “Diagnosing and responding to violations in the positivity assumption”. In: Statistical Methods in Medical Research 21.1 (2012), pp. 31–54. doi:10.1177/0962280210386207
[16] Susan Gruber and Mark van der Laan. “tmle: An R Package for Targeted Maximum Likelihood Estimation”. In: Journal of Statistical Software 51.13 (2012), pp. 1–35. doi:10.18637/jss.v051.i13. url: https://www.jstatsoft.org/v51/i13/
[17] Marc Léger et al. “Causal inference in case of near-violation of positivity: comparison of methods”. In: Biometrical Journal 64 (2022), pp. 1389–1403. doi:10.1002/bimj.202000323
[18] Xiao-Li Meng. “Multiple-Imputation Inferences with Uncongenial Sources of Input”. In: Statistical Science 9.4 (1994), pp. 538–558. url: http://www.jstor.org/stable/2246252
[19] Ian R. White, Patrick Royston, and Angela M. Wood. “Multiple imputation using chained equations: Issues and guidance for practice”. In: Statistics in Medicine 30.4 (2011), pp. 377–399. doi:10.1002/sim.4067
[20] Helen A. Blake et al. “Estimating treatment effects with partially observed covariates using outcome regression with missing indicators”. In: Biometrical Journal 62 (2020), pp. 428–443. doi:10.1002/bimj.201900041
[21] Roderick J. Little, James R. Carpenter, and Katherine J. Lee. “A Comparison of Three Popular Methods for Handling Missing Data: Complete-Case Analysis, Inverse Probability Weighting, and Multiple Imputation”. In: Sociological Methods & Research 53.3 (2024), pp. 1105–1135. doi:10.1177/00491241221113873. url: https://doi.org/10.1177/00491241221113873
[22] J. Zhang et al. “Recoverability and estimation of causal effects under typical multivariable missingness mechanisms”. In: Biometrical Journal 66 (2024). doi:10.1002/bimj.202200326
[23] Hanne I. Oberman and Gerko Vink. “Toward a standardized evaluation of imputation methodology”. In: Biometrical Journal 66.1 (2024). doi:10.1002/bimj.202200107
[24] Haodong Li et al. “Evaluating the Robustness of Targeted Maximum Likelihood Estimators via Realistic Simulations in Nutrition Intervention Trials”. In: Statistics in Medicine 41.12 (2022). doi:10.1002/sim.9348
[25] Stephen P Luby et al. “Effects of water quality, sanitation, handwashing, and nutritional interventions on diarrhoea and child growth in rural Bangladesh: a cluster randomised controlled trial”. In: The Lancet Global Health 6.3 (2018), pp. 302–315. doi:10.1016/S2214-109X(17)30490-4
[26] Donald B. Rubin. “Causal Inference Using Potential Outcomes”. In: Journal of the American Statistical Association (2005). doi:10.1198/016214504000001880
[27] Samuel D. Lendle, Bruce Fireman, and Mark J. van der Laan. “Targeted maximum likelihood estimation in safety analysis”. In: Journal of Clinical Epidemiology 66 (2013), S91–S98
[28] Eric Polley et al. SuperLearner: Super Learner Prediction. R package version 2.0-28. 2021. url: https://CRAN.R-project.org/package=SuperLearner
[29] Stef van Buuren and Karin Groothuis-Oudshoorn. “mice: Multivariate Imputation by Chained Equations in R”. In: Journal of Statistical Software 45.3 (2011), pp. 1–67. doi:10.18637/jss.v045.i03
[30] Stef van Buuren. Flexible Imputation of Missing Data. Vol. 2. Chapman & Hall/CRC, 2018. url: https://stefvanbuuren.name/fimd/
[31] Laura D’Agostino McGowan, Sarah C. Lotspeich, and Sarah A. Hepler. “The "Why" behind including "Y" in your imputation model”. In: Statistical Methods in Medical Research 33.6 (2024), pp. 996–1020. doi:10.1177/09622802241244608
[32] Kate Tilling et al. “Appropriate inclusion of interactions was needed to avoid bias in multiple imputation”. In: Journal of Clinical Epidemiology 80 (2016), pp. 107–115. doi:10.1016/j.jclinepi.2016.07.004
[33] James Honaker, Gary King, and Matthew Blackwell. “Amelia II: A Program for Missing Data”. In: Journal of Statistical Software 45.7 (2011), pp. 1–47. doi:10.18637/jss.v045.i07
[34] James Honaker and Gary King. “What to do about missing values in time-series cross-section data?” In: American Journal of Political Science 54 (2010), pp. 561–581
[35] Joseph L Schafer. Analysis of Incomplete Multivariate Data. 1st. Chapman and Hall/CRC, 1997. doi:10.1201/9780367803025
[36] George C. Patton et al. “The Prognosis of Common Mental Disorders in Adolescents: A 14-Year Prospective Cohort Study”. In: Lancet 383.9926 (2014), pp. 1404–1411. doi:10.1016/S0140-6736(13)62116-9
[37] Marius Hofert et al. copula: Multivariate Dependence with Copulas. R package version 1.1-2. 2023. url: https://CRAN.R-project.org/package=copula
[38] R. M. Schouten, P. Lugtig, and G. Vink. “Generating missing values for simulation purposes: a multivariate amputation procedure”. In: Journal of Statistical Computation and Simulation 88.15 (2018), pp. 2909–2930. doi:10.1080/00949655.2018.1491577
[39] T. P. Morris, I. R. White, and M. J. Crowther. “Using simulation studies to evaluate statistical methods”. In: Statistics in Medicine 38 (2019), pp. 2074–2102. doi:10.1002/sim.8086
[40] David Benkeser and Mark J. van der Laan. “The Highly Adaptive Lasso Estimator”. In: Proc Int Conf Data Sci Adv Anal 2016 (2016), pp. 689–696. doi:10.1109/DSAA.2016.93
[41] Mark J. van der Laan. “A Generally Efficient Targeted Minimum Loss Based Estimator based on the Highly Adaptive Lasso”. In: International Journal of Biostatistics 13.2 (2017), /j/ijb.2017.13.issue-2/ijb-2015-0097/ijb-2015-0097.xml. doi:10.1515/ijb-2015-0097
[42] Mark J van der Laan, David Benkeser, and Wenjing Cai. “Efficient estimation of pathwise differentiable target parameters with the undersmoothed highly adaptive lasso”. In: International Journal of Biostatistics 19.1 (2022), pp. 261–289. doi:10.1515/ijb-2019-0092
[43] M. J. Smith et al. “Performance of Cross-Validated Targeted Maximum Likelihood Estimation”. In: Statistics in Medicine 44.15–17 (2025), e70185. doi:10.1002/sim.70185
[44] Rolf Groenwold et al. “Missing covariate data in clinical research: When and when not to use the missing-indicator method for analysis”. In: CMAJ: Canadian Medical Association Journal = Journal de l’Association medicale canadienne 184 (2012), pp. 1265–9. doi:10.1503/cmaj.110977
[45]

Using Causal Diagrams to Guide Analysis in Missing Data Problems

Rhian M. Daniel et al. “Using Causal Diagrams to Guide Analysis in Missing Data Problems”. In: Statistical Methods in Medical Research21.3 (2012), pp. 243–256.doi:10.1177/0962280210394469

[46] Shaun R. Seaman and Stijn Vansteelandt. “Introduction to Double Robust Methods for Incomplete Data”. In: Statistical Science 33.2 (2018), pp. 184–197. url: https://www.jstor.org/stable/26770990

[47] Jared S. Murray. “Multiple Imputation: A Review of Practical and Theoretical Findings”. In: Statistical Science 33.2 (2018), pp. 142–159. doi: 10.1214/18-STS644

[48] Emily Slade and Melissa G. Naylor. “A fair comparison of tree-based and parametric methods in multiple imputation by chained equations”. In: Statistics in Medicine 39.8 (2020), pp. 1156–1166. doi: 10.1002/sim.8468

[49] L.L. Doove, S. Van Buuren, and E. Dusseldorp. “Recursive partitioning for missing data imputation in the presence of interaction effects”. In: Computational Statistics and Data Analysis 72 (2014), pp. 92–104. doi: 10.1016/j.csda.2013.10.025

[50] Veronica Mulenga et al. “Abacavir, zidovudine, or stavudine as paediatric tablets for African HIV-infected children (CHAPAS-3): an open-label, parallel-group, randomised controlled trial”. In: The Lancet Infectious Diseases 16.2 (2016), pp. 169–79

[51] Andrzej Bienczak et al. “Plasma Efavirenz Exposure, Sex, and Age Predict Virological Response in HIV-Infected African Children”. In: Journal of Acquired Immune Deficiency Syndromes 73.2 (2016), pp. 161–8

[52] M. Schomaker et al. “Determining Targets for Antiretroviral Drug Concentrations: a Causal Framework Illustrated with Pediatric Efavirenz Data from the CHAPAS-3 Trial”. In: Pharmacoepidemiology and Drug Safety 33 (2024), e70051

[53] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2023. url: https://www.R-project.org/

[54] Jeremy R Coyle et al. hal9001: The scalable highly adaptive lasso. R package version 0.4.6. 2023. doi: 10.5281/zenodo.3558313. url: https://github.com/tlverse/hal9001

[55] Nima S Hejazi, Jeremy R Coyle, and Mark J van der Laan. “hal9001: Scalable highly adaptive lasso regression in R”. In: Journal of Open Source Software (2020). doi: 10.21105/joss.02526. url: https://doi.org/10.21105/joss.02526

A Results

Figure 4: Model-based simulation: Coverage in ATE estimation using different missing data methods combined with TMLE an...

Generate samples from the Gaussian copula: $(U_1, U_2, U_3, U_4, U_5, U_6) \sim C_{\text{Gaussian}}(\rho)$ with the correlation matrix $\rho$:

$$
\rho = \begin{pmatrix}
1 & 0.3 & -0.3 & 0.3 & 0.3 & -0.3 \\
0.3 & 1 & 0.7 & 0.3 & 0.3 & 0.3 \\
-0.3 & 0.7 & 1 & 0.3 & 0.7 & 0.3 \\
0.3 & 0.3 & 0.3 & 1 & 0.7 & 0.3 \\
0.3 & 0.3 & 0.7 & 0.7 & 1 & -0.3 \\
-0.3 & 0.3 & 0.3 & 0.3 & -0.3 & 1
\end{pmatrix}
$$
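The copula step can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation. The sketch draws $\varepsilon \sim N(0, I)$ and correlates it with a Cholesky factor $L$ of $\rho$ (so $\rho = LL^\top$). It then maps each coordinate through the standard normal CDF $\Phi$, so every $U_k$ is marginally Uniform(0, 1) while the dependence structure is kept. The helper names `cholesky` and `sample_gaussian_copula` are hypothetical. The factorization only succeeds for a positive-definite matrix, so the sketch raises an error otherwise. That makes it a quick check of any candidate $\rho$ before simulating.

```python
import math
import random
from statistics import NormalDist


def cholesky(corr):
    """Return the lower-triangular L with corr = L L^T (pure-Python Cholesky).

    Raises ValueError if corr is not positive definite, in which case this
    Cholesky-based sampler cannot be used with it.
    """
    n = len(corr)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = corr[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("correlation matrix is not positive definite")
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L


def sample_gaussian_copula(corr, n, rng):
    """Draw n vectors (U_1, ..., U_d) from a Gaussian copula with matrix corr.

    Z = L @ eps with eps ~ N(0, I) is multivariate normal with correlation
    corr; applying the standard normal CDF to each Z_k gives Uniform(0, 1)
    margins while keeping the dependence between coordinates.
    """
    L = cholesky(corr)
    d = len(corr)
    phi = NormalDist()  # standard normal: mean 0, sd 1
    draws = []
    for _ in range(n):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(d)]
        draws.append([phi.cdf(zk) for zk in z])
    return draws
```

For instance, `sample_gaussian_copula([[1.0, 0.3, -0.3], [0.3, 1.0, 0.7], [-0.3, 0.7, 1.0]], n=1000, rng=random.Random(1))` draws from the copula defined by the upper-left 3×3 block of $\rho$, which is the joint law of $(U_1, U_2, U_3)$.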

Transform copula samples into the desired variables $W_1$ to $W_6$:
• $B \sim N(0,1)$
• $W_1 = I(U_1 > \operatorname{logit}^{-1}(a_0))$
• $W_2 = I(U_2 > \operatorname{logit}^{-1}(\beta_0 + \beta_1 B))$
• $W_3 = \text{Categorical}(p_i)$ is a categorical variable with four categories. The probabilities of each category depend on $B$ and are created using a softmax function:
$$P(W_3 = i) = \frac{e^{\gamma_{0i} + \gamma_{1i} B}}{\sum_{j=1}^{4} e^{\gamma_{0j} + \gamma_{1j} B}}, \quad i = 1, \ldots, 4.$$
...
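The transformation of $(U_1, U_2, U_3)$ and $B$ into $(W_1, W_2, W_3)$ can be sketched as follows. The softmax follows the formula for $P(W_3 = i)$ given above. Drawing $W_3$ by inverting its cumulative probabilities at $U_3$ is an assumption of this sketch, chosen so that $W_3$ keeps its copula dependence. All function names are hypothetical. $W_4$ to $W_6$ are omitted because their definitions are truncated here.

```python
import math


def expit(x):
    """Numerically stable inverse logit, logit^{-1}(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax_probs(gamma0, gamma1, b):
    """P(W3 = i | B = b) = exp(g0_i + g1_i * b) / sum_j exp(g0_j + g1_j * b)."""
    scores = [g0 + g1 * b for g0, g1 in zip(gamma0, gamma1)]
    m = max(scores)  # subtracting the max avoids overflow and leaves the ratios unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_from_uniform(probs, u):
    """Inverse-CDF draw: return the category i in 1..K whose slice of [0, 1) contains u."""
    cum = 0.0
    for i, p in enumerate(probs, start=1):
        cum += p
        if u < cum:
            return i
    return len(probs)  # guard against floating-point round-off when u is close to 1


def transform_row(u, b, a0, beta0, beta1, gamma0, gamma1):
    """Map copula draws (U1, U2, U3) and B to (W1, W2, W3) as in the bullets above."""
    w1 = int(u[0] > expit(a0))
    w2 = int(u[1] > expit(beta0 + beta1 * b))
    w3 = categorical_from_uniform(softmax_probs(gamma0, gamma1, b), u[2])
    return w1, w2, w3
```

With all coefficients set to zero the four categories are equally likely. In that case `transform_row([0.9, 0.1, 0.6], b=0.0, a0=0.0, beta0=0.0, beta1=1.0, gamma0=[0.0] * 4, gamma1=[0.0] * 4)` returns `(1, 0, 3)`.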

Generate exposure $A$ and outcome $Y$: Exposure $A$ was modeled via regression on $B$ and $W$ incorporating two-way confounder-confounder interactions, and the outcome $Y$ was generated through regression on $A$ and $W$ involving two-, three-, and four-way confounder-confounder interactions: $A \sim \text{Binomial}(1, p)$, $p = \operatorname{logit}^{-1}(\eta_0 + \eta_1 W_1 + \eta_2 W_2 + \eta_3 W_3 + \eta_4 W_4 + \eta_5 W_5 + \eta_6 W_6 + \eta_7 B + \eta_{10} W$...
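The exposure model is a logistic regression, and one draw of $A$ given $W$ and $B$ can be sketched as below. The formula above is truncated after $\eta_{10}$, so interaction terms are passed in generically as hypothetical `(k, l, coef)` triples rather than the paper's actual terms. $W_3$ enters linearly, as written in the formula. The function names and any coefficient values are illustrative only.

```python
import math
import random


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def exposure_probability(w, b, eta0, eta_w, eta_b, interactions=()):
    """p = logit^{-1}(eta0 + sum_k eta_w[k] * W_k + eta_b * B + interaction terms).

    w and eta_w have equal length (W_1..W_6 map to indices 0..5).
    interactions holds hypothetical (k, l, coef) triples, each adding
    coef * W_k * W_l with 0-based indices k and l.
    """
    lin = eta0 + sum(e * wk for e, wk in zip(eta_w, w)) + eta_b * b
    lin += sum(coef * w[k] * w[l] for k, l, coef in interactions)
    return expit(lin)


def draw_exposure(p, rng):
    """One Bernoulli draw, A ~ Binomial(1, p)."""
    return 1 if rng.random() < p else 0
```

Large coefficients push $p$ toward 0 or 1 for some covariate patterns. This is how a data-generating process of this kind can produce near-violations of positivity.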
