Causal Effect Estimation with TMLE: Handling Missing Data and Near-Violations of Positivity
Pith reviewed 2026-05-18 04:31 UTC · model grok-4.3
The pith
Complete-case TMLE that models outcome missingness reduces bias more than multiple imputation when estimating causal effects under missing data and near-positivity violations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When targeted maximum likelihood estimation is paired with complete-case analysis that incorporates an outcome-missingness model, the resulting average treatment effect estimates exhibit lower bias and greater robustness to near-positivity violations than estimates obtained from any of the multiple-imputation strategies examined, whether those imputations rely on parametric regressions or on classification and regression trees.
What carries the argument
Complete-case TMLE augmented by an explicit model for the probability that the outcome is observed, which uses the observed data to correct for selection while retaining the full sample structure for treatment and covariate relations.
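One common way an outcome-missingness model enters TMLE is through the clever covariate: the inverse treatment probability is further divided by the estimated probability that the outcome is observed, H(A, W) = Δ / P(Δ = 1 | A, W) × [A / g(1 | W) − (1 − A) / g(0 | W)]. The sketch below computes this quantity for a single observation. The function name, the argument names, and the 0.025 truncation bound are illustrative assumptions, not details taken from the paper.

```python
def clever_covariate(a, delta, g1, pi_obs, bound=0.025):
    """Clever covariate for the ATE when the outcome may be missing.

    a       -- treatment indicator (0 or 1)
    delta   -- 1 if the outcome was observed, 0 if it is missing
    g1      -- estimated P(A = 1 | W)
    pi_obs  -- estimated P(delta = 1 | A = a, W)
    bound   -- hypothetical truncation level (an assumption of this sketch)
    """
    # Truncate the propensity score away from 0 and 1 so that
    # near-positivity violations cannot produce unbounded weights.
    g1 = min(max(g1, bound), 1.0 - bound)
    # Truncate the observation probability away from 0 for the same reason.
    pi_obs = max(pi_obs, bound)

    # Probability of the treatment level actually received.
    g_a = g1 if a == 1 else 1.0 - g1
    sign = 1.0 if a == 1 else -1.0

    # Rows with a missing outcome (delta = 0) contribute zero.
    return sign * delta / (g_a * pi_obs)


if __name__ == "__main__":
    print(clever_covariate(a=1, delta=1, g1=0.5, pi_obs=0.8))
    print(clever_covariate(a=0, delta=1, g1=0.5, pi_obs=0.8))
    print(clever_covariate(a=1, delta=0, g1=0.5, pi_obs=0.8))
```

Rows with a missing outcome receive a weight of zero, while observed rows are up-weighted when their probability of being observed is low. This is how the complete cases are made to stand in for the full sample.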
If this is right
- Non-multiple-imputation approaches, especially the outcome-missingness-adjusted complete-case version, are preferred when bias minimization is the primary goal.
- Multiple imputation using classification and regression trees yields lower root mean squared error and maintains nominal coverage more reliably than other imputation variants.
- Trade-offs between bias and interval coverage should guide method choice depending on whether point estimation or uncertainty quantification is prioritized.
- The relative robustness of the recommended non-MI strategy persists across both model-based and design-based simulation settings that include not-at-random missingness.
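The bias, root mean squared error, and coverage metrics behind these trade-offs can be computed from simulation replicates along the following lines. This is a minimal sketch that assumes Wald-type confidence intervals; the function name and its inputs are hypothetical and do not reproduce the paper's code.

```python
import math
from statistics import NormalDist, mean


def performance(estimates, std_errors, truth, level=0.95):
    """Summarise Monte Carlo replicates of an estimator.

    estimates  -- point estimates, one per simulated data set
    std_errors -- matching standard errors
    truth      -- true value of the estimand in the simulation
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%

    bias = mean(estimates) - truth
    rmse = math.sqrt(mean([(e - truth) ** 2 for e in estimates]))

    # Share of Wald intervals that contain the true value.
    covered = [
        1.0 if (e - z * s) <= truth <= (e + z * s) else 0.0
        for e, s in zip(estimates, std_errors)
    ]
    return {"bias": bias, "rmse": rmse, "coverage": mean(covered)}


if __name__ == "__main__":
    print(performance([0.9, 1.1, 1.0, 1.4], [0.1, 0.1, 0.1, 0.1], truth=1.0))
```

A method can have small bias and still cover poorly if its standard errors are too narrow, which is why the three quantities are reported separately.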
Where Pith is reading between the lines
- The same complete-case adjustment could be tested inside other doubly robust estimators such as augmented inverse-probability weighting to see whether the bias reduction generalizes beyond TMLE.
- In studies with time-varying exposures, embedding missingness models for each time point might preserve the robustness property observed here.
- When positivity violations are suspected, a preliminary check of the estimated propensity-score distribution could flag whether the outcome-missingness adjustment is likely to deliver the reported stability.
- Applied analysts could embed the recommended procedure in sensitivity analyses that vary the assumed missingness mechanism to quantify how much the point estimate moves.
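The preliminary propensity-score check mentioned above could look roughly like this. The 0.05 and 0.95 thresholds, the function name, and the returned keys are assumptions made for illustration; the paper does not prescribe them.

```python
def propensity_tail_summary(scores, lower=0.05, upper=0.95):
    """Describe how close estimated propensity scores come to 0 or 1.

    scores -- estimated P(A = 1 | W), one value per observation
    lower, upper -- hypothetical thresholds for flagging extreme scores
    """
    scores = sorted(scores)
    n = len(scores)

    # Share of observations whose score falls in either tail.
    outside = sum(1 for s in scores if s < lower or s > upper)

    # Largest inverse-probability weight any observation could receive,
    # whichever treatment arm it ends up in.
    worst = max(max(1.0 / s, 1.0 / (1.0 - s)) for s in scores)

    return {
        "min": scores[0],
        "max": scores[-1],
        "share_outside": outside / n,
        "worst_case_weight": worst,
    }


if __name__ == "__main__":
    print(propensity_tail_summary([0.5, 0.01, 0.99, 0.2, 0.8]))
```

A large share of scores in the tails, or a very large worst-case weight, would suggest that near-positivity violations are present and that estimator stability deserves a closer look.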
Load-bearing premise
The five missingness-directed acyclic graphs together with the undersmoothed highly adaptive lasso design-based simulation on the WASH Benefits Bangladesh data set accurately capture the missing-data patterns and positivity challenges that arise in typical one-point exposure epidemiological studies.
What would settle it
Apply the same eight methods to a randomized trial with known true average treatment effect, artificially introduce missingness according to one of the paper's DAGs, and check whether the complete-case TMLE with outcome-missingness model still shows the smallest bias.
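Artificially introducing outcome missingness could be sketched as follows. The mechanism here is a hypothetical logistic model in which the chance of observing Y depends only on treatment A and a single covariate W; it is not one of the paper's five DAGs, and the coefficients are placeholders.

```python
import math
import random


def ampute_outcome(rows, intercept=1.0, coef_a=-0.8, coef_w=0.5, seed=1):
    """Return copies of rows with some outcomes set to None.

    rows -- list of dicts with numeric keys 'A', 'W' and 'Y'
    The coefficients define a hypothetical missingness model:
    P(Y observed | A, W) = expit(intercept + coef_a * A + coef_w * W).
    """
    rng = random.Random(seed)  # local generator keeps the result reproducible
    amputed = []
    for row in rows:
        linear = intercept + coef_a * row["A"] + coef_w * row["W"]
        p_observed = 1.0 / (1.0 + math.exp(-linear))
        observed = rng.random() < p_observed

        new_row = dict(row)  # leave the input rows untouched
        new_row["delta"] = 1 if observed else 0
        new_row["Y"] = row["Y"] if observed else None
        amputed.append(new_row)
    return amputed


if __name__ == "__main__":
    data = [{"A": i % 2, "W": (i % 5) / 5.0, "Y": float(i)} for i in range(10)]
    for r in ampute_outcome(data, seed=7):
        print(r)
```

Because the full outcomes are retained in the input, the estimate from each missing-data method can be compared against the estimate from the complete data as well as against the known true effect.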
Original abstract
We evaluate the performance of targeted maximum likelihood estimation (TMLE) for estimating the average treatment effect in missing data scenarios under varying levels of positivity violations. We employ model- and design-based simulations, with the latter using undersmoothed highly adaptive lasso on the 'WASH Benefits Bangladesh' dataset to mimic real-world complexities. Five missingness-directed acyclic graphs are considered, capturing common missing data mechanisms in epidemiological research, particularly in one-point exposure studies. These mechanisms also include not-at-random missingness in the exposure, outcome, and confounders. We compare eight missing data methods in conjunction with TMLE as the analysis method, distinguishing between non-multiple imputation (non-MI) and multiple imputation (MI) approaches. The MI approaches use both parametric and machine-learning models. Results show that non-MI methods, particularly complete cases with TMLE incorporating an outcome-missingness model, exhibit lower bias compared to all other evaluated missing data methods and greater robustness against positivity violations across. In comparison, MI with classification and regression trees (CART) achieves lower root mean squared error, while often maintaining nominal coverage rates. Our findings highlight the trade-offs between bias and coverage, and we recommend using complete cases with TMLE incorporating an outcome-missingness model for bias reduction and MI CART when accurate confidence intervals are the priority.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates targeted maximum likelihood estimation (TMLE) for average treatment effect estimation under missing data and near-positivity violations. It employs both model-based and design-based simulations, the latter using undersmoothed highly adaptive lasso on the WASH Benefits Bangladesh dataset, along with five missingness DAGs that include MNAR mechanisms for exposure, outcome, and confounders. Eight missing-data methods (non-MI and MI, with parametric and ML variants) are compared when paired with TMLE; the central claim is that non-MI complete-case analysis with an explicit outcome-missingness model yields the lowest bias and greatest robustness to positivity violations, while MI with CART achieves lower RMSE and maintains nominal coverage.
Significance. If the simulation results hold under broader conditions, the work supplies concrete, actionable guidance on bias-coverage trade-offs when applying TMLE to incomplete epidemiological data with positivity concerns, a setting where practitioners routinely face these issues.
major comments (2)
- [Design-based simulation] Design-based simulation (described in the methods for the WASH Benefits analysis): the decision to undersmooth HAL and then impose the five fixed missingness mechanisms does not include a sensitivity check on the tail quantiles of g(A|W); because near-positivity violations are driven precisely by those tails, the reported ordering of bias and robustness may be an artifact of the chosen DGP rather than a general property of the estimators.
- [Results] Results tables comparing bias across the eight methods: the claim that complete-case TMLE with outcome-missingness model exhibits 'lower bias' and 'greater robustness' is presented as the headline finding, yet no formal comparison (e.g., paired t-tests or bootstrap intervals on the Monte Carlo bias differences) is reported; without this, it is impossible to judge whether the observed advantage exceeds simulation noise.
minor comments (3)
- [Abstract] Abstract: the final sentence is truncated ('across.'); it should read 'across scenarios' or equivalent.
- [Methods] Notation: the manuscript introduces 'outcome-missingness model' without an explicit equation or diagram showing how this model enters the TMLE targeting step; a short display equation would remove ambiguity.
- [Figures] Figure captions: the DAGs in the five missingness scenarios would benefit from explicit node labels (A, Y, W, R) and a legend distinguishing observed versus missing arrows.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each major comment below and describe the changes we will make to the manuscript.
- Referee: [Design-based simulation] Design-based simulation (described in the methods for the WASH Benefits analysis): the decision to undersmooth HAL and then impose the five fixed missingness mechanisms does not include a sensitivity check on the tail quantiles of g(A|W); because near-positivity violations are driven precisely by those tails, the reported ordering of bias and robustness may be an artifact of the chosen DGP rather than a general property of the estimators.
  Authors: We appreciate the referee's point that near-positivity violations are driven by the tails of g(A|W) and that a sensitivity check would strengthen the design-based results. Our choice of undersmoothed HAL was intended to retain the empirical distribution and tail behavior observed in the WASH Benefits data rather than impose an artificial DGP. The five missingness mechanisms are then applied to this fixed empirical structure. Nevertheless, we agree that additional checks are warranted. In the revision we will add a sensitivity analysis that varies the undersmoothing parameter of HAL and reports the resulting changes in the tail quantiles of the estimated propensity scores, together with the corresponding performance metrics for the leading methods.
  Revision: yes
- Referee: [Results] Results tables comparing bias across the eight methods: the claim that complete-case TMLE with outcome-missingness model exhibits 'lower bias' and 'greater robustness' is presented as the headline finding, yet no formal comparison (e.g., paired t-tests or bootstrap intervals on the Monte Carlo bias differences) is reported; without this, it is impossible to judge whether the observed advantage exceeds simulation noise.
  Authors: We agree that formal assessment of whether the observed bias differences exceed Monte Carlo error would improve the credibility of the headline claim. In the revised manuscript we will report Monte Carlo standard errors for all bias estimates and add bootstrap intervals (or paired t-tests) on the differences in bias between the complete-case TMLE with outcome-missingness model and the other seven methods. These additions will allow readers to evaluate whether the reported advantages are statistically distinguishable from simulation noise.
  Revision: yes
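The Monte Carlo checks discussed in this exchange could be sketched as below. This is an illustration under stated assumptions: both methods are applied to the same simulated data sets so that their estimates can be paired, and the interval uses a normal approximation with a 1.96 multiplier. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def mc_bias(estimates, truth):
    """Bias and its Monte Carlo standard error across replicates."""
    n = len(estimates)
    return mean(estimates) - truth, stdev(estimates) / math.sqrt(n)


def paired_bias_difference(est_a, est_b):
    """Difference in bias between two methods run on the same replicates.

    The true value cancels, so the difference in bias equals the mean of
    the per-replicate differences in estimates.
    """
    diffs = [a - b for a, b in zip(est_a, est_b)]
    n = len(diffs)
    d = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)
    # Normal-approximation 95% interval for the bias difference.
    return d, se, (d - 1.96 * se, d + 1.96 * se)


if __name__ == "__main__":
    method_a = [1.0, 1.2, 0.8, 1.0]
    method_b = [1.1, 1.3, 0.9, 1.1]
    print(mc_bias(method_a, truth=1.0))
    print(paired_bias_difference(method_a, method_b))
```

Pairing by replicate removes the sampling noise shared by both methods, so a small but consistent bias difference can be detected with far fewer simulation runs than an unpaired comparison would need.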
Circularity Check
No circularity: empirical simulation results are independent of fitted inputs
The paper evaluates TMLE performance under missing data and positivity violations exclusively through model- and design-based simulations on specified DAGs and the WASH Benefits dataset. Performance metrics (bias, RMSE, coverage) are computed directly from applying the estimators to generated data; these quantities do not reduce by any equation or self-citation to previously fitted parameters. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the reported chain. The simulation design is externally specified and falsifiable, making the findings self-contained against the chosen benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Five missingness-directed acyclic graphs capture common missing data mechanisms in epidemiological research, particularly in one-point exposure studies.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "We compare eight missing data methods in conjunction with TMLE... non-MI methods, particularly complete cases with TMLE incorporating an outcome-missingness model, exhibit lower bias... MI with CART achieve lower root mean squared error"
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "Five missingness-directed acyclic graphs... recoverability of the ATE"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Appendix excerpt: simulation data-generating steps
- Generate samples from the Gaussian copula: $(U_1, U_2, U_3, U_4, U_5, U_6) \sim C_{\text{Gaussian}}(\rho)$ with the correlation matrix $\rho$:
$$
\rho = \begin{pmatrix}
1 & 0.3 & -0.3 & 0.3 & 0.3 & -0.3 \\
0.3 & 1 & 0.7 & 0.3 & 0.3 & 0.3 \\
-0.3 & 0.7 & 1 & 0.3 & 0.7 & 0.3 \\
0.3 & 0.3 & 0.3 & 1 & 0.7 & 0.3 \\
0.3 & 0.3 & 0.7 & 0.7 & 1 & -0.3 \\
-0.3 & 0.3 & 0.3 & 0.3 & -0.3 & 1
\end{pmatrix}
$$
- Transform copula samples into the desired variables $W_1$ to $W_6$:
  - $B \sim N(0, 1)$
  - $W_1 = I(U_1 > \operatorname{logit}^{-1}(a_0))$
  - $W_2 = I(U_2 > \operatorname{logit}^{-1}(\beta_0 + \beta_1 B))$
  - $W_3 = \text{Categorical}(p_i)$ is a categorical variable with four categories. The probabilities of each category depend on $B$ and are created using a softmax function: $P(W_3 = i) = \dfrac{e^{\gamma_{0i} + \gamma_{1i} B}}{\sum_{j=1}^{4} e^{\gamma_{0j} + \gamma_{1j} B}}$, for $i, j = 1, \dots, 4$.
  - ...
- Generate exposure $A$ and outcome $Y$: Exposure $A$ was modeled via regression on $B$ and $W$ incorporating two-way confounder-confounder interactions, and the outcome $Y$ was generated through regression on $A$ and $W$ involving two-, three-, and four-way confounder-confounder interactions: $A \sim \text{Binomial}(1, p)$, $p = \operatorname{logit}^{-1}(\eta_0 + \eta_1 W_1 + \eta_2 W_2 + \eta_3 W_3 + \eta_4 W_4 + \eta_5 W_5 + \eta_6 W_6 + \eta_7 B + \eta_{10} W \dots$