arxiv: 2506.03336 · v3 · submitted 2025-06-03 · 📊 stat.ME

Causal Inference with Missing Exposures and Missing Outcomes

Pith reviewed 2026-05-19 10:26 UTC · model grok-4.3

classification 📊 stat.ME
keywords causal inference · missing data · counterfactual strata effects · targeted maximum likelihood estimation · tuberculosis · alcohol consumption · missing at random

The pith

Causal effects can be defined and identified despite missing exposures and missing baseline outcomes by targeting Counterfactual Strata Effects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper extends causal inference methods to handle missing data on both exposures like alcohol consumption and outcomes like TB infection, while also addressing missing baseline information that defines the at-risk population. It introduces counterfactual strata effects as a way to define causal questions focused on groups affected by missingness or exposure. The approach is motivated by real challenges in the SEARCH-TB study in rural Uganda, where confounding and multiple layers of missing data complicate estimating alcohol's impact on incident TB. Under missing-at-random assumptions and no unmeasured confounding, identification results allow consistent estimation via targeted maximum likelihood estimation. This matters for public health research because incomplete data on behaviors and health events is routine, and the method provides a structured way to proceed without discarding cases or biasing results.

Core claim

The authors show that causal estimands can be defined on counterfactual strata to incorporate missing exposures and missingness on the baseline outcome that restricts the population of interest, yielding identification results under standard missing-at-random and no-unmeasured-confounding assumptions, with practical estimation demonstrated via TMLE and Super Learner in the alcohol-TB setting.

What carries the argument

Counterfactual Strata Effects: causal estimands in which the focus population is defined by potential values of the exposure and outcome that are themselves subject to missingness.
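
One way to make this definition concrete, offered as a hedged sketch rather than the paper's own notation: let $A$ be the exposure, $Y_0$ and $Y_1$ the baseline and follow-up outcomes, and $\delta_0$, $\delta_1$ hypothetical interventions that ensure those outcomes are measured. Every symbol here is an assumption introduced for illustration. A strata effect for the at-risk population might then be written as:

```latex
% Illustrative notation only; not taken from the paper.
\psi = \mathbb{E}\left[ Y_1(a = 1, \delta_1 = 1) - Y_1(a = 0, \delta_1 = 1)
       \,\middle|\, Y_0(\delta_0 = 1) = 0 \right]
```

The conditioning event is itself counterfactual: membership in the focus population depends on the baseline outcome that would be seen if measurement were ensured, which is what separates this kind of estimand from one defined on a fully observed subgroup.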

If this is right

  • Under the missing-at-random and no-unmeasured-confounding assumptions, the effect of alcohol consumption on TB risk can be estimated without bias attributable to missing exposure data, missing baseline risk status, or missing follow-up infection status.
  • Causal models can be identified when missingness on the baseline outcome changes which individuals belong to the population of interest.
  • Targeted maximum likelihood estimation combined with Super Learner yields practical estimates under the extended identification results.
  • The framework directly addresses the combination of confounding, missing exposure, and dual missing outcomes observed in the Uganda TB study.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same strata-based approach could be adapted to other cohort studies where missing behavioral data and incomplete outcome ascertainment occur together.
  • Extensions to time-varying exposures and outcomes with intermittent missingness would follow naturally from the identification strategy.
  • Sensitivity analyses that vary the missingness model could quantify how much the conclusions depend on the missing-at-random assumption.

Load-bearing premise

Data are missing at random and there is no unmeasured confounding given the observed covariates.
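
For intuition, the simplest case (complete exposure, missing follow-up outcome only) has a well-known identification result. The sketch below uses assumed notation that is not taken from the paper: covariates $W$, exposure $A$, measurement indicator $\Delta$, and outcome $Y$ observed only when $\Delta = 1$. The paper's results for missing exposures and missing baseline outcomes build on this pattern and are more involved.

```latex
% Assumptions (illustrative notation):
%   (i)   Y(a, \delta = 1) \perp A \mid W                 (no unmeasured confounding)
%   (ii)  Y(a, \delta = 1) \perp \Delta \mid A = a, W     (missing at random)
%   (iii) P(A = a, \Delta = 1 \mid W) > 0 almost surely   (positivity)
\mathbb{E}\left[ Y(a, \delta = 1) \right]
  = \mathbb{E}_W\left[ \mathbb{E}\left( Y \mid A = a, \Delta = 1, W \right) \right]
```

Note that condition (iii) is a third requirement alongside the two named in the premise; without it the inner conditional expectation is undefined for some covariate values.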

What would settle it

Re-estimating the alcohol-TB effect in the SEARCH-TB data after altering the missingness mechanism to violate missing-at-random and observing whether the point estimate and confidence interval change beyond what sampling variability would explain.
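
A simulated analogue of this check can be sketched in a few lines. The example below is a toy under stated assumptions, not the SEARCH-TB analysis: it handles missingness on the follow-up outcome only, uses a plug-in standardization estimator instead of TMLE, and every probability in it is a hypothetical value chosen for illustration. It contrasts a missingness mechanism that depends only on exposure and covariates (MAR) with one that also depends on the outcome itself.

```python
import random


def simulate(n, mnar, seed=0):
    """Draw (w, a, y, observed) tuples; all parameter values are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)                      # binary covariate
        a = int(rng.random() < 0.3 + 0.4 * w)            # exposure depends on w
        y = int(rng.random() < 0.1 + 0.1 * a + 0.2 * w)  # true risk difference = 0.1
        p_obs = 0.9 - 0.3 * a - 0.2 * w                  # depends on (a, w) only: MAR
        if mnar and y == 1:
            p_obs *= 0.5                                 # also depends on y: violates MAR
        rows.append((w, a, y, rng.random() < p_obs))
    return rows


def standardized_risk_difference(rows):
    """Plug-in estimate of E_W[E(Y | A=1, observed, W)] - E_W[E(Y | A=0, observed, W)]."""
    n = len(rows)
    estimate = 0.0
    for w in (0, 1):
        p_w = sum(1 for r in rows if r[0] == w) / n
        means = {}
        for a in (0, 1):
            ys = [r[2] for r in rows if r[0] == w and r[1] == a and r[3]]
            means[a] = sum(ys) / len(ys)
        estimate += p_w * (means[1] - means[0])
    return estimate


if __name__ == "__main__":
    for label, flag in (("MAR", False), ("MNAR", True)):
        est = standardized_risk_difference(simulate(200_000, mnar=flag))
        print(f"{label}: estimated risk difference = {est:.3f} (truth 0.100)")
```

In this toy setup the MAR estimate should land close to the true value of 0.1, while the MNAR estimate is pulled toward zero because cases are under-observed. A shift of that kind, larger than sampling variability, is the pattern the proposed check looks for.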

Figures

Figures reproduced from arXiv: 2506.03336 by Atukunda Mucunguzi, Carina Marquez, Edwin D. Charlebois, Elijah Kakande, Florence Mwangwa, Kirsten E. Landsiedel, Laura B. Balzer, Moses R. Kamya, Rachel Abbott.

Figure 1: Directed acyclic graph (DAG) for a classic point-treatment problem with complete measurement of […]
Figure 2: To define causal effects when the exposure is subject to missingness, we now consider […]
Figure 3: DAG with missingness on the exposure and outcome: […]
Figure 4: DAG with missingness on the exposure, the baseline outcome, and the follow-up outcome: […]
Figure 5: Results from SEARCH-TB for the association of alcohol use on incident tuberculosis (TB) in […]
Original abstract

Missing data are ubiquitous in public health research. When estimating causal effects, there are well-established methods to address bias due to missing outcomes. Commonly, causal estimands are defined under hypothetical interventions to "set" the exposure and to prevent missingness. We demonstrate how this framework can be extended to missing exposures. We further extend this framework to incorporate missingness on the baseline outcome, which induces missingness on the population of interest. To do so, we highlight the use of Counterfactual Strata Effects: causal estimands where the focus population is subject to missingness and/or impacted by the exposure. Our work is motivated by SEARCH-TB's investigation of the effect of alcohol consumption on the risk of incident tuberculosis (TB) infection in rural Uganda. This study posed several real-world challenges: confounding, missingness on the exposure (alcohol use), missingness on the baseline outcome (defining who was at risk of TB and, thus, in the focus population), and missingness on the outcome at follow-up (capturing who acquired TB). We present a series of causal models and identification results to demonstrate the handling of missingness in these settings. We highlight the use of TMLE with Super Learner and the real-world consequences of our approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper extends causal inference methods to settings with missing exposures and missingness on baseline outcomes that define the population of interest. It introduces Counterfactual Strata Effects as the target estimands and provides identification results under MAR and conditional exchangeability assumptions. The framework is applied to the SEARCH-TB study on alcohol use and incident TB risk using TMLE with Super Learner, with emphasis on the real-world consequences of properly accounting for these missingness patterns.

Significance. If the identification results hold and the positivity conditions are satisfied, the work provides a coherent way to define and estimate causal effects when missingness affects both the exposure and the very definition of the target population. The use of TMLE with Super Learner and the concrete SEARCH-TB application are strengths that could make the approach useful for other public-health studies with similar missing-data structures.

major comments (2)
  1. [§4] §4 (Identification results for Counterfactual Strata Effects): The identification of the strata-specific effects under baseline-outcome missingness requires stratum-specific positivity (P(baseline observed, exposure level, outcome observed | covariates) > 0 within each observed-covariate pattern). The manuscript invokes standard MAR and no-unmeasured-confounding assumptions but does not report any empirical check, trimming, or sensitivity analysis for this condition in the SEARCH-TB data or the simulations; violation would render the TMLE targeting step unstable or biased even when the stated assumptions hold.
  2. [§5] §5 (Application and estimation): The paper claims that the approach correctly handles missingness on the population of interest, yet the reported results do not include diagnostics for effective sample size after stratification or for the performance of the Super Learner under the induced missingness mechanism; without these, it is difficult to assess whether the estimated effects are driven by extrapolation in sparse strata.
minor comments (2)
  1. The notation for the three missingness indicators and the counterfactual strata is introduced without a consolidated table; adding one would improve readability when comparing the different estimands.
  2. Several sentences in the introduction repeat the motivation from the abstract; tightening this overlap would reduce redundancy.
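
The positivity check requested in major comment 1 can be illustrated with a minimal sketch. It assumes covariates have been coarsened into discrete strata and that each record is a dictionary with the hypothetical keys `w`, `exposure`, `baseline_observed`, and `outcome_observed`; none of these names, nor the 0.05 threshold, come from the paper. A real analysis with continuous covariates would examine estimated probabilities from the fitted nuisance models instead of raw cell proportions.

```python
from collections import Counter


def joint_observation_rates(records, exposure_levels=(0, 1)):
    """Empirical P(baseline observed, A = a, outcome observed | W = w) per (w, a)."""
    totals = Counter(r["w"] for r in records)
    complete = Counter(
        (r["w"], r["exposure"])
        for r in records
        if r["baseline_observed"] and r["outcome_observed"] and r["exposure"] is not None
    )
    return {
        (w, a): complete[(w, a)] / n_w
        for w, n_w in totals.items()
        for a in exposure_levels
    }


def flag_sparse_cells(rates, threshold=0.05):
    """Return the (stratum, exposure) cells whose rate falls below the threshold."""
    return sorted(cell for cell, rate in rates.items() if rate < threshold)


if __name__ == "__main__":
    # Hypothetical toy records; exposure None means the exposure is missing.
    toy = [
        {"w": "young", "exposure": 1, "baseline_observed": True, "outcome_observed": True},
        {"w": "young", "exposure": 0, "baseline_observed": True, "outcome_observed": True},
        {"w": "old", "exposure": 0, "baseline_observed": True, "outcome_observed": True},
        {"w": "old", "exposure": None, "baseline_observed": True, "outcome_observed": False},
    ]
    print(flag_sparse_cells(joint_observation_rates(toy)))
```

Cells that are flagged mark covariate patterns where the data carry little or no information about one exposure level, so any estimate there rests on extrapolation by the outcome model.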

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major point below and will revise the manuscript to incorporate additional diagnostics and discussion where appropriate.

Point-by-point responses
  1. Referee: [§4] §4 (Identification results for Counterfactual Strata Effects): The identification of the strata-specific effects under baseline-outcome missingness requires stratum-specific positivity (P(baseline observed, exposure level, outcome observed | covariates) > 0 within each observed-covariate pattern). The manuscript invokes standard MAR and no-unmeasured-confounding assumptions but does not report any empirical check, trimming, or sensitivity analysis for this condition in the SEARCH-TB data or the simulations; violation would render the TMLE targeting step unstable or biased even when the stated assumptions hold.

    Authors: We thank the referee for emphasizing the critical role of the stratum-specific positivity assumption in the identification of Counterfactual Strata Effects. The manuscript explicitly lists the required positivity conditions alongside the MAR and conditional exchangeability assumptions. However, we did not include empirical assessments such as propensity score distributions within strata, trimming procedures, or sensitivity analyses for the SEARCH-TB data or the simulation studies. In the revision we will add a dedicated subsection on practical positivity diagnostics, including reporting of minimum estimated probabilities within observed covariate patterns and a brief sensitivity analysis exploring the impact of near-violations. revision: yes

  2. Referee: [§5] §5 (Application and estimation): The paper claims that the approach correctly handles missingness on the population of interest, yet the reported results do not include diagnostics for effective sample size after stratification or for the performance of the Super Learner under the induced missingness mechanism; without these, it is difficult to assess whether the estimated effects are driven by extrapolation in sparse strata.

    Authors: We agree that reporting effective sample size after stratification and Super Learner performance metrics would strengthen the application section. The current manuscript presents the TMLE estimates with Super Learner but omits these specific diagnostics. We will add tables or text reporting the effective sample sizes for each counterfactual stratum in the SEARCH-TB analysis and include summaries of the cross-validated performance of the nuisance estimators (e.g., risk or R-squared) under the observed missingness patterns to help readers evaluate potential extrapolation. revision: yes
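
The effective sample size diagnostic mentioned in the second response is not specified further. One common choice is Kish's approximation applied to inverse probability weights, sketched below. The function names and the example probabilities are hypothetical, and the rebuttal does not say this is the formula the authors would use.

```python
def inverse_probability_weights(probabilities):
    """Weights 1/p for units that are fully observed at their exposure level."""
    if any(p <= 0 or p > 1 for p in probabilities):
        raise ValueError("probabilities must lie in (0, 1]")
    return [1.0 / p for p in probabilities]


def kish_effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be nonzero")
    return total * total / total_sq


if __name__ == "__main__":
    # Hypothetical estimated probabilities of being observed for four units.
    balanced = inverse_probability_weights([0.5, 0.5, 0.5, 0.5])
    skewed = inverse_probability_weights([0.5, 0.5, 0.5, 0.05])
    print(kish_effective_sample_size(balanced))
    print(kish_effective_sample_size(skewed))
```

With equal weights the effective sample size equals the number of units; a single unit with a very small observation probability dominates the weighted sum and drives the value down sharply, which is the signature of a sparse stratum.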

Circularity Check

0 steps flagged

No circularity: standard causal identification extended to missing data without self-referential reductions

full rationale

The paper defines counterfactual strata effects as an extension of existing causal frameworks to handle missing exposures, baseline outcomes, and follow-up outcomes. Identification relies on standard MAR assumptions and conditional exchangeability given observed covariates, which are invoked explicitly rather than derived from the paper's own fitted quantities or equations. TMLE with Super Learner is applied as an established estimation procedure to the SEARCH-TB data; no central claim reduces by construction to a fitted parameter renamed as a prediction, a self-citation chain, or an ansatz smuggled in via prior work. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

Based on abstract only; no explicit free parameters, axioms, or invented entities are detailed beyond standard causal assumptions and the new term Counterfactual Strata Effects.

axioms (2)
  • domain assumption: Missing at random, conditional on observed covariates, for exposures and outcomes
    Invoked when extending the framework to missing exposures and baseline missingness
  • domain assumption: No unmeasured confounding for the exposure-outcome relationship
    Standard assumption for causal identification in observational data
invented entities (1)
  • Counterfactual Strata Effects (no independent evidence)
    purpose: Causal estimands focused on populations subject to missingness or impacted by the exposure
    Introduced to handle missing baseline outcome that defines the focus population
