Causal Inference with Missing Exposures and Missing Outcomes

Atukunda Mucunguzi; Carina Marquez; Edwin D. Charlebois; Elijah Kakande; Florence Mwangwa; Kirsten E. Landsiedel; Laura B. Balzer; Moses R. Kamya; Rachel Abbott

arxiv: 2506.03336 · v3 · submitted 2025-06-03 · 📊 stat.ME


Kirsten E. Landsiedel, Rachel Abbott, Atukunda Mucunguzi, Florence Mwangwa, Elijah Kakande, Edwin D. Charlebois, Carina Marquez, Moses R. Kamya, Laura B. Balzer


Pith reviewed 2026-05-19 10:26 UTC · model grok-4.3

classification 📊 stat.ME

keywords causal inference, missing data, counterfactual strata effects, targeted maximum likelihood estimation, tuberculosis, alcohol consumption, missing at random


The pith

Causal effects with missing exposures and baseline outcomes can be identified using counterfactual strata effects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper extends causal inference methods to handle missing data on both the exposure (alcohol consumption) and the outcome (TB infection), while also addressing missing baseline information that defines the at-risk population. It introduces counterfactual strata effects as a way to pose causal questions about populations whose membership is subject to missingness or affected by the exposure. The approach is motivated by real challenges in the SEARCH-TB study in rural Uganda, where confounding and multiple layers of missing data complicate estimating alcohol's effect on incident TB. Under missing-at-random, no-unmeasured-confounding, and positivity assumptions, the identification results express each causal effect as a function of the observed-data distribution, which can then be estimated with targeted maximum likelihood estimation (TMLE). This matters for public health research because incomplete data on behaviors and health events are routine, and the method offers a principled alternative to simply discarding incomplete records, which can bias results.

Core claim

The authors show that causal estimands can be defined on counterfactual strata to incorporate missing exposures and missingness on the baseline outcome that restricts the population of interest, yielding identification results under standard missing-at-random and no-unmeasured-confounding assumptions, with practical estimation demonstrated via TMLE and Super Learner in the alcohol-TB setting.

What carries the argument

Counterfactual Strata Effects: causal estimands in which the focus population is defined by potential values of the exposure and outcome that are themselves subject to missingness.
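To make the idea concrete, here is one simple identification result of this type. The notation is our own assumption, not the paper's: W denotes baseline covariates; Δ_0 and Y_0 the indicator that baseline TB status was measured and the baseline status itself; Δ_A and A the indicator that alcohol use was measured and alcohol use itself; Δ_Y and Y the indicator that follow-up status was measured and incident infection. Y(a) is the follow-up outcome had exposure been set to a and follow-up measurement been ensured. The focus population is everyone uninfected at baseline (Y_0 = 0), whether or not Y_0 was recorded. The listed conditions are a simplified sketch: the paper's own causal models may order or condition on variables differently, and this sketch does not cover strata that the exposure itself can change.

```latex
% Hypothetical sketch assumptions (not taken from the paper):
%   (i)   \Delta_0 \perp (Y_0, A, \Delta_A, \Delta_Y, Y) \mid W
%   (ii)  Y(a) \perp A \mid W, \; Y_0 = 0
%   (iii) \Delta_A \perp Y \mid A, W, \; Y_0 = 0
%   (iv)  \Delta_Y \perp Y \mid A, \Delta_A = 1, W, \; Y_0 = 0
%   (v)   positivity: every conditioning event below has positive probability
\begin{aligned}
\psi(a) &= \mathbb{E}\left[ Y(a) \mid Y_0 = 0 \right]
         = \frac{\mathbb{E}_W\left[ q_0(W)\, \bar{Q}(a, W) \right]}{\mathbb{E}_W\left[ q_0(W) \right]}, \\
q_0(W) &= P(Y_0 = 0 \mid \Delta_0 = 1, W), \\
\bar{Q}(a, W) &= \mathbb{E}\left( Y \mid A = a, \Delta_A = 1, \Delta_Y = 1, Y_0 = 0, \Delta_0 = 1, W \right).
\end{aligned}
```

The derivation writes the target as E[Y(a) 1{Y_0 = 0}] / P(Y_0 = 0) and averages over W. Assumption (i) lets q_0 be estimated from people whose baseline status was measured. Assumptions (ii) through (iv) then successively replace the counterfactual mean E(Y(a) | Y_0 = 0, W) with an observed mean among fully observed at-risk units.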

If this is right

Provided the missing-at-random, no-unmeasured-confounding, and positivity assumptions hold, the effect of alcohol consumption on TB risk can be estimated consistently despite missing exposure data, missing baseline risk status, and missing follow-up infection status.
Causal effects can be identified even when missingness on the baseline outcome leaves it unknown which individuals belong to the population of interest.
Targeted maximum likelihood estimation combined with Super Learner yields practical estimates under the extended identification results.
The framework directly addresses the combination of confounding, missing exposure, and missingness on both the baseline and follow-up outcomes observed in the Uganda TB study.
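A plug-in (g-computation) estimator of risk among the baseline-uninfected can be sketched on simulated data. This is a minimal sketch built on several assumptions. There is a single discrete covariate W with three levels. The indicator names d0/y0, da/a and dy/y are hypothetical. Missingness depends only on W (and, for follow-up, also on A). All parameter values are invented. The sketch uses stratified sample means rather than TMLE with Super Learner, so it illustrates the identification logic, not the paper's estimator. Values of Y0, A and Y are generated for everyone, but the estimators only read them where the matching indicator says they were measured.

```python
import numpy as np

# Hypothetical data-generating values, indexed by a 3-level covariate W.
P_W = np.array([0.5, 0.3, 0.2])                    # P(W = w)
P_INF0 = np.array([0.1, 0.2, 0.4])                 # P(Y0 = 1 | W): infected at baseline
P_MEAS0 = np.array([0.9, 0.7, 0.5])                # P(D0 = 1 | W): baseline status measured
P_ALC = np.array([0.2, 0.4, 0.6])                  # P(A = 1 | W): alcohol use, confounded by W
P_MEASA = np.array([0.9, 0.8, 0.6])                # P(DA = 1 | W): alcohol use measured
P_Y = np.array([[0.02, 0.05], [0.04, 0.08], [0.08, 0.14]])  # P(Y(a) = 1 | W), columns a = 0, 1
P_MEASY = np.array([[0.9, 0.8], [0.85, 0.75], [0.8, 0.6]])  # P(DY = 1 | A, W)


def simulate(n, rng):
    """Draw full data; the estimators below only use values whose indicator is 1."""
    w = rng.choice(3, size=n, p=P_W)
    y0 = rng.random(n) < P_INF0[w]
    d0 = rng.random(n) < P_MEAS0[w]
    a = (rng.random(n) < P_ALC[w]).astype(int)
    da = rng.random(n) < P_MEASA[w]
    y = rng.random(n) < P_Y[w, a]
    dy = rng.random(n) < P_MEASY[w, a]
    return w, y0, d0, a, da, y, dy


def true_risk(a):
    """E[Y(a) | Y0 = 0] computed from the known simulation parameters."""
    weight = P_W * (1 - P_INF0)
    return float(np.sum(weight * P_Y[:, a]) / np.sum(weight))


def gcomp_risk(a, w, y0, d0, ax, da, y, dy):
    """Plug-in: sum_w P(w) q0(w) Qbar(a, w) / sum_w P(w) q0(w)."""
    num = den = 0.0
    for level in range(3):
        in_w = w == level
        q0 = np.mean(~y0[in_w & d0])               # P(Y0 = 0 | D0 = 1, W): measured units only
        cell = in_w & d0 & ~y0 & da & (ax == a) & dy
        qbar = y[cell].mean()                      # E(Y | A = a, all measured, Y0 = 0, W)
        num += in_w.mean() * q0 * qbar
        den += in_w.mean() * q0
    return float(num / den)


def complete_case_risk(a, y0, d0, ax, da, y, dy):
    """Unadjusted complete-case risk: drops incomplete records and ignores W."""
    cell = d0 & ~y0 & da & (ax == a) & dy
    return float(y[cell].mean())


data = simulate(200_000, np.random.default_rng(2025))
w, y0, d0, ax, da, y, dy = data
for a in (0, 1):
    print(a, round(true_risk(a), 4), round(gcomp_risk(a, *data), 4),
          round(complete_case_risk(a, y0, d0, ax, da, y, dy), 4))
```

In this design the g-computation estimates should track the true risks up to sampling error. The complete-case risk among the unexposed should land well below the truth, because low-risk, rarely drinking, well-measured W strata are over-represented among complete records.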

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same strata-based approach could be adapted to other cohort studies where missing behavioral data and incomplete outcome ascertainment occur together.
Extensions to time-varying exposures and outcomes with intermittent missingness would follow naturally from the identification strategy.
Sensitivity analyses that vary the missingness model could quantify how much the conclusions depend on the missing-at-random assumption.

Load-bearing premise

Data are missing at random and there is no unmeasured confounding given the observed covariates.

What would settle it

Re-estimating the alcohol-TB effect in the SEARCH-TB data under sensitivity models that posit specified departures from missing at random, and checking whether the point estimate and confidence interval shift beyond what sampling variability would explain.
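Because the missingness mechanism behind already-collected data cannot be changed, a check of this kind is usually run as a sensitivity analysis. Below is a minimal sketch of one common form, delta adjustment (a pattern-mixture approach). The stratum-level inputs are hypothetical and not SEARCH-TB values. Unmeasured follow-up outcomes are assumed to have log-odds of infection shifted by delta relative to measured outcomes in the same exposure and covariate cell, and delta = 0 recovers the missing-at-random answer. Only follow-up missingness is perturbed here. Exposure and baseline-outcome missingness would need their own sensitivity parameters.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    return np.log(p / (1.0 - p))


# Hypothetical stratum-level inputs for 3 covariate levels; columns are exposure a = 0, 1.
STRATUM_WEIGHT = np.array([0.45, 0.24, 0.12])   # P(W = w) * P(Y0 = 0 | W = w)
RISK_MEASURED = np.array([[0.02, 0.05], [0.04, 0.08], [0.08, 0.14]])  # risk among measured
P_MEASURED = np.array([[0.9, 0.8], [0.85, 0.75], [0.8, 0.6]])       # P(DY = 1 | A = a, W)


def risk_under_delta(a, delta):
    """Risk among the baseline-uninfected under exposure a, assuming unmeasured
    follow-up outcomes have log-odds shifted by `delta` relative to measured ones
    in the same (A, W) cell. delta = 0 corresponds to missing at random."""
    m_obs = RISK_MEASURED[:, a]
    m_mis = expit(logit(m_obs) + delta)
    risk_w = P_MEASURED[:, a] * m_obs + (1 - P_MEASURED[:, a]) * m_mis
    return float(np.sum(STRATUM_WEIGHT * risk_w) / np.sum(STRATUM_WEIGHT))


# Tipping-point style scan: perturb only the exposed arm, keep the unexposed arm at MAR.
for delta in (-1.0, -0.5, 0.0, 0.5, 1.0):
    rd = risk_under_delta(1, delta) - risk_under_delta(0, 0.0)
    print(f"delta={delta:+.1f}  risk difference={rd:.4f}")
```

Reporting the delta at which the risk difference loses practical or statistical significance makes the dependence on missing at random explicit.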

Figures

Figures reproduced from arXiv: 2506.03336 by Atukunda Mucunguzi, Carina Marquez, Edwin D. Charlebois, Elijah Kakande, Florence Mwangwa, Kirsten E. Landsiedel, Laura B. Balzer, Moses R. Kamya, Rachel Abbott.

**Figure 2.** To define causal effects when the exposure is subject to missingness, we now consider …

**Figure 3.** DAG with missingness on the exposure and outcome: …

**Figure 4.** DAG with missingness on the exposure, the baseline outcome, and the follow-up outcome: …

**Figure 5.** Results from SEARCH-TB for the association of alcohol use on incident tuberculosis (TB) in …

Original abstract

Missing data are ubiquitous in public health research. When estimating causal effects, there are well-established methods to address bias due to missing outcomes. Commonly, causal estimands are defined under hypothetical interventions to "set" the exposure and to prevent missingness. We demonstrate how this framework can be extended to missing exposures. We further extend this framework to incorporate missingness on the baseline outcome, which induces missingness on the population of interest. To do so, we highlight the use of Counterfactual Strata Effects: causal estimands where the focus population is subject to missingness and/or impacted by the exposure. Our work is motivated by SEARCH-TB's investigation of the effect of alcohol consumption on the risk of incident tuberculosis (TB) infection in rural Uganda. This study posed several real-world challenges: confounding, missingness on the exposure (alcohol use), missingness on the baseline outcome (defining who was at-risk of TB and, thus, in the focus population), and missingness on the outcome at follow-up (capturing who acquired TB). We present a series of causal models and identification results to demonstrate the handling of missingness in these settings. We highlight the use of TMLE with Super Learner and the real-world consequences of our approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper offers a practical extension of causal effect estimation to settings where exposures and baseline outcomes are missing, built on counterfactual strata effects and applied to the SEARCH-TB alcohol-TB data, though positivity within the strata is not clearly checked.


The main contribution is a direct extension of missing-outcome causal models to also cover missing exposures and missing baseline outcomes, the latter leaving membership in the population of interest unknown for some participants. The authors define counterfactual strata effects for the subpopulation determined by the baseline outcome that would have been observed had baseline missingness been prevented, and then identify those effects under standard MAR plus conditional exchangeability. The SEARCH-TB example on alcohol use and incident TB supplies a concrete case with all three missingness problems at once, and they show how TMLE plus Super Learner can be plugged in to estimate the quantities.

Referee Report

2 major / 2 minor

Summary. The paper extends causal inference methods to settings with missing exposures and missingness on baseline outcomes that define the population of interest. It introduces Counterfactual Strata Effects as the target estimands and provides identification results under MAR and conditional exchangeability assumptions. The framework is applied to the SEARCH-TB study on alcohol use and incident TB risk using TMLE with Super Learner, with emphasis on the real-world consequences of properly accounting for these missingness patterns.

Significance. If the identification results hold and the positivity conditions are satisfied, the work provides a coherent way to define and estimate causal effects when missingness affects both the exposure and the very definition of the target population. The use of TMLE with Super Learner and the concrete SEARCH-TB application are strengths that could make the approach useful for other public-health studies with similar missing-data structures.

major comments (2)

§4 (Identification results for Counterfactual Strata Effects): The identification of the strata-specific effects under baseline-outcome missingness requires stratum-specific positivity (P(baseline observed, exposure level, outcome observed | covariates) > 0 within each observed-covariate pattern). The manuscript invokes standard MAR and no-unmeasured-confounding assumptions but does not report any empirical check, trimming, or sensitivity analysis for this condition in the SEARCH-TB data or the simulations; violation would render the TMLE targeting step unstable or biased even when the stated assumptions hold.
§5 (Application and estimation): The paper claims that the approach correctly handles missingness on the population of interest, yet the reported results do not include diagnostics for effective sample size after stratification or for the performance of the Super Learner under the induced missingness mechanism; without these, it is difficult to assess whether the estimated effects are driven by extrapolation in sparse strata.
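A minimal sketch of the stratum-specific positivity check the referee asks for. It assumes a single discrete covariate-pattern variable w and hypothetical boolean or integer arrays: d0 (baseline measured), y0 (baseline infected), da (exposure measured), a (exposure) and dy (follow-up measured). For each pattern it multiplies the estimated probabilities of the steps a unit must pass to contribute to the risk under a given exposure level, and it flags patterns whose product falls below a cutoff. The 0.025 cutoff is an arbitrary illustrative choice, not a value from the paper.

```python
import numpy as np


def positivity_table(w, d0, y0, da, a, dy, exposure, cutoff=0.025):
    """Per covariate pattern, estimate
    P(D0=1 | W) * P(DA=1, A=exposure | D0=1, Y0=0, W) * P(DY=1 | A=exposure, DA=1, D0=1, Y0=0, W)
    and flag patterns whose product falls below `cutoff`."""
    rows = []
    for level in np.unique(w):
        in_w = w == level
        p_d0 = float(d0[in_w].mean())
        at_risk = in_w & d0 & ~y0                      # measured and uninfected at baseline
        p_a = float((da[at_risk] & (a[at_risk] == exposure)).mean()) if at_risk.any() else 0.0
        on_path = at_risk & da & (a == exposure)
        p_dy = float(dy[on_path].mean()) if on_path.any() else 0.0
        prob = p_d0 * p_a * p_dy
        rows.append((int(level), int(in_w.sum()), prob, prob < cutoff))
    return rows


# Hypothetical data in which covariate pattern 3 has almost no drinkers.
rng = np.random.default_rng(1)
n = 5_000
w = rng.integers(0, 4, size=n)
d0 = rng.random(n) < 0.8
y0 = rng.random(n) < 0.2
da = rng.random(n) < 0.85
a = (rng.random(n) < np.array([0.3, 0.3, 0.3, 0.01])[w]).astype(int)
dy = rng.random(n) < 0.9
for level, size, prob, flagged in positivity_table(w, d0, y0, da, a, dy, exposure=1):
    print(f"W={level}  n={size}  est. prob={prob:.4f}  {'FLAG' if flagged else ''}")
```

With continuous covariates the same products would come from fitted nuisance models rather than cell means, and one would report their distribution instead of a per-pattern table.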

minor comments (2)

The notation for the three missingness indicators and the counterfactual strata is introduced without a consolidated table; adding one would improve readability when comparing the different estimands.
Several sentences in the introduction repeat the motivation from the abstract; tightening this overlap would reduce redundancy.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major point below and will revise the manuscript to incorporate additional diagnostics and discussion where appropriate.


Referee: §4 (Identification results for Counterfactual Strata Effects), stratum-specific positivity (major comment 1 above).

Authors: We thank the referee for emphasizing the critical role of the stratum-specific positivity assumption in the identification of Counterfactual Strata Effects. The manuscript explicitly lists the required positivity conditions alongside the MAR and conditional exchangeability assumptions. However, we did not include empirical assessments such as propensity score distributions within strata, trimming procedures, or sensitivity analyses for the SEARCH-TB data or the simulation studies. In the revision we will add a dedicated subsection on practical positivity diagnostics, including reporting of minimum estimated probabilities within observed covariate patterns and a brief sensitivity analysis exploring the impact of near-violations.
Revision: yes.
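One common way to probe near-violations is to bound estimated probabilities away from zero and watch how the estimate moves. Below is a minimal sketch using a stratified inverse-probability-weighted (ratio, Hajek-type) estimator of risk under an exposure level among the baseline-uninfected. It assumes hypothetical arrays: w (discrete covariate pattern), d0 and y0 (baseline status measured, baseline infected), da and a (exposure measured, exposure), and dy and y (follow-up measured, incident infection). It is a simpler stand-in for the TMLE targeting step, not the paper's procedure, and the floors are arbitrary.

```python
import numpy as np


def ipw_risk(a_level, w, d0, y0, da, a, dy, y, floor=0.0):
    """Ratio IPW estimate of E[Y(a_level) | Y0 = 0], with each pattern's estimated
    path probability bounded below by `floor`. Returns (estimate, n_truncated)."""
    num = den = 0.0
    n_truncated = 0
    for level in np.unique(w):
        in_w = w == level
        pi0 = d0[in_w].mean()                                   # P(D0=1 | W)
        if pi0 == 0:
            continue                                            # pattern never measured at baseline
        at_risk = in_w & d0 & ~y0
        g_a = (da[at_risk] & (a[at_risk] == a_level)).mean() if at_risk.any() else 0.0
        on_path = at_risk & da & (a == a_level)
        g_y = dy[on_path].mean() if on_path.any() else 0.0
        raw = pi0 * g_a * g_y                                   # P(fully observed with A = a_level | W)
        complete = on_path & dy
        if raw < floor:
            n_truncated += int(complete.sum())
        p_path = max(raw, floor)
        if p_path > 0:                                          # empty patterns add nothing: positivity bias
            num += y[complete].sum() / p_path                   # weighted infections
        den += at_risk.sum() / pi0                              # weighted count of baseline-uninfected
    return float(num / den), n_truncated


# Hypothetical data with a near-violation: pattern 3 has very few drinkers.
rng = np.random.default_rng(3)
n = 20_000
w = rng.integers(0, 4, size=n)
d0 = rng.random(n) < 0.85
y0 = rng.random(n) < 0.15
da = rng.random(n) < 0.9
a = (rng.random(n) < np.array([0.3, 0.3, 0.3, 0.02])[w]).astype(int)
dy = rng.random(n) < 0.9
y = rng.random(n) < np.where(a == 1, 0.08, 0.04)
for floor in (0.0, 0.01, 0.05):
    est, k = ipw_risk(1, w, d0, y0, da, a, dy, y, floor=floor)
    print(f"floor={floor:.2f}  risk={est:.4f}  complete units truncated={k}")
```

With stratified cell means and no truncation, this ratio estimator reproduces the plug-in g-formula exactly. Any movement as the floor rises therefore reflects the bias-variance trade-off introduced by bounding.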
Referee: §5 (Application and estimation), diagnostics for effective sample size and Super Learner performance (major comment 2 above).

Authors: We agree that reporting effective sample size after stratification and Super Learner performance metrics would strengthen the application section. The current manuscript presents the TMLE estimates with Super Learner but omits these specific diagnostics. We will add tables or text reporting the effective sample sizes for each counterfactual stratum in the SEARCH-TB analysis and include summaries of the cross-validated performance of the nuisance estimators (e.g., risk or R-squared) under the observed missingness patterns to help readers evaluate potential extrapolation.
Revision: yes.
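Both promised diagnostics are short computations. The sketch below uses hypothetical inputs. It computes Kish's effective sample size, (Σw)² / Σw², for a set of inverse-probability weights. It also computes a V-fold cross-validated log loss for two simple candidate estimators of a binary nuisance function: a marginal mean and a covariate-stratified mean. Comparing cross-validated risks across candidates is the selection principle that Super Learner builds on. The functions here are illustrative helpers, not the interface of any Super Learner package.

```python
import numpy as np


def kish_ess(weights):
    """Kish effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))


def cv_log_loss(x, y, fit, n_folds=5, seed=0):
    """V-fold cross-validated log loss (negative Bernoulli log-likelihood)."""
    y = np.asarray(y, dtype=float)
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    losses = []
    for k in range(n_folds):
        train, test = folds != k, folds == k
        predict = fit(x[train], y[train])
        p = np.clip(predict(x[test]), 1e-6, 1 - 1e-6)
        losses.append(-np.mean(y[test] * np.log(p) + (1 - y[test]) * np.log(1 - p)))
    return float(np.mean(losses))


def fit_marginal(x, y):
    """Candidate 1: ignore covariates and predict the overall mean."""
    mean = y.mean()
    return lambda x_new: np.full(len(x_new), mean)


def fit_stratified(x, y):
    """Candidate 2: mean within each covariate level (overall mean for unseen levels)."""
    overall = y.mean()
    table = {level: y[x == level].mean() for level in np.unique(x)}
    return lambda x_new: np.array([table.get(level, overall) for level in x_new])


# Hypothetical example: a binary nuisance outcome whose probability varies with x.
rng = np.random.default_rng(4)
x = rng.integers(0, 3, size=3_000)
y = rng.random(3_000) < np.array([0.1, 0.5, 0.9])[x]
print("ESS, equal weights:", kish_ess([1, 1, 1, 1]))
print("ESS, one dominant weight:", round(kish_ess([1, 1, 1, 97]), 3))
for name, fit in (("marginal", fit_marginal), ("stratified", fit_stratified)):
    print(name, round(cv_log_loss(x, y, fit), 4))
```

An effective sample size far below the number of complete units in a counterfactual stratum signals that a few heavily weighted observations drive the estimate, which is the extrapolation concern raised in major comment 2.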

Circularity Check

0 steps flagged

No circularity: standard causal identification extended to missing data without self-referential reductions


The paper defines counterfactual strata effects as an extension of existing causal frameworks to handle missing exposures, baseline outcomes, and follow-up outcomes. Identification relies on standard MAR assumptions and conditional exchangeability given observed covariates, which are invoked explicitly rather than derived from the paper's own fitted quantities or equations. TMLE with Super Learner is applied as an established estimation procedure to the SEARCH-TB data; no central claim reduces by construction to a fitted parameter renamed as a prediction, a self-citation chain, or an ansatz smuggled in via prior work. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

Based on abstract only; no explicit free parameters, axioms, or invented entities are detailed beyond standard causal assumptions and the new term Counterfactual Strata Effects.

axioms (2)

Domain assumption: missing at random, conditional on observed covariates, for exposures and outcomes
Invoked when extending the framework to missing exposures and baseline missingness
Domain assumption: no unmeasured confounding for the exposure-outcome relationship
Standard assumption for causal identification in observational data

invented entities (1)

Counterfactual Strata Effects (no independent evidence)
purpose: Causal estimands focused on populations subject to missingness or impacted by the exposure
Introduced to handle missing baseline outcome that defines the focus population



Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages

[1]

The prevention and treatment of missing data in clinical trials

Roderick J Little, Ralph D’Agostino, Michael L Cohen, Kay Dickersin, Scott S Emerson, John T Farrar, Constantine Frangakis, Joseph W Hogan, Geert Molenberghs, Susan A Murphy, et al. The prevention and treatment of missing data in clinical trials. New England Journal of Medicine , 367(14): 1355–1360, 2012

work page 2012
[2]

Strategies for handling missing data in electronic health record derived data

Brian J Wells, Kevin M Chagin, Amy S Nowacki, and Michael W Kattan. Strategies for handling missing data in electronic health record derived data. Egems, 1(3), 2013

work page 2013
[3]

Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls

Jonathan AC Sterne, Ian R White, John B Carlin, Michael Spratt, Patrick Royston, Michael G Kenward, Angela M Wood, and James R Carpenter. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ, 338, 2009

work page 2009
[4]

Canonical causal diagrams to guide the treatment of missing data in epidemiologic studies

Margarita Moreno-Betancur, Katherine J Lee, Finbarr P Leacy, Ian R White, Julie A Simpson, and John B Carlin. Canonical causal diagrams to guide the treatment of missing data in epidemiologic studies. American Journal of Epidemiology , 187(12):2705–2715, 2018

work page 2018
[5]

Far from MCAR: obtaining population-level estimates of HIV viral suppression

Laura B Balzer, James Ayieko, Dalsone Kwarisiima, Gabriel Chamie, Edwin D Charlebois, Joshua Schwab, Mark J van der Laan, Moses R Kamya, Diane V Havlir, and Maya L Petersen. Far from MCAR: obtaining population-level estimates of HIV viral suppression. Epidemiology (Cambridge, Mass.), 31(5):620, 2020

work page 2020
[6]

Missing outcome data in epidemiologic studies

Stephen R Cole, Paul N Zivich, Jessie K Edwards, Rachael K Ross, Bonnie E Shook-Sa, Joan T Price, and Jeffrey SA Stringer. Missing outcome data in epidemiologic studies. American Journal of Epidemiology, 192(1):6–10, 2023

work page 2023
[7]

Missing outcome data in randomised clinical trials of psychological interventions: a review of published trial reports in major psychiatry journals

Sophie Juul, Pascal Faltermeier, Johanne Juul Petersen, Markus Harboe Olsen, Rebecca Kjaer Andersen, Caroline Barkholt Kamp, Faiza Siddiqui, Sebastian Simonsen, Lawrence Mbuagbaw, Lehana Thabane, et al. Missing outcome data in randomised clinical trials of psychological interventions: a review of published trial reports in major psychiatry journals. BMC p...

work page 2024
[8]

Addressing missing outcome data in randomised controlled trials: a methodological scoping review

Ellie Medcalf, Robin M Turner, David Espinoza, Vicky He, and Katy JL Bell. Addressing missing outcome data in randomised controlled trials: a methodological scoping review. Contemporary clinical trials, page 107602, 2024

work page 2024
[9]

Inference and missing data

Donald B Rubin. Inference and missing data. Biometrika, 63(3):581–592, 1976

work page 1976
[10]

Addressing missing data in randomized clinical trials: A causal inference perspective

Ilja Cornelisz, Pim Cuijpers, Tara Donker, and Chris van Klaveren. Addressing missing data in randomized clinical trials: A causal inference perspective. PloS One, 15(7):e0234349, 2020

work page 2020
[11]

D. G. Horvitz and D. J. Thompson. A Generalization of Sampling Without Replacement From a Finite Universe. Journal of the American Statistical Association , 47(260):663–685, 1952. ISSN 0162-1459

work page 1952
[12]

James M. Robins. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling, 7(9): 1393–1512, 1986

work page 1986
[13]

van der Laan and J.M

M.J. van der Laan and J.M. Robins. Unified Methods for Censored Longitudinal Data and Causality . Springer-Verlag, New York Berlin Heidelberg, 2003

work page 2003
[14]

Targeted learning: Causal inference for observational and experimental data, volume 4

Mark J van der Laan, Sherri Rose, et al. Targeted learning: Causal inference for observational and experimental data, volume 4. Springer, 2011

work page 2011
[15]

A causal framework for classical statistical estimands in failure-time settings with competing events

Jessica G Young, Mats J Stensrud, Eric J Tchetgen Tchetgen, and Miguel A Hern´ an. A causal framework for classical statistical estimands in failure-time settings with competing events. Statistics in medicine, 39(8):1199–1236, 2020

work page 2020
[16]

A targeted maximum likelihood estimator for two-stage designs

Sherri Rose and Mark J van der Laan. A targeted maximum likelihood estimator for two-stage designs. The international journal of biostatistics , 7(1):0000102202155746791217, 2011

work page 2011
[17]

Handling missing data when estimating causal effects with targeted maximum likelihood estimation

S Ghazaleh Dashti, Katherine J Lee, Julie A Simpson, Ian R White, John B Carlin, and Margarita Moreno-Betancur. Handling missing data when estimating causal effects with targeted maximum likelihood estimation. American Journal of Epidemiology , 193(7):1019–1030, 2024. 25

work page 2024
[18]

Causal inference with missing exposure information: Methods and applications to an obstetric study

Zhiwei Zhang, Wei Liu, Bo Zhang, Li Tang, and Jun Zhang. Causal inference with missing exposure information: Methods and applications to an obstetric study. Statistical Methods in Medical Research, 25(5):2053–2066, 2016

work page 2053
[19]

Efficient nonparametric causal inference with missing exposure information

Edward H Kennedy. Efficient nonparametric causal inference with missing exposure information. The International Journal of Biostatistics , 16(1):20190087, 2020

work page 2020
[20]

Rothman, S

K.J. Rothman, S. Greenland, and T.L. Lash. Modern Epidemiology. Lippincott Williams & Wilkins, Phildelphia, 3rd edition, 2008

work page 2008
[21]

Balzer, J

L.B. Balzer, J. Schwab, M.J. van der Laan, and M.L. Petersen. Evaluation of progress towards the UNAIDS 90-90-90 HIV care cascade: A description of statistical methods used in an interim analysis of the intervention communities in the SEARCH study. Technical Report 357, University of California at Berkeley, 2017. URL http://biostats.bepress.com/ucbbiostat...

work page 2017
[22]

Balzer, M

L.B. Balzer, M. van der Laan, J. Ayieko, M. Kamya, et al. Two-stage TMLE to reduce bias and improve efficiency in cluster randomized trials. Biostatistics, kxab043, 2021

work page 2021
[23]

Blurring cluster randomized trials and observational studies: Two-stage TMLE for subsampling, missingness, and few independent units

Joshua R Nugent, Carina Marquez, Edwin D Charlebois, Rachel Abbott, Laura B Balzer, and SEARCH Collaboration. Blurring cluster randomized trials and observational studies: Two-stage TMLE for subsampling, missingness, and few independent units. Biostatistics, 24:kxad015, 2023

work page 2023
[24]

The Causal Roadmap in the age of AI: from all wheel drive to formula 1

Maya Petersen. The Causal Roadmap in the age of AI: from all wheel drive to formula 1. In European Causal Inference Meeting, Copenhagen, Denmark, 2024

work page 2024
[25]

Balzer, Moses R

Shalika Gupta, Laura B. Balzer, Moses R. Kamya, Diane V. Havlir, and Maya L. Petersen. When exposure affects subgroup membership: Framing relevant causal questions in perinatal epidemiology and beyond, January 2024. URL http://arxiv.org/abs/2401.11368. arXiv:2401.11368 [stat]

work page arXiv 2024
[26]

Balzer, and the OPAL Study team

Joy Nakato, Laura B. Balzer, and the OPAL Study team. When measurement mediates the causal effect of interest. In Society of Epidemiologic Research (SER) , Austin, TX, 2024

work page 2024
[27]

Havlir, Laura B

Diane V. Havlir, Laura B. Balzer, Edwin D. Charlebois, Tamara D. Clark, Dalsone Kwarisiima, James Ayieko, Jane Kabami, Norton Sang, Teri Liegler, Gabriel Chamie, and et al. HIV Testing and 26 Treatment with the Use of a Community Health Approach in Rural Africa. New England Journal of Medicine, 381(3):219–229, 2019. ISSN 0028-4793. doi: 10.1056/NEJMoa1809...

work page doi:10.1056/nejmoa1809866 2019
[28]

A hybrid mobile approach for population-wide HIV testing in rural east Africa: an observational study

Gabriel Chamie, Tamara D Clark, Jane Kabami, Kevin Kadede, Emmanuel Ssemmondo, Rachel Steinfeld, Geoff Lavoy, Dalsone Kwarisiima, Norton Sang, Vivek Jain, Harsha Thirumurthy, Teri Liegler, Laura B Balzer, Maya L Petersen, Craig R Cohen, Elizabeth A Bukusi, Moses R Kamya, Diane V Havlir, and Edwin D Charlebois. A hybrid mobile approach for population-wide ...

work page doi:10.1016/s2352-3018(15)00251-9 2016
[29]

Marquez, M

C. Marquez, M. Atukunda, L.B. Balzer, G. Chamie, et al. The age-specific burden and household and school-based predictors of child and adolescent tuberculosis infection in rural uganda. PloS ONE, 15 (1):e0228102, 2020

work page 2020
[30]

Community-wide universal human immunodeficiency virus (HIV) test and treat intervention reduces tuberculosis transmission in rural Uganda: A cluster-randomized trial

Carina Marquez, Mucunguzi Atukunda, Joshua Nugent, Edwin D Charlebois, Gabriel Chamie, Florence Mwangwa, Emmanuel Ssemmondo, Joel Kironde, Jane Kabami, Asiphas Owaraganise, et al. Community-wide universal human immunodeficiency virus (HIV) test and treat intervention reduces tuberculosis transmission in rural Uganda: A cluster-randomized trial. Clinical I...

work page 2024
[31]

Incident tuberculosis infection is associated with alcohol use in adults in rural Uganda

Rachel Abbott, Kirsten Landsiedel, Mucunguzi Atukunda, Sarah B Puryear, Gabriel Chamie, Judith A Hahn, Florence Mwangwa, Elijah Kakande, Maya L Petersen, Diane V Havlir, et al. Incident tuberculosis infection is associated with alcohol use in adults in rural Uganda. Clinical Infectious Diseases, 78:ciae304, 2024

work page 2024
[32]

Bang and J.M

H. Bang and J.M. Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61:962–972, 2005

work page 2005
[33]

van der Laan and S

M.J. van der Laan and S. Gruber. Targeted minimum loss based estimation of causal effects of multiple time point interventions. The International Journal of Biostatistics , 8(1), 2012. 27

work page 2012
[34]

Comparison of dynamic treatment regimes via inverse probability weighting

Miguel A Hern´ an, Emilie Lanoy, Dominique Costagliola, and James M Robins. Comparison of dynamic treatment regimes via inverse probability weighting. Basic & clinical pharmacology & toxicology, 98(3):237–242, 2006

work page 2006
[35]

Causal effect models for realistic individualized treatment and intention to treat rules

Mark J Van der Laan and Maya L Petersen. Causal effect models for realistic individualized treatment and intention to treat rules. The international journal of biostatistics , 3(1), 2007

work page 2007
[36]

Estimation and extrapolation of optimal treatment and testing strategies

James Robins, Liliana Orellana, and Andrea Rotnitzky. Estimation and extrapolation of optimal treatment and testing strategies. Statistics in medicine , 27(23):4678–4721, 2008

work page 2008
[37]

Principal stratification in causal inference

Constantine E Frangakis and Donald B Rubin. Principal stratification in causal inference. Biometrics, 58(1):21–29, 2002

work page 2002
[38]

University studies and employment: An application of the principal strata approach to causal analysis

Leonardo Grilli and Fabrizia Mealli. University studies and employment: An application of the principal strata approach to causal analysis. Effectiveness of University Education in Italy: Employability, Competences, Human Capital , pages 219–231, 2007

work page 2007
[39]

Nonparametric bounds on the causal effect of university studies on job opportunities using principal stratification

Leonardo Grilli and Fabrizia Mealli. Nonparametric bounds on the causal effect of university studies on job opportunities using principal stratification. Journal of Educational and Behavioral Statistics , 33 (1):111–130, 2008

work page 2008
[40]

Principal stratification: A tool for understanding variation in program effects across endogenous subgroups

Lindsay C Page, Avi Feller, Todd Grindal, Luke Miratrix, and Marie-Andree Somers. Principal stratification: A tool for understanding variation in program effects across endogenous subgroups. American Journal of Evaluation , 36(4):514–531, 2015

work page 2015
[41]

Study designs for dependent happenings

M Elizabeth Halloran and Claudio J Struchiner. Study designs for dependent happenings. Epidemiology, 2(5):331–338, 1991

work page 1991
[42]

Causal inference in infectious diseases

M Elizabeth Halloran and Claudio J Struchiner. Causal inference in infectious diseases. Epidemiology, pages 142–151, 1995

work page 1995
[43]

Toward causal inference with interference

Michael G Hudgens and M Elizabeth Halloran. Toward causal inference with interference. Journal of the american statistical association , 103(482):832–842, 2008. 28

work page 2008
[44]

A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

Laura B Balzer, Wenjing Zheng, Mark J van der Laan, and Maya L Petersen. A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure. Stat Methods Med Res , 28(6):1761–1780, June 2019. ISSN 0962-2802. doi: 10.1177/0962280218774936. URL https://doi.org/10.1177/0962280218774936

work page doi:10.1177/0962280218774936 2019
[45]

Petersen and M.J

M.L. Petersen and M.J. van der Laan. Causal models and learning from data: Integrating causal modeling and statistical estimation. Epidemiology, 25(3):418–426, 2014

work page 2014
[46]

Hern´ an and J.M

M.A. Hern´ an and J.M. Robins. Using big data to emulate a target trial when a randomized trial is not available. American Journal of Epidemiology , 183(8):758–764, 2016

work page 2016
[47]

van der Laan, Maya Petersen, and Wenjing Zheng

Mark J. van der Laan, Maya Petersen, and Wenjing Zheng. Estimating the Effect of a Community-Based Intervention with Two Communities. Journal of Causal Inference , 1(1):83–106, May 2013. ISSN 2193-3685. URL http://www.degruyter.com/document/doi/10.1515/jci-2012-0011/html

work page doi:10.1515/jci-2012-0011/html 2013
[48]

Causal inference in randomized trials with partial clustering and imbalanced dependence structures

Joshua R Nugent, Elijah Kakande, Gabriel Chamie, Jane Kabami, Asiphas Owaraganise, Diane V Havlir, Moses Kamya, and Laura B Balzer. Causal inference in randomized trials with partial clustering and imbalanced dependence structures. arXiv preprint arXiv:2406.04505 , 2024

work page arXiv 2024
[49]

Super learner

Mark J van der Laan, Eric C Polley, and Alan E Hubbard. Super learner. Statistical Applications in Genetics and Molecular Biology , 6(1), 2007

work page 2007
[50]

van der Vaart

A.W. van der Vaart. Asymptotic Statistics. Cambridge University Press, New York, 1998

work page 1998
[51]

Effect of breastfeeding on gastrointestinal infection in infants: A targeted maximum likelihood approach for clustered longitudinal data

Mireille E Schnitzer, Mark J van der Laan, Erica EM Moodie, and Robert W Platt. Effect of breastfeeding on gastrointestinal infection in infants: A targeted maximum likelihood approach for clustered longitudinal data. The Annals of Applied Statistics , 8(2):703, 2014

work page 2014
[52]

Greenhouse , title =

Susan Gruber, Rachael V. Phillips, Hana Lee, Martin Ho, John Concato, and Mark J. van der Laan and. Targeted learning: Toward a future informed by real-world evidence. Statistics in Biopharmaceutical Research, 16(1):11–25, 2024. doi: 10.1080/19466315.2023.2182356. 29

work page doi:10.1080/19466315.2023.2182356 2024
[53]

Nance, M

N. Nance, M. Petersen, M. van der Laan, and L.B. Balzer. The causal roadmap and simulations to improve the rigor and reproducibility of real-data applications. Epidemiology, 35(6):791–800, 2024

work page 2024
[54]

Donald B. Rubin. Multiple Imputation for Nonresponse in Surveys . Wiley Series in Probability and Statistics. John Wiley & Sons, New York, 1987. ISBN 9780471087052. doi: 10.1002/9780470316696

work page doi:10.1002/9780470316696 1987
[55]

MISL: Multiple imputation by super learning

Thomas Carpenito and Justin Manjourides. MISL: Multiple imputation by super learning. Statistical Methods in Medical Research, 31(10):1904–1915, 2022

work page 1904
[56]

SuperMICE: An ensemble machine learning approach to multiple imputation by chained equations

Hannah S Laqueur, Aaron B Shev, and Rose MC Kagawa. SuperMICE: An ensemble machine learning approach to multiple imputation by chained equations. American Journal of Epidemiology , 191(3):516–525, 2022

work page 2022
[57]

Good practices for quantitative bias analysis

Timothy L Lash, Matthew P Fox, Richard F MacLehose, George Maldonado, Lawrence C McCandless, and Sander Greenland. Good practices for quantitative bias analysis. International Journal of Epidemiology , 43(6):1969–1985, 07 2014. ISSN 0300-5771. doi: 10.1093/ije/dyu149. URL https://doi.org/10.1093/ije/dyu149

[58]

L.E. Dang and L.B. Balzer. Start with the target trial protocol; then follow the Roadmap for causal inference. Epidemiology, 34(5):619–623, 2023

[59]

Mats J Stensrud, Miguel A Hernán, Eric J Tchetgen Tchetgen, James M Robins, Vanessa Didelez, and Jessica G Young. A generalized theory of separable effects in competing event settings. Lifetime Data Analysis, 27(4):588–631, 2021

[60]

Mats J Stensrud, Jessica G Young, Vanessa Didelez, James M Robins, and Miguel A Hernán. Separable effects for causal inference in the presence of competing events. Journal of the American Statistical Association, 117(537):175–183, 2022
