Self-separated and self-connected models for mediator and outcome missingness in mediation analysis
Pith reviewed 2026-05-23 17:22 UTC · model grok-4.3
The pith
Mediation effects remain identifiable when both the mediator and the outcome are subject to missingness, provided the missingness follows either a self-separated model (conditional independence assumptions) or a self-connected model (shadow variables).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Self-separated missingness models identify mediation quantities through conditional independence assumptions that remove certain edges from the graph. Self-connected models identify the same quantities through shadow variables placed at different points in the graph, while allowing, where possible, for unobserved causes of the missingness. Together, the two classes yield identification templates and a substantial extension of shadow variable theory for the mediation setting.
What carries the argument
Self-separated missingness models (identification via conditional independence assumptions that remove connections) and self-connected missingness models (identification via shadow variables that may be built-in or auxiliary).
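As a rough formalization of the two mechanisms, consider a single variable Y with response indicator R (R = 1 when Y is observed), covariates X, and a candidate shadow variable Z. These symbols are illustrative assumptions; the paper's own notation and exact conditions may differ. The first line is one simple instance of a conditional independence that lets complete cases stand in for the full data. The next lines are the shadow-variable conditions as they are usually stated in the literature on nonignorable missing data.

```latex
% Self-separated flavour: missingness of Y is independent of Y given observed variables
R \perp\!\!\!\perp Y \mid X
\quad\Longrightarrow\quad
p(y \mid x) = p(y \mid x, R = 1)

% Self-connected flavour: R may depend on Y itself, and Z is a shadow variable if
Z \perp\!\!\!\perp R \mid (Y, X)
\qquad\text{and}\qquad
Z \not\!\perp\!\!\!\perp Y \mid (R = 1, X)
```

In that literature, nonparametric identification typically also relies on a completeness-type condition linking Z and Y.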
If this is right
- Identification templates apply directly to mediation problems with missing mediator and outcome under the stated assumptions.
- Models can incorporate dependencies from unobserved causes of missingness without losing identifiability where the shadow-variable conditions hold.
- Shadow variables can be built in (the mediator, the outcome, or covariates) or auxiliary, with auxiliary shadow variables sitting at different positions in the causal structure.
- The synthesis connects existing missingness models to new variants and supplies generally useful identification techniques beyond the mediation setting.
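For orientation, the mediation quantities at stake typically have the following shape. This sketch assumes natural direct and indirect effects as the target and uses illustrative symbols (A treatment, M mediator, Y outcome, C covariates). The page does not state which estimands the paper uses, so this shows the general form and not the paper's template.

```latex
% Mediation formula under the usual no-unmeasured-confounding assumptions (full data)
E\{Y(a, M(a'))\} = \int\!\!\int E(Y \mid a, m, c)\, p(m \mid a', c)\, p(c)\, dm\, dc

% Natural direct and indirect effects
\mathrm{NDE} = E\{Y(1, M(0))\} - E\{Y(0, M(0))\}, \qquad
\mathrm{NIE} = E\{Y(1, M(1))\} - E\{Y(1, M(0))\}
```

With M and Y subject to missingness, the task is to express E(Y | a, m, c) and p(m | a', c) in terms of the observed-data distribution.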
Where Pith is reading between the lines
- The same self-separated and self-connected constructions could be tested for applicability to missingness in longitudinal or survival mediation settings.
- Sensitivity analyses that vary the strength of the required independences or shadow-variable relations would quantify robustness of the identification results.
- Empirical checks for the presence of usable shadow variables in observational datasets could guide whether self-connected models are feasible in a given study.
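One generic way to run the kind of sensitivity analysis mentioned above is exponential tilting: assume the odds of nonresponse are proportional to exp(gamma * y) and vary gamma. The sketch below is not taken from the paper. The function name `tilted_mean`, the data-generating values, and the single-outcome setup without mediator or covariates are all assumptions made for illustration.

```python
import numpy as np


def tilted_mean(y_obs, n_missing, gamma):
    """Mean of Y under an assumed log odds ratio `gamma` of nonresponse per unit of Y.

    Assumes odds of nonresponse proportional to exp(gamma * y), which implies
    p(y | R=0) is proportional to exp(gamma * y) * p(y | R=1).
    """
    y_obs = np.asarray(y_obs, dtype=float)
    # A constant shift cancels in the ratio; it keeps the exponent small when gamma > 0.
    w = np.exp(gamma * (y_obs - y_obs.max()))
    mean_missing = np.sum(w * y_obs) / np.sum(w)
    n_obs = y_obs.size
    p_obs = n_obs / (n_obs + n_missing)
    return p_obs * y_obs.mean() + (1.0 - p_obs) * mean_missing


# Hypothetical data: larger outcomes are more likely to be missing.
rng = np.random.default_rng(0)
n = 200_000
y = rng.normal(size=n)                       # true mean is 0
true_gamma = 1.0
p_missing = 1.0 / (1.0 + np.exp(-(-0.5 + true_gamma * y)))
observed = rng.random(n) >= p_missing
y_obs = y[observed]
n_missing = int(n - observed.sum())

# gamma = 0 reproduces the complete-case mean; other values show how far it moves.
sensitivity = {g: tilted_mean(y_obs, n_missing, g) for g in (0.0, 0.5, 1.0, 1.5)}
for g, est in sensitivity.items():
    print(f"gamma={g:.1f}  estimated mean={est:.3f}")
```

In practice gamma is not known, so the useful output is the whole curve of estimates across a plausible range of gamma values.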
Load-bearing premise
The stated conditional independence assumptions hold or the required shadow variables exist with the necessary properties in the actual data-generating process.
What would settle it
In a simulation or real dataset where the conditional independences are violated or no suitable shadow variable is present, the mediation effects would not be identified by the proposed models, and estimates based on them would be biased.
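A minimal simulation sketch of this kind of check follows. It is not the paper's design: only the outcome is missing, the model is linear with no interaction (so the true direct and indirect effects are both 1), and all function names and coefficients are illustrative assumptions. It compares complete-case estimates when the response indicator for Y depends only on the observed mediator against the case where it depends on Y itself.

```python
import numpy as np


def simulate(n, outcome_drives_missingness, seed=0):
    """Simulate A -> M -> Y with A -> Y, plus a response indicator for Y."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, size=n).astype(float)   # randomized binary treatment
    m = 0.5 * a + rng.normal(size=n)               # mediator (fully observed here)
    y = 1.0 * a + 2.0 * m + rng.normal(size=n)     # outcome
    # Response probability for Y depends on M only, or on Y itself.
    driver = y if outcome_drives_missingness else m
    p_obs = 1.0 / (1.0 + np.exp(-(0.5 - 1.5 * driver)))
    r = rng.random(n) < p_obs
    return a, m, y, r


def ols(columns, target):
    """Least-squares coefficients with the intercept in position 0."""
    design = np.column_stack([np.ones(target.size)] + list(columns))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def complete_case_effects(a, m, y, r):
    """Product-of-coefficients effects using only rows where Y is observed."""
    alpha = ols([a], m)[1]                        # A -> M, uses all rows
    _, beta_a, beta_m = ols([a[r], m[r]], y[r])   # Y ~ A + M on complete cases
    return {
        "nde": float(beta_a),
        "nie": float(alpha * beta_m),
        "beta_m": float(beta_m),
    }


separated = complete_case_effects(*simulate(200_000, False))
connected = complete_case_effects(*simulate(200_000, True))
print("independence holds:   ", separated)
print("independence violated:", connected)
```

When missingness depends on Y itself, the complete-case coefficients are attenuated, which is the situation where a shadow variable or another identifying assumption would be needed.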
Original abstract
Missing data is a common challenge in studying treatment effects. In the context of mediation analysis, this paper addresses missingness in the mediator and outcome, focusing on identification. We first consider self-separated missingness models where identification is achieved by conditional independence assumptions. This model class is somewhat limited as it is constrained by the need to remove a certain number of connections from the model. We then turn to self-connected missingness models where identification relies on information from shadow variables. This model class turns out to contain substantial variation, allowing models with built-in shadow variables (mediator, outcome or covariates) and models with auxiliary shadow variables at different positions in the causal structure. To improve the practical value of the missingness mechanisms, we allow where possible for dependencies due to unobserved causes of the missingness, a feature often neglected. In this exploration, we review existing models, connect to new models, and develop theory where needed. This results in templates for identification in the mediation setting, generally useful identification techniques, and perhaps most importantly a synthesis and substantial extension of shadow variable theory. Two examples relate the models to practical considerations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops identification results for mediation analysis when both the mediator and outcome are subject to missingness. It introduces self-separated missingness models that achieve identification via conditional independence assumptions removing certain edges from the causal graph, and self-connected models that rely on shadow variables (either built-in from the mediator/outcome/covariates or auxiliary at various positions). The work allows for dependencies induced by unobserved causes of missingness, reviews prior models, develops new theory as needed, and supplies identification templates along with two practical examples. The central contribution is framed as a synthesis and substantial extension of shadow-variable theory to this setting.
Significance. If the derivations are correct, the results supply concrete identification functionals and templates for a practically relevant missing-data problem in causal mediation analysis. The synthesis of shadow-variable approaches, combined with explicit allowance for unobserved missingness causes, could reduce reliance on complete-case or ad-hoc imputation methods and provide generally reusable identification techniques beyond the specific mediation setting.
minor comments (3)
- The abstract notes that self-separated models are 'somewhat limited' by the need to remove connections; the main text should quantify this limitation (e.g., the minimum number of edges that must be removed to achieve identification) and delineate the precise graph structures for which the class remains useful.
- The claim of a 'substantial extension of shadow variable theory' would benefit from an explicit comparison table or subsection contrasting the new model classes against the models reviewed in the literature section, highlighting which identification functionals are novel.
- The two examples are described only at the abstract level; the main text should clarify whether they use real data, simulated data, or both, and report sensitivity checks when the external justification for the conditional independences or shadow-variable properties is varied.
Simulated Author's Rebuttal
We thank the referee for the constructive summary of our work on self-separated and self-connected missingness models in mediation analysis and for recommending minor revision. No specific major comments were provided in the report, so our response below addresses the overall assessment.
Circularity Check
Theoretical identification templates under external conditional independence and shadow-variable assumptions; derivation chain remains self-contained
full rationale
The paper presents identification results for mediator/outcome missingness by defining two model classes (self-separated via conditional independences; self-connected via shadow variables) and deriving templates from those assumptions. These assumptions are explicitly stated as requiring external justification rather than being derived from data or prior results within the paper. No equations reduce a claimed prediction or identification functional to a fitted parameter or self-defined quantity by construction. Existing models are reviewed and extended, but the central synthesis does not rely on load-bearing self-citations whose validity is presupposed without independent verification. This is standard theoretical model development with no circular reduction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Conditional independence assumptions required for identification in self-separated missingness models
- domain assumption: Existence and positioning of shadow variables sufficient for identification in self-connected models
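To make the second assumption concrete, here is a sketch of shadow-variable identification in the simplest discrete case: one binary variable Y that is missing not at random, one fully observed binary shadow variable Z, and no covariates. This is a textbook-style illustration and not the paper's mediation model. The function name `shadow_variable_marginal` and all probabilities are assumptions chosen for the example.

```python
import numpy as np


def shadow_variable_marginal(y, z, r, levels=(0, 1)):
    """Estimate P(Y = y) for discrete Y that is missing where r is False.

    Assumes Z is fully observed, Z is independent of R given Y, and Z is
    associated with Y among complete cases (so the linear system is solvable).
    """
    k = len(levels)
    joint_obs = np.zeros((k, k))   # joint_obs[i, j] = P(Z=z_i, Y=y_j, R=1)
    for i, zv in enumerate(levels):
        for j, yv in enumerate(levels):
            joint_obs[i, j] = np.mean(r & (z == zv) & (y == yv))
    p_z = np.array([np.mean(z == zv) for zv in levels])   # P(Z=z_i), all rows
    # Under Z independent of R given Y:
    #   sum_j joint_obs[i, j] / P(R=1 | Y=y_j) = P(Z=z_i) for every i.
    inv_response = np.linalg.solve(joint_obs, p_z)        # 1 / P(R=1 | Y=y_j)
    return joint_obs.sum(axis=0) * inv_response           # P(Y=y_j)


# Hypothetical data: Y = 1 is much more likely to be missing than Y = 0.
rng = np.random.default_rng(0)
n = 200_000
y = (rng.random(n) < 0.4).astype(int)                 # true P(Y=1) = 0.4
z = (rng.random(n) < 0.2 + 0.6 * y).astype(int)       # Z depends on Y only
r = rng.random(n) < np.where(y == 1, 0.5, 0.9)        # R depends on Y only
y_masked = np.where(r, y, -1)                         # hide the missing values

estimate = shadow_variable_marginal(y_masked, z, r)
complete_case = y[r].mean()
print("complete-case P(Y=1):  ", round(float(complete_case), 3))
print("shadow-variable P(Y=1):", round(float(estimate[1]), 3))
```

The linear system has a unique solution only when Z and Y are associated among complete cases, which mirrors the requirement that the shadow variable be informative about the variable whose missingness it helps correct.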