Identification In Missing Data Models Represented By Directed Acyclic Graphs

Ilya Shpitser; James M. Robins; Razieh Nabi; Rohit Bhattacharya

arxiv: 1907.00241 · v1 · pith:TI42VMGRnew · submitted 2019-06-29 · 📊 stat.ML · cs.LG

Identification In Missing Data Models Represented By Directed Acyclic Graphs

Rohit Bhattacharya , Razieh Nabi , Ilya Shpitser , James M. Robins This is my paper

Pith reviewed 2026-05-25 12:26 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords missing dataidentifiabilitydirected acyclic graphscausal inferenceID algorithmmissing not at randomgraphical modelscensored data

0 comments

The pith

Missing data models on directed acyclic graphs contain identifiable target distributions that existing algorithms miss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard identification procedures for recovering a target distribution from censored observations leave a substantial class of cases unidentified even when the distribution is recoverable from the observed data under the given DAG. It introduces a new algorithm that broadens the set of graph manipulations beyond those in the ID algorithm from causal inference. A sympathetic reader would care because correct identification is required before any downstream estimation or inference can be guaranteed to be unbiased under missingness mechanisms that are not missing at random. The work therefore enlarges the set of missing data problems that can be solved without additional parametric assumptions.

Core claim

The most general identification strategies proposed so far retain a significant gap in that they fail to identify a wide class of identifiable distributions; a new algorithm that significantly generalizes the types of manipulations used in the ID algorithm recovers these distributions whenever they are identifiable under the missing data DAG.

What carries the argument

A generalized manipulation algorithm that extends the ID algorithm's operations to missing data mechanisms represented by a factorization with respect to a directed acyclic graph.

If this is right

More target distributions become recoverable without requiring parametric restrictions on the missingness mechanism.
Inference procedures can be applied to a larger collection of missing data problems represented by DAGs.
The gap between what is identifiable and what prior algorithms could identify is narrowed.
Identification results carry over directly to estimation once the functional is obtained.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may suggest similar generalizations for identification in other graphical missing data settings beyond standard DAGs.
Practical implementations could be tested by constructing synthetic examples where identifiability holds but earlier methods fail.
Connections to causal effect identification under missingness could allow joint handling of both problems in the same graph.

Load-bearing premise

The missing data mechanism is correctly represented by a factorization with respect to the given directed acyclic graph.

What would settle it

A concrete missing data DAG together with an explicit target distribution that is identifiable from the observed law but is not recovered by the new algorithm, or a distribution the algorithm returns that is in fact not a function of the observed data alone.

Figures

Figures reproduced from arXiv: 1907.00241 by Ilya Shpitser, James M. Robins, Razieh Nabi, Rohit Bhattacharya.

**Figure 1.** Figure 1: Identification of p(Y (a)) by following a total order of valid fixing operations. 3 MISSING DATA MODELS OF A DAG Missing data models are sets of full data laws (distributions) p(X(1) , O, R) composed of the target laws p(X(1) , O), and the nuisance laws p(R|X(1) , O) defining the missingness processes. The target law is over a set X(1) ≡ {X (1) 1 , . . . , X(1) k } of random variables that are potentiall… view at source ↗

**Figure 2.** Figure 2: (a), (b), (c) are intermediate graphs obtained in i [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: (a) A DAG where Rs are fixed according to a partial order. (b) The CADMG obtained by fixing R2. responding to this kernel is shown in [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: A DAG where selection bias on R1 is avoidable by following a partial order fixing schedule on an ADMG induced by latent projecting out X (1) 1 . observed data, meaning that p(R1|X (1) 2 ) is identified as q˜1(R1|X2, 1R2,R3 ). This implies the target law is identified in this model. In general, to identify p(Ri | paG (Ri)), we may need to use separate partial fixing orders on different sets of variables f… view at source ↗

**Figure 5.** Figure 5: (a) A DAG where the fixing operator must be [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: A DAG where variables besides Rs are required to be fixed. ables outside R, including variables in X(1) that become observed after fixing or conditioning on some elements of R [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: (a) A complex missing data DAG used to illustrate th [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: (a) Graph corresponding to the kernel obtained in ( [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗

**Figure 9.** Figure 9: Execution of the fixing schedule to obtain the prope [PITH_FULL_IMAGE:figures/full_fig_p015_9.png] view at source ↗

**Figure 10.** Figure 10: Execution of the fixing schedule to obtain the prop [PITH_FULL_IMAGE:figures/full_fig_p016_10.png] view at source ↗

read the original abstract

Missing data is a pervasive problem in data analyses, resulting in datasets that contain censored realizations of a target distribution. Many approaches to inference on the target distribution using censored observed data, rely on missing data models represented as a factorization with respect to a directed acyclic graph. In this paper we consider the identifiability of the target distribution within this class of models, and show that the most general identification strategies proposed so far retain a significant gap in that they fail to identify a wide class of identifiable distributions. To address this gap, we propose a new algorithm that significantly generalizes the types of manipulations used in the ID algorithm, developed in the context of causal inference, in order to obtain identification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper identifies a real gap in prior missing-data identification on DAGs and gives a generalized algorithm extending the ID manipulations, but the full proofs and examples are needed to confirm it works cleanly.

read the letter

This paper points out that existing identification procedures for missing data models on DAGs leave some identifiable distributions unidentified, and it offers a new algorithm that broadens the manipulations from the standard ID algorithm to close that gap. The abstract is direct about the limitation in prior work and positions the contribution as a practical extension rather than a complete overhaul. That framing is useful because missing data problems on graphs come up often in applied causal work, and any method that recovers more cases without new assumptions is worth checking. The approach builds on established causal identification tools, which gives it a solid foundation rather than starting from scratch. The main strength is the explicit claim that previous strategies were incomplete, backed by the promise of examples showing the difference. A soft spot is that the abstract alone does not show the actual steps of the new algorithm or the proofs that the generalized manipulations are sound and do not introduce false positives. Without seeing those details, it is difficult to judge whether the extension covers the claimed class of distributions or whether some cases still slip through. The weakest assumption appears to be that the DAG correctly encodes the missingness mechanism and that the new operations recover the target exactly when it is identifiable. If the paper supplies clear counterexamples to prior methods and verifies the new procedure on them, that would strengthen the case considerably. This is the kind of incremental but targeted advance that fits well in the causal inference literature. I would bring it to a reading group for the algorithm description and examples, and I would cite it if the proofs hold up. It deserves peer review because the problem is concrete and the proposed fix is stated in terms of existing machinery rather than ad hoc fixes.

Referee Report

2 major / 2 minor

Summary. The paper considers identifiability of a target distribution in missing-data models whose observed-data law factorizes according to a given DAG. It argues that existing general-purpose identification procedures (including extensions of the causal ID algorithm) leave a non-trivial gap, failing to recover the target even when it is identifiable under the DAG. The authors introduce a new algorithm that enlarges the set of allowed manipulations beyond those in the standard ID algorithm and claim that the resulting procedure recovers the target whenever it is identifiable.

Significance. If the soundness claim holds, the result would close a documented gap in the graphical identification literature for missing data and would allow routine application of a single algorithm to a strictly larger class of identifiable problems than was previously possible. The work directly extends a well-studied causal-inference primitive (the ID algorithm) rather than starting from scratch, which increases its immediate utility.

major comments (2)

[§4] §4, Algorithm 1, lines 12–18: the generalized ‘missingness intervention’ operation is defined by replacing the conditional distribution of the missingness indicator with a fixed value; the manuscript must supply an explicit inductive argument showing that each such step preserves the observed-data law when the target is identifiable under the input DAG. Without this argument the completeness claim rests on the examples alone.
[Example 3] Example 3 (the three-variable chain with MNAR missingness on the middle variable): the paper asserts that prior ID-based procedures return ‘unidentified’ while the new algorithm returns the correct functional. The derivation of the functional should be written out in full (including the explicit expression for the recovered density) so that readers can verify it does not rely on an implicit parametric assumption.

minor comments (2)

[§2–3] Notation for the observed-data law versus the full-data law is introduced inconsistently between §2 and §3; a single table of symbols would eliminate repeated parenthetical clarifications.
[Figures 1–3] The running example graphs would be easier to follow if every node were explicitly labeled as fully observed, partially observed, or missingness indicator.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight areas where additional rigor and clarity will improve the manuscript. We address each major comment below and will incorporate the requested material in the revision.

read point-by-point responses

Referee: [§4] §4, Algorithm 1, lines 12–18: the generalized ‘missingness intervention’ operation is defined by replacing the conditional distribution of the missingness indicator with a fixed value; the manuscript must supply an explicit inductive argument showing that each such step preserves the observed-data law when the target is identifiable under the input DAG. Without this argument the completeness claim rests on the examples alone.

Authors: We agree that an explicit inductive argument is required to establish that each generalized missingness intervention preserves the observed-data law. In the revised manuscript we will insert a formal inductive proof in §4 that proceeds by induction on the number of interventions, showing preservation at each step under the assumption that the target is identifiable from the input DAG. This will place the completeness claim on a rigorous footing rather than relying primarily on examples. revision: yes
Referee: [Example 3] Example 3 (the three-variable chain with MNAR missingness on the middle variable): the paper asserts that prior ID-based procedures return ‘unidentified’ while the new algorithm returns the correct functional. The derivation of the functional should be written out in full (including the explicit expression for the recovered density) so that readers can verify it does not rely on an implicit parametric assumption.

Authors: We will expand Example 3 to contain a complete, line-by-line derivation of the recovered functional. The expanded example will explicitly state each algebraic step and the final expression for the target density, making clear that the derivation uses only the graphical structure and the definition of the new operations, without any parametric restrictions. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper proposes a new identification algorithm for missing-data models on DAGs by generalizing manipulations from the causal ID algorithm. The derivation chain consists of defining the class of models via DAG factorization, exhibiting a gap in prior methods via counterexamples, and presenting generalized operations whose soundness is argued directly from the graphical structure rather than by fitting parameters or reducing to self-citations. No equation equates a claimed prediction to an input by construction, no uniqueness theorem is imported solely from overlapping prior work as an external fact, and the central result does not rename a known empirical pattern. The reference to the ID algorithm functions as an external foundation from causal inference, not a load-bearing loop internal to this manuscript. The derivation is therefore self-contained against the stated graphical assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities beyond the standard assumption that missingness follows a DAG factorization.

pith-pipeline@v0.9.0 · 5650 in / 1033 out tokens · 35538 ms · 2026-05-25T12:26:52.883368+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

new algorithm that significantly generalizes the types of manipulations used in the ID algorithm... fixing operations according to a partial order... fix sets of variables jointly
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

nested factorization... fixing operator φV

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1]

Pearl’s calcu- lus of interventions is complete

Yimin Huang and Marco V altorta. Pearl’s calcu- lus of interventions is complete. In Twenty Sec- ond Conference On Uncertainty in Artiﬁcial Intel- ligence, 2006

work page 2006
[2]

Lauritzen

Steffan L. Lauritzen. Graphical Models . Oxford, U.K.: Clarendon, 1996

work page 1996
[3]

Graphical models for recovering probabilistic and causal queries from missing data

Karthika Mohan and Judea Pearl. Graphical models for recovering probabilistic and causal queries from missing data. In Advances in Neural Information Processing Systems, pages 1520–1528. 2014

work page 2014
[4]

Graph- ical models for inference with missing data

Karthika Mohan, Judea Pearl, and Jin Tian. Graph- ical models for inference with missing data. In Ad- vances in Neural Information Processing Systems , pages 1277–1285, 2013

work page 2013
[5]

Probabilistic Reasoning in Intelligent Systems

Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan and Kaufmann, San Mateo, 1988

work page 1988
[6]

Causality: Models, Reasoning, and Inference

Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2 edition, 2009

work page 2009
[7]

S., Evans, R

Thomas S. Richardson, Robin J. Evans, James M. Robins, and Ilya Shpitser. Nested Markov properties for acyclic directed mixed graphs. arXiv:1701.06686v2, 2017. Working paper

work page arXiv 2017
[8]

James M. Robins. A new approach to causal in- ference in mortality studies with sustained expo- sure periods – application to control of the healthy worker survivor effect. Mathematical Modeling , 7:1393–1512, 1986

work page 1986
[9]

James M. Robins. Non-response models for the analysis of non-monotone non-ignorable missing data. Statistics in Medicine , 16:21–37, 1997

work page 1997
[10]

D. B. Rubin. Causal inference and missing data (with discussion). Biometrika, 63:581–592, 1976

work page 1976
[11]

Mauricio Sadinle and Jerome P . Reiter. Item- wise conditionally independent nonresponse mod- elling for incomplete multivariate data. Biometrika, 104(1):207–220, 2017

work page 2017
[12]

Consistent estimation of functions of data missing non-monotonically and not at random

Ilya Shpitser. Consistent estimation of functions of data missing non-monotonically and not at random. In Advances in Neural Information Processing Sys- tems, pages 3144–3152, 2016

work page 2016
[13]

Missing data as a causal and probabilistic prob- lem

Ilya Shpitser, Karthika Mohan, and Judea Pearl. Missing data as a causal and probabilistic prob- lem. In Proceedings of the Thirty First Conference on Uncertainty in Artiﬁcial Intelligence (UAI-15) , pages 802–811. AUAI Press, 2015

work page 2015
[14]

Identiﬁcation of joint interventional distributions in recursive semi- Markovian causal models

Ilya Shpitser and Judea Pearl. Identiﬁcation of joint interventional distributions in recursive semi- Markovian causal models. In Proceedings of the Twenty-First National Conference on Artiﬁcial In- telligence (AAAI-06). AAAI Press, 2006

work page 2006
[15]

Tchetgen Tchetgen, Linbo Wang, and BaoLuo Sun

Eric J. Tchetgen Tchetgen, Linbo Wang, and BaoLuo Sun. Discrete choice models for non- monotone nonignorable missing data: Identiﬁca- tion and inference. Statistica Sinica , 28(4):2069– 2088, 2018

work page 2069
[16]

A general identiﬁcation condition for causal effects

Jin Tian and Judea Pearl. A general identiﬁcation condition for causal effects. In Eighteenth National Conference on Artiﬁcial Intelligence , pages 567– 573, 2002

work page 2002
[17]

Semiparametric Theory and Missing Data

Anastasios Tsiatis. Semiparametric Theory and Missing Data. Springer-V erlag New Y ork, 1st edi- tion edition, 2006

work page 2006
[18]

Y an Zhou, Roderick J. A. Little, and John D. Kalbﬂeisch. Block-conditional missing at ran- dom models for missing data. Statistical Science , 25(4):517–532, 2010. 7 APPENDIX A. Proofs Proposition 1 Given a DAG G(X(1), R, O, X), the distribution p(Ri|paG(Ri))|paG(Ri)∩ R=1 is identiﬁable from p(R, O, X) if there exists (i) Z⊆ X(1)∪ R∪ O, (ii) an equivalenc...

work page 2010

[1] [1]

Pearl’s calcu- lus of interventions is complete

Yimin Huang and Marco V altorta. Pearl’s calcu- lus of interventions is complete. In Twenty Sec- ond Conference On Uncertainty in Artiﬁcial Intel- ligence, 2006

work page 2006

[2] [2]

Lauritzen

Steffan L. Lauritzen. Graphical Models . Oxford, U.K.: Clarendon, 1996

work page 1996

[3] [3]

Graphical models for recovering probabilistic and causal queries from missing data

Karthika Mohan and Judea Pearl. Graphical models for recovering probabilistic and causal queries from missing data. In Advances in Neural Information Processing Systems, pages 1520–1528. 2014

work page 2014

[4] [4]

Graph- ical models for inference with missing data

Karthika Mohan, Judea Pearl, and Jin Tian. Graph- ical models for inference with missing data. In Ad- vances in Neural Information Processing Systems , pages 1277–1285, 2013

work page 2013

[5] [5]

Probabilistic Reasoning in Intelligent Systems

Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan and Kaufmann, San Mateo, 1988

work page 1988

[6] [6]

Causality: Models, Reasoning, and Inference

Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2 edition, 2009

work page 2009

[7] [7]

S., Evans, R

Thomas S. Richardson, Robin J. Evans, James M. Robins, and Ilya Shpitser. Nested Markov properties for acyclic directed mixed graphs. arXiv:1701.06686v2, 2017. Working paper

work page arXiv 2017

[8] [8]

James M. Robins. A new approach to causal in- ference in mortality studies with sustained expo- sure periods – application to control of the healthy worker survivor effect. Mathematical Modeling , 7:1393–1512, 1986

work page 1986

[9] [9]

James M. Robins. Non-response models for the analysis of non-monotone non-ignorable missing data. Statistics in Medicine , 16:21–37, 1997

work page 1997

[10] [10]

D. B. Rubin. Causal inference and missing data (with discussion). Biometrika, 63:581–592, 1976

work page 1976

[11] [11]

Mauricio Sadinle and Jerome P . Reiter. Item- wise conditionally independent nonresponse mod- elling for incomplete multivariate data. Biometrika, 104(1):207–220, 2017

work page 2017

[12] [12]

Consistent estimation of functions of data missing non-monotonically and not at random

Ilya Shpitser. Consistent estimation of functions of data missing non-monotonically and not at random. In Advances in Neural Information Processing Sys- tems, pages 3144–3152, 2016

work page 2016

[13] [13]

Missing data as a causal and probabilistic prob- lem

Ilya Shpitser, Karthika Mohan, and Judea Pearl. Missing data as a causal and probabilistic prob- lem. In Proceedings of the Thirty First Conference on Uncertainty in Artiﬁcial Intelligence (UAI-15) , pages 802–811. AUAI Press, 2015

work page 2015

[14] [14]

Identiﬁcation of joint interventional distributions in recursive semi- Markovian causal models

Ilya Shpitser and Judea Pearl. Identiﬁcation of joint interventional distributions in recursive semi- Markovian causal models. In Proceedings of the Twenty-First National Conference on Artiﬁcial In- telligence (AAAI-06). AAAI Press, 2006

work page 2006

[15] [15]

Tchetgen Tchetgen, Linbo Wang, and BaoLuo Sun

Eric J. Tchetgen Tchetgen, Linbo Wang, and BaoLuo Sun. Discrete choice models for non- monotone nonignorable missing data: Identiﬁca- tion and inference. Statistica Sinica , 28(4):2069– 2088, 2018

work page 2069

[16] [16]

A general identiﬁcation condition for causal effects

Jin Tian and Judea Pearl. A general identiﬁcation condition for causal effects. In Eighteenth National Conference on Artiﬁcial Intelligence , pages 567– 573, 2002

work page 2002

[17] [17]

Semiparametric Theory and Missing Data

Anastasios Tsiatis. Semiparametric Theory and Missing Data. Springer-V erlag New Y ork, 1st edi- tion edition, 2006

work page 2006

[18] [18]

Y an Zhou, Roderick J. A. Little, and John D. Kalbﬂeisch. Block-conditional missing at ran- dom models for missing data. Statistical Science , 25(4):517–532, 2010. 7 APPENDIX A. Proofs Proposition 1 Given a DAG G(X(1), R, O, X), the distribution p(Ri|paG(Ri))|paG(Ri)∩ R=1 is identiﬁable from p(R, O, X) if there exists (i) Z⊆ X(1)∪ R∪ O, (ii) an equivalenc...

work page 2010