Semiparametric Difference-in-Differences Estimation With Missing Not at Random Data: A Shadow Variable Approach
Pith reviewed 2026-06-27 17:50 UTC · model grok-4.3
The pith
A shadow variable that is independent of missingness given covariates and outcome evolution identifies the ATT in semiparametric difference-in-differences models with MNAR data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the conditional independence of the shadow variable from the missingness process given covariates and the (possibly unobserved) outcome evolution, the ATT is identified and can be estimated semiparametrically via the derived algorithm.
What carries the argument
The shadow variable, together with its conditional independence from the missingness indicator given covariates and outcome evolution. This restriction separates the variable's link to the outcome from its relation to missingness, which is what restores identification of the ATT.
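To make the restriction concrete, here is one way to write it, in notation introduced only for illustration and not necessarily the paper's: D is the treatment indicator, X the covariates, ΔY = Y_1 − Y_0 the outcome evolution, R = 1 when ΔY is observed, Z the shadow variable, and π(X, D, ΔY) = P(R = 1 | X, D, ΔY) > 0 the response probability. The first line states the independence and the relevance condition that usually accompanies it; the next two lines show the moment condition it implies.

```latex
\begin{aligned}
&Z \perp\!\!\!\perp R \mid (X, D, \Delta Y), \qquad
 Z \not\!\perp\!\!\!\perp \Delta Y \mid (X, D), \\
&\mathbb{E}\!\left[\frac{R}{\pi(X, D, \Delta Y)} \,\middle|\, X, D, \Delta Y, Z\right]
 = \frac{P(R = 1 \mid X, D, \Delta Y, Z)}{\pi(X, D, \Delta Y)} = 1, \\
&\mathbb{E}\!\left[\left\{\frac{R}{\pi(X, D, \Delta Y)} - 1\right\} h(X, D, Z)\right] = 0
 \quad \text{for any } h .
\end{aligned}
```

The second line uses the independence restriction, and the third follows by iterated expectations. Because R/π is zero whenever R = 0, the last moment needs ΔY only for responders, so it can be evaluated from observed data. With a parametric π, a few choices of h (say 1, X, D, Z) give as many equations as unknown parameters, and the relevance condition is what makes the Z equation informative about how π depends on ΔY. Fully nonparametric versions in the shadow-variable literature usually add a completeness condition as well.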
If this is right
- The ATT becomes point-identified without requiring the data to be missing at random.
- A semiparametric estimator for the ATT can be constructed directly from the identification formula.
- The estimator's finite-sample properties can be assessed through Monte Carlo experiments under the maintained assumptions.
- The same framework applies to empirical panels that contain a suitable fully observed shadow variable.
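The chain in this list (identification, an estimator built from the identification formula, and a Monte Carlo check) can be illustrated with a small simulation. This is a hypothetical sketch, not the paper's algorithm. It assumes a logistic response model π(X, D, ΔY; γ), estimates γ from the shadow-variable moments E[(R/π − 1) h] = 0 with h = (1, X, D, Z), and plugs the weights R/π into Abadie's (2005) inverse-probability-weighted DID formula. The data-generating process, coefficient values and helper names (`fit_logit`, `shadow_moments`, `solve_shadow`, `ipw_did_att`, `simulate`) are all invented for illustration.

```python
import numpy as np


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logit(A, y, iters=50):
    """Logistic-regression MLE by Newton-Raphson (used for the propensity score)."""
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = expit(A @ beta)
        hess = A.T @ (A * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, A.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def shadow_moments(gamma, R, V, H):
    """Sample mean of (R / pi - 1) * H and its Jacobian, for logistic pi = expit(V @ gamma).

    Rows with R = 0 contribute -H whatever V holds, so the filler used for
    their missing outcome change never matters.
    """
    e = np.exp(-(V @ gamma))  # 1 / pi = 1 + e
    resid = R * (1.0 + e) - 1.0
    g = H.T @ resid / len(R)
    jac = -(H * (R * e)[:, None]).T @ V / len(R)
    return g, jac


def solve_shadow(R, V, H, iters=100):
    """Solve the just-identified moment equations by damped Newton steps."""
    gamma = np.zeros(V.shape[1])
    g, jac = shadow_moments(gamma, R, V, H)
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-10:
            break
        step = np.linalg.solve(jac, -g)
        t = 1.0
        while True:  # halve the step until the moment norm decreases
            g_new, jac_new = shadow_moments(gamma + t * step, R, V, H)
            norm_new = np.linalg.norm(g_new)
            if (np.isfinite(norm_new) and norm_new < np.linalg.norm(g)) or t < 1e-8:
                break
            t /= 2.0
        gamma = gamma + t * step
        g, jac = g_new, jac_new
    return gamma


def ipw_did_att(dY, D, e, w):
    """Abadie (2005)-type IPW DID estimate of the ATT, with extra unit weights w."""
    p = D.mean()
    return np.mean(w * dY * (D - e) / (p * (1.0 - e)))


def simulate(n, tau=2.0, seed=0):
    """Hypothetical DGP: conditional parallel trends, MNAR response driven by dY."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=n)
    D = rng.binomial(1, expit(0.5 * X)).astype(float)
    Z = rng.normal(size=n)  # shadow variable: moves dY, does not enter R
    dY = 0.5 * X + tau * D + Z + rng.normal(size=n)
    R = rng.binomial(1, expit(1.5 + 0.3 * X - 0.3 * D - 0.7 * dY)).astype(float)
    return X, D, Z, np.where(R == 1, dY, np.nan), R


X, D, Z, dY, R = simulate(50_000)
ones = np.ones_like(X)
A = np.column_stack([ones, X])
e_hat = expit(A @ fit_logit(A, D))  # propensity score from all units

dY0 = np.nan_to_num(dY)  # 0 where missing; those rows get weight 0 below
V = np.column_stack([ones, X, D, dY0])  # response-model regressors
H = np.column_stack([ones, X, D, Z])    # shadow-variable instruments
gamma_hat = solve_shadow(R, V, H)
att_shadow = ipw_did_att(dY0, D, e_hat, R / expit(V @ gamma_hat))

cc = R == 1  # naive complete-case analysis, for comparison
e_cc = expit(A[cc] @ fit_logit(A[cc], D[cc]))
att_naive = ipw_did_att(dY0[cc], D[cc], e_cc, np.ones(cc.sum()))

print(f"shadow-variable ATT: {att_shadow:.3f}  complete-case ATT: {att_naive:.3f}  (true 2.0)")
```

In this design larger outcome changes are less likely to be observed, and treated units have larger changes, so the complete-case estimate should be pulled away from the true ATT, while the shadow-weighted estimate should sit close to it. Repeating the run over many seeds would turn this into a small Monte Carlo study.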
Where Pith is reading between the lines
- Similar shadow-variable strategies could be explored for other panel estimators that currently assume missing at random.
- Sensitivity checks that vary the set of conditioning variables might help assess robustness of the independence restriction in practice.
- In other missing-data settings, the approach suggests searching for auxiliary variables that track outcome dynamics without directly influencing response probabilities.
Load-bearing premise
The shadow variable remains independent of the missingness indicator after conditioning on covariates and the outcome evolution.
What would settle it
A dataset in which the shadow variable remains associated with the missingness indicator after conditioning on covariates and the outcome evolution would violate the key independence restriction and invalidate the identification result.
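Because the outcome evolution is missing for nonrespondents, this check is only feasible where it is recorded for every unit, as in a simulation. The sketch below is hypothetical: it uses an invented data-generating process in which a parameter `direct_effect` (also invented) lets Z influence responding directly, and it assumes a logistic response model. It fits a logit of R on (1, X, D, ΔY, Z) using the complete ΔY; a Z coefficient near zero is consistent with the restriction, while a clearly nonzero one signals a violation.

```python
import numpy as np


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logit(A, y, iters=50):
    """Logistic-regression MLE by Newton-Raphson."""
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = expit(A @ beta)
        hess = A.T @ (A * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, A.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def z_coefficient(direct_effect, n=50_000, seed=1):
    """Coefficient on Z in a logit of R on (1, X, D, dY, Z), using the full dY.

    direct_effect is a hypothetical log-odds effect of Z on responding;
    0.0 means the shadow-variable restriction holds.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=n)
    D = rng.binomial(1, expit(0.5 * X)).astype(float)
    Z = rng.normal(size=n)
    dY = 0.5 * X + 2.0 * D + Z + rng.normal(size=n)
    eta = 1.5 + 0.3 * X - 0.3 * D - 0.7 * dY + direct_effect * Z
    R = rng.binomial(1, expit(eta)).astype(float)
    A = np.column_stack([np.ones(n), X, D, dY, Z])
    return fit_logit(A, R)[-1]


coef_ok = z_coefficient(0.0)
coef_bad = z_coefficient(0.8)
print(f"Z coefficient, restriction holds: {coef_ok:.3f}; violated: {coef_bad:.3f}")
```

With real data the same regression can only be run on responders, where ΔY is known, and in that subsample Z can look unrelated to R even when the restriction fails, so the check is informative mainly in simulations or in data that record the outcome for all units.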
Original abstract
This paper considers a semiparametric difference-in-differences (DID) framework for identifying and estimating treatment effects on the treated (ATT) when outcomes are missing not at random (MNAR), and a fully observed shadow variable is available. The shadow variable is assumed to be associated with the outcome evolution but independent of the missingness process, conditional on covariates and the possibly unobserved outcome evolution. We establish the identification conditions, derive the corresponding identification results and estimation algorithm, and evaluate the finite-sample performance of the proposed estimator through simulation studies and a real data application.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a semiparametric difference-in-differences framework to identify and estimate the average treatment effect on the treated (ATT) when the outcome is missing not at random. It introduces a fully observed shadow variable assumed to be associated with the outcome evolution but conditionally independent of the missingness process given covariates and the (possibly unobserved) outcome evolution. The paper establishes identification under this restriction, derives the corresponding identification formula and semiparametric estimation algorithm, and assesses finite-sample performance via simulations and a real-data application.
Significance. If the central identification result holds, the contribution is a practical extension of DID methods to MNAR settings that avoids the standard missing-at-random assumption by using an auxiliary shadow variable. This could be useful in empirical applications where outcome data are incomplete and the shadow variable is plausibly available. The semiparametric approach and accompanying simulations provide a concrete estimation procedure whose performance can be evaluated directly.
minor comments (2)
- [Abstract] The abstract states that identification results and an estimation algorithm are derived but does not display any of the key identifying equations or the form of the estimator; including the main identification formula (presumably in §3 or §4) in the abstract would improve accessibility.
- [Identification section] The description of the shadow-variable conditional independence assumption would benefit from an explicit statement of the conditioning set (covariates, treatment, and outcome evolution) in a single displayed equation early in the identification section.
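As an illustration of the kind of display these comments ask for, the following is a hedged sketch in notation introduced here, not the paper's own formula. Here e(X) = P(D = 1 | X) is the propensity score, π(X, D, ΔY) = P(R = 1 | X, D, ΔY) is the response probability identified through the shadow variable, R = 1 when ΔY = Y_1 − Y_0 is observed, and conditional parallel trends is assumed. Weighting Abadie's (2005) IPW DID representation by R/π gives

```latex
\tau_{\mathrm{ATT}}
  = \mathbb{E}\!\left[\frac{R}{\pi(X, D, \Delta Y)}\cdot
      \frac{D - e(X)}{P(D = 1)\,\{1 - e(X)\}}\cdot \Delta Y\right]
```

Setting π ≡ 1 recovers Abadie's formula. The extra factor R/π leaves the expectation unchanged because E[R/π | X, D, ΔY] = 1, and that is what lets the formula survive MNAR outcomes.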
Simulated Author's Rebuttal
We thank the referee for their positive summary of the paper and for recommending minor revision. The referee's description of the contribution is accurate. No major comments were provided in the report.
Circularity Check
No significant circularity detected
full rationale
The paper derives identification and semiparametric estimation of the ATT from an external conditional independence assumption on the shadow variable (independent of missingness given covariates and outcome evolution). This assumption is stated as the key identifying restriction and is not obtained by fitting, self-definition, or self-citation; the subsequent identification formula and algorithm follow from it without reducing any claimed prediction to the inputs by construction. No enumerated circularity pattern is present.
Axiom & Free-Parameter Ledger
axioms (1)
- [domain assumption] The shadow variable is associated with the outcome evolution but independent of the missingness process, conditional on covariates and the outcome evolution.