A goodness-of-fit test for the logistic propensity score model under nonignorable missing data
Pith reviewed 2026-05-09 23:11 UTC · model grok-4.3
The pith
A bootstrap test based on sum-of-squared residuals checks logistic propensity score models under nonignorable missing data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an unweighted sum-of-squared-residuals statistic, derived from the marginal missingness mechanism, has a tractable asymptotic distribution both under the null hypothesis that the logistic propensity score model is correct and under general alternatives, and that a bootstrap version of the test attains asymptotically correct size while being consistent against misspecification.
What carries the argument
Unweighted sum-of-squared residuals statistic constructed from the marginal missingness mechanism, which remains computable under partial observability of the outcome.
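One way to write down the objects named here, as a sketch only. The symbols R (response indicator, 1 when the outcome is observed), p(x, y; θ), f(y | x), and T_n are assumptions introduced for illustration, covariates are assumed fully observed, and the paper's own statistic may include centering or normalization that this page does not show.

```latex
% Assumed notation: R = 1 if Y is observed, X fully observed,
% p(x, y; \theta) the logistic propensity score, f(y \mid x) the outcome density.
\begin{align*}
p(x, y; \theta) &= \Pr(R = 1 \mid X = x, Y = y)
  = \frac{\exp(\theta_0 + \theta_1^\top x + \theta_2 y)}
         {1 + \exp(\theta_0 + \theta_1^\top x + \theta_2 y)}, \\
\pi(x) &= \Pr(R = 1 \mid X = x) = \int p(x, y; \theta)\, f(y \mid x)\, dy, \\
T_n &= \sum_{i=1}^{n} \bigl\{ R_i - \hat{\pi}(x_i) \bigr\}^2 .
\end{align*}
```

The point of working with the marginal mechanism is visible in the last line: T_n involves only (R_i, x_i), which are available for every unit, so it stays computable even though Y_i is missing whenever R_i = 0.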
If this is right
- The bootstrap test controls type I error at the nominal level asymptotically under the null.
- Power converges to one as sample size grows when the logistic model is misspecified.
- The procedure can be applied directly to check propensity score models in nonignorable missing data settings.
- Finite-sample performance is supported by the reported simulation studies and real-data example.
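The mechanics of a bootstrap test built on an unweighted sum of squared residuals can be sketched in a deliberately simplified setting. The code below is not the paper's procedure: it assumes an ordinary logistic model with fully observed covariates and no dependence on an unobserved outcome, uses a parametric bootstrap, and uses a two-sided comparison against the bootstrap mean. The function names (`fit_logistic`, `uss_statistic`, `bootstrap_gof_test`) are hypothetical.

```python
import numpy as np


def expit(z):
    # Clip the linear predictor to avoid overflow in exp().
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_logistic(X, r, n_iter=50, tol=1e-8):
    """Newton-Raphson maximum likelihood for logistic regression.
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (r - p)
        # Tiny ridge keeps the Hessian invertible on near-separated resamples.
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def uss_statistic(X, r, beta):
    """Unweighted sum of squared residuals r_i - p_i(beta)."""
    return float(np.sum((r - expit(X @ beta)) ** 2))


def bootstrap_gof_test(X, r, n_boot=199, seed=0):
    """Parametric bootstrap: resample r from the fitted model, refit, recompute."""
    rng = np.random.default_rng(seed)
    beta_hat = fit_logistic(X, r)
    t_obs = uss_statistic(X, r, beta_hat)
    p_hat = expit(X @ beta_hat)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        r_star = rng.binomial(1, p_hat)
        # Re-estimate on each resample so estimation error is reproduced.
        beta_star = fit_logistic(X, r_star)
        t_boot[b] = uss_statistic(X, r_star, beta_star)
    # Two-sided p-value: distance of the statistic from the bootstrap mean.
    center = t_boot.mean()
    extreme = np.sum(np.abs(t_boot - center) >= abs(t_obs - center))
    return {"statistic": t_obs, "p_value": (1 + extreme) / (n_boot + 1)}


if __name__ == "__main__":
    rng = np.random.default_rng(123)
    x = rng.normal(size=400)
    X = np.column_stack([np.ones_like(x), x])
    r = rng.binomial(1, expit(0.3 + 0.8 * x))  # data generated under the null
    print(bootstrap_gof_test(X, r))
```

Refitting inside the loop mirrors the joint re-estimation that the referee report credits to the paper's bootstrap. Under nonignorable missingness the fitted probabilities would instead come from the marginal mechanism, which this sketch does not attempt.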
Where Pith is reading between the lines
- Passing the test would give analysts greater justification for using inverse-probability-weighted estimators that rely on the logistic propensity score.
- The same residual construction might be adapted to test other parametric families for the propensity score.
- The method could be combined with existing diagnostics for the outcome model to perform joint checks in missing-data analyses.
- Computational cost of the bootstrap may become relevant in very large data sets, suggesting possible faster approximations.
Load-bearing premise
The marginal missingness mechanism can be used to construct an unweighted sum-of-squared residuals statistic that is sensitive to logistic misspecification while accommodating partial observability of the outcome.
What would settle it
Large-sample simulations generated from a correctly specified logistic propensity score model in which the bootstrap test rejects at rates materially different from the nominal level would falsify the asymptotic size claim.
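A size check of this kind can be organized as a small Monte Carlo harness. The sketch below is illustrative only: the data-generating design, the parameter values, and the names `simulate_null_data`, `placeholder_pvalue`, and `empirical_size` are all assumptions, and `placeholder_pvalue` is a stand-in that ignores the data and returns a uniform draw so the harness runs end to end. The actual bootstrap test would be plugged in at that point.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def simulate_null_data(n, rng, theta=(0.5, 0.5, -0.5)):
    """Draw data whose propensity score is exactly logistic in (x, y).
    The design and theta are hypothetical, chosen only for illustration."""
    x = rng.normal(size=n)
    y = 1.0 + x + rng.normal(size=n)
    a, b, c = theta
    r = rng.binomial(1, expit(a + b * x + c * y))  # depends on y: nonignorable
    y_obs = np.where(r == 1, y, np.nan)            # outcome missing when r == 0
    return x, y_obs, r


def placeholder_pvalue(x, y_obs, r, rng):
    """Stand-in for the real test: an exactly uniform p-value."""
    return rng.uniform()


def empirical_size(pvalue_fn, n, n_rep=2000, alpha=0.05, seed=0):
    """Rejection rate under the null, with its Monte Carlo standard error."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        x, y_obs, r = simulate_null_data(n, rng)
        rejections += int(pvalue_fn(x, y_obs, r, rng) <= alpha)
    rate = rejections / n_rep
    # Standard error of the rate if the true size equals alpha.
    mc_se = float(np.sqrt(alpha * (1.0 - alpha) / n_rep))
    return {"rate": rate, "mc_se": mc_se, "z": (rate - alpha) / mc_se}


if __name__ == "__main__":
    print(empirical_size(placeholder_pvalue, n=500))
```

The `z` entry gives one concrete reading of "materially different": a rejection rate several Monte Carlo standard errors away from the nominal level, persisting as n grows, would count against the asymptotic size claim.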
Original abstract
Logistic regression is widely used to model the propensity score in the analysis of nonignorable missing data. However, goodness-of-fit testing for this propensity score model has received limited attention in the literature. In this paper, we propose a new goodness-of-fit testing procedure for the logistic propensity score model under nonignorable missing data. The proposed test is based on an unweighted sum-of-squared residuals constructed from the marginal missingness mechanism and accommodates the partial observability of the outcome. We establish the asymptotic distribution of the test statistic under both the null hypothesis and general alternatives, and develop a bootstrap procedure with theoretical guarantees to approximate its null distribution. We show that the resulting bootstrap test attains asymptotically correct size and is consistent, with power converging to one under model misspecification. Simulation studies and a real data application demonstrate that the proposed method performs well in finite samples.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a goodness-of-fit test for the logistic propensity score model when missingness is nonignorable. The test statistic is an unweighted sum of squared residuals constructed from the marginal missingness probabilities, which accommodates partial observability of the outcome. Asymptotic distributions of the statistic are derived under the null hypothesis of correct logistic specification and under general alternatives; a bootstrap procedure is proposed to approximate the null distribution, with proofs establishing asymptotic validity of the resulting test (correct size and consistency against misspecification). Finite-sample performance is examined via simulations and a real-data illustration.
Significance. The work addresses a clear methodological gap: while logistic propensity scores are routinely used under nonignorable missingness, few formal goodness-of-fit procedures exist that properly handle the partial observability without imposing an outcome model. The derivation that the unweighted residual statistic has mean zero under the null after integrating over the unobserved outcome, together with the bootstrap that jointly re-estimates the marginal mechanism, supplies a theoretically grounded and practical tool. The explicit asymptotic power consistency result is a strength.
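The mean-zero property mentioned above follows from iterated expectations. The sketch below uses assumed notation: R is the response indicator, p(x, y; θ₀) is the logistic propensity score at the true parameter, and π(x) is the marginal mechanism it implies. Whether the paper centers its statistic in exactly this way is not shown on this page.

```latex
% Under the null, E{R | X, Y} = p(X, Y; \theta_0), and \pi(X) is its
% conditional expectation given X alone.
\begin{align*}
E\{ R - \pi(X) \mid X \}
  &= E\bigl[ E\{ R \mid X, Y \} \mid X \bigr] - \pi(X) \\
  &= E\{ p(X, Y; \theta_0) \mid X \} - \pi(X) = 0 .
\end{align*}
```

No model for the outcome enters beyond the conditional expectation itself, which is consistent with the report's remark that the unobserved outcome is integrated out.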
minor comments (3)
- [§3.2] The notation for the marginal missingness probability π(x) is introduced without an explicit statement that it is estimated jointly with the propensity parameters; a one-sentence clarification would prevent readers from assuming separate estimation.
- [Table 2] The reported empirical sizes for n=200 are slightly above the nominal level (0.07–0.08); a brief remark on whether this is due to the bootstrap tuning parameter or finite-sample bias would strengthen the simulation section.
- [Section 5] The real-data application reports p-values but does not state the number of bootstrap replications used; this detail should be added for reproducibility.
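Why the number of bootstrap replications matters for reproducibility can be seen from a common form of the bootstrap p-value. This is a generic sketch, not the paper's definition: B, the bootstrap statistics T_n^{*(b)}, and the upper-tailed rejection rule are assumptions made for illustration.

```latex
% Generic bootstrap p-value with B replications (upper-tailed, assumed).
\[
\hat{p}_B = \frac{1 + \sum_{b=1}^{B} \mathbf{1}\bigl\{ T_n^{*(b)} \ge T_n \bigr\}}{B + 1},
\qquad
\hat{p}_B \in \Bigl\{ \tfrac{1}{B+1}, \tfrac{2}{B+1}, \dots, 1 \Bigr\}.
\]
```

The reported p-value can only take values on a grid of width 1/(B+1), so the same data can yield visibly different p-values under different choices of B.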
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our work and for recommending minor revision. The referee correctly identifies the methodological gap addressed by the proposed bootstrap goodness-of-fit test for the logistic propensity score under nonignorable missingness.
Circularity Check
No significant circularity detected
The paper constructs the test statistic directly as an unweighted sum of squared residuals from the marginal missingness mechanism, then derives its asymptotic distribution under the null via empirical process theory and proposes a bootstrap that replicates the joint estimation. These steps are independent of the target result; the mean-zero property under the logistic null and consistency under alternatives follow from the explicit construction and standard limiting arguments without reducing to fitted quantities by definition or load-bearing self-citation. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The propensity score follows a logistic regression model under the null hypothesis.
- standard math: Standard regularity conditions hold for the asymptotic distribution of the test statistic.