Improving estimation of the volume under the ROC surface when data are missing not at random

Duc-Khanh To; Gianfranco Adimari; Monica Chiogna

arxiv: 1906.08735 · v1 · pith:7WGP4VVXnew · submitted 2019-06-20 · 📊 stat.ME · stat.AP

Pith reviewed 2026-05-25 19:05 UTC · model grok-4.3

classification 📊 stat.ME stat.AP

keywords VUS estimation · verification bias · nonignorable missingness · mean score equations · ROC surface · diagnostic accuracy · missing not at random · bias correction

The pith

Mean score equations produce consistent estimators for the volume under the ROC surface under nonignorable verification bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to estimate the volume under the ROC surface for a diagnostic test when some disease statuses are missing in a nonignorable way. It uses a parametric model for which subjects get verified, possibly aided by instrumental variables, and solves mean score equations after estimating disease probabilities only among verified cases. This yields four bias-corrected estimators whose consistency and asymptotic normality are derived. The method is checked in simulations and illustrated on an ovarian cancer dataset.

Core claim

By deriving and solving mean score equations from a parametric regression model of the verification process, four estimators for the volume under the ROC surface can be constructed that correct for nonignorable verification bias, achieve consistency, and possess asymptotic normality, with the disease model needed only for verified subjects.

What carries the argument

Mean score equation derived from the parametric verification model, which uses estimated verification and disease probabilities to adjust the VUS calculation for missingness.
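
To make the adjustment concrete, here is a minimal sketch of a weighted VUS estimate, where the VUS is the probability that test results from the three ordered disease classes satisfy T_1 < T_2 < T_3. This page does not spell out the paper's four estimators, so the function name `vus_weighted`, its arguments, and the generic "plug weights into the empirical VUS" form are assumptions made for illustration. The sketch also assumes a continuous test with no tied values.

```python
import numpy as np

def vus_weighted(t, w1, w2, w3):
    """Weighted estimate of P(T_1 < T_2 < T_3), assuming no ties in t.

    t          : test results, one per subject
    w1, w2, w3 : nonnegative per-subject weights for classes 1, 2, 3
                 (hypothetical interface, not taken from the paper)
    """
    t = np.asarray(t, dtype=float)
    w1, w2, w3 = (np.asarray(w, dtype=float) for w in (w1, w2, w3))
    order = np.argsort(t)
    a, b, c = w1[order], w2[order], w3[order]
    below = np.cumsum(a) - a        # class-1 weight strictly below each subject
    above = c.sum() - np.cumsum(c)  # class-3 weight strictly above each subject
    num = np.sum(b * below * above)
    # Denominator: sum of a_i * b_j * c_k over triples of distinct subjects
    # (inclusion-exclusion over coinciding indices).
    sa, sb, sc = a.sum(), b.sum(), c.sum()
    den = (sa * sb * sc - sc * np.sum(a * b) - sb * np.sum(a * c)
           - sa * np.sum(b * c) + 2.0 * np.sum(a * b * c))
    return num / den

# Full-data case: weights are the class indicators.
t = np.array([1.0, 2.0, 3.0, 4.0])
d = np.array([1, 3, 2, 3])
print(vus_weighted(t, d == 1, d == 2, d == 3))
```

With complete data the weights are the class indicators, which gives the usual empirical VUS. A bias-corrected variant would swap in other weights, for example estimated disease probabilities, or indicators of verified subjects divided by estimated verification probabilities.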

If this is right

The estimators are consistent when the verification model is correct.
Asymptotic normality holds and supports inference procedures.
Instrumental variables can be used to address identifiability in the verification model.
The disease model needs to be specified only for verified subjects.
Four distinct estimators can be formed from different combinations of the estimated probabilities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The mean-score route may prove simpler to implement than full-likelihood maximization in some settings.
Similar score-equation adjustments could be explored for other diagnostic accuracy summaries under missing data.
Relaxing the verification model to semiparametric form would be a natural next step to test robustness.

Load-bearing premise

The parametric regression model for the verification process must be correctly specified.

What would settle it

A simulation in which the verification model is misspecified and the resulting VUS estimators exhibit persistent bias or fail to converge to the true value.
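
A toy version of such a check might look like the sketch below. Every design choice in it is an assumption made for illustration: the class prevalences, the normal test results, the logistic verification probabilities, and the use of known rather than fitted coefficients. It is not the paper's simulation design and does not implement the paper's estimators. It only compares a complete-case estimate with inverse-probability-weighted estimates built from the true and from a deliberately wrong verification probability.

```python
import numpy as np

def vus_weighted(t, w1, w2, w3):
    """Weighted estimate of P(T_1 < T_2 < T_3), assuming no ties in t."""
    order = np.argsort(t)
    a, b, c = (np.asarray(w, dtype=float)[order] for w in (w1, w2, w3))
    below = np.cumsum(a) - a        # class-1 weight strictly below each subject
    above = c.sum() - np.cumsum(c)  # class-3 weight strictly above each subject
    sa, sb, sc = a.sum(), b.sum(), c.sum()
    den = (sa * sb * sc - sc * np.sum(a * b) - sb * np.sum(a * c)
           - sa * np.sum(b * c) + 2.0 * np.sum(a * b * c))
    return np.sum(b * below * above) / den

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(n=20000, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical data-generating process: three classes, normal test results.
    d = rng.choice([1, 2, 3], size=n, p=[0.4, 0.35, 0.25])
    t = rng.normal(loc=d - 1.0, scale=1.0)            # class means 0, 1, 2
    # Verification probability depends on D itself: nonignorable missingness.
    pi_true = expit(-1.0 + 1.5 * t + 1.5 * (d == 3))
    # Misspecified probability: drops the disease term.
    pi_wrong = expit(-1.0 + 1.5 * t)
    v = rng.random(n) < pi_true                       # verification indicator
    ind = [(d == k).astype(float) for k in (1, 2, 3)]
    return {
        # Benchmark that would be available if everyone were verified.
        "full": vus_weighted(t, *ind),
        # Complete-case estimate: verified subjects only, no correction.
        "naive": vus_weighted(t[v], *[w[v] for w in ind]),
        # Inverse-probability weighting; unverified subjects get weight zero,
        # so their disease status is never actually used.
        "ipw_true": vus_weighted(t, *[v * w / pi_true for w in ind]),
        "ipw_wrong": vus_weighted(t, *[v * w / pi_wrong for w in ind]),
    }

if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:10s} {value:.3f}")
```

Repeating the run over many seeds and sample sizes, and replacing the known coefficients with fitted ones, would be needed before reading anything into the gap between `ipw_true` and `ipw_wrong`.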

Original abstract

In this paper, we propose a mean score equation-based approach to estimate the the volume under the receiving operating characteristic (ROC) surface (VUS) of a diagnostic test, under nonignorable (NI) verification bias. The proposed approach involves a parametric regression model for the verification process, which accommodates for possible NI missingness in the disease status of sample subjects, and may use instrumental variables, which help avoid possible identifiability problems. In order to solve the mean score equation derived by the chosen verification model, we preliminarily need to estimate the parameters of a model for the disease process, but its specification is required only for verified subjects under study. Then, by using the estimated verification and disease probabilities, we obtain four verification bias-corrected VUS estimators, which are alternative to those recently proposed by To Duc et al. (2019), based on a full likelihood approach. Consistency and asymptotic normality of the new estimators are established. Simulation experiments are conducted to evaluate their finite sample performances, and an application to a dataset from a research on epithelial ovarian cancer is presented.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Mean score estimators give a parametric alternative to the authors' 2019 full-likelihood VUS estimators under NI verification bias, but consistency still requires the verification model to be correctly specified.

This paper develops mean score equation estimators for the volume under the ROC surface when disease status is missing not at random. It frames the new estimators as an alternative to the full likelihood method in the same group's 2019 paper. The approach fits a parametric regression for the verification indicator, allows instrumental variables to address identifiability, estimates the disease model only on verified cases, and then plugs the resulting probabilities into four corrected VUS estimators. Consistency and asymptotic normality are stated, simulations are run, and an ovarian cancer dataset is analyzed.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a mean score equation-based approach to estimate the volume under the ROC surface (VUS) of a diagnostic test under nonignorable verification bias. It specifies a parametric regression model for the verification process (accommodating NI missingness and possibly using instrumental variables), estimates parameters of a disease model only for verified subjects, derives four verification bias-corrected VUS estimators as alternatives to the full-likelihood method of To Duc et al. (2019), establishes consistency and asymptotic normality of the estimators, and evaluates performance via simulation experiments and an application to epithelial ovarian cancer data.

Significance. If the consistency and asymptotic normality results hold under the stated parametric assumptions, the work supplies a computationally lighter alternative to full-likelihood estimation for VUS correction in the presence of nonignorable verification bias, a frequent challenge in diagnostic accuracy studies. The explicit provision of simulation studies and a real-data example, together with the theoretical guarantees, strengthens the methodological contribution to statistical methods for incomplete diagnostic data.

major comments (2)

[Theoretical results on consistency] The consistency and asymptotic normality results (abstract and theoretical development) are established only under correct specification of the parametric verification model for the entire sample; this assumption is load-bearing for the NI correction yet the manuscript provides no sensitivity analyses or robustness checks when the verification model is misspecified.
[Estimation procedure and identifiability] The approach invokes instrumental variables to restore identifiability under NI missingness (abstract), but supplies neither formal conditions on IV validity/strength nor any diagnostic procedures or sensitivity checks for the chosen instruments; this directly affects practical use of the four proposed estimators.
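
For orientation, the conditions at stake in these two comments can be written out. The notation is assumed here and is not taken from the manuscript: T is the test result, D the disease status, V the verification indicator, and the covariates are split as A = (U, Z) with Z a candidate instrument. The instrument condition follows the usual nonresponse-instrument formulation in the missing-data literature, which may differ in detail from what the authors assume.

```latex
% Ignorable (missing at random) verification: given the observed data,
% verification does not depend on the disease status.
\Pr(V = 1 \mid T, A, D) = \Pr(V = 1 \mid T, A)

% Nonignorable (NI) verification: the equality above fails, so the
% verification probability must be modelled as a function of D as well.
\Pr(V = 1 \mid T, A, D) = \pi(T, A, D)

% Instrument Z (assumed formulation): excluded from the verification model
% but still associated with the disease status.
\Pr(V = 1 \mid T, U, Z, D) = \Pr(V = 1 \mid T, U, D),
\qquad Z \not\perp D \mid (T, U)
```

The exclusion restriction in the last display cannot be checked from the observed data alone, which is why the referee asks for sensitivity checks on the chosen instruments.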

minor comments (1)

[Abstract] Abstract contains a typographical error ('estimate the the volume').

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and positive overall assessment of our work. We address each major comment point by point below, indicating planned revisions where the manuscript can be strengthened.

Referee: The consistency and asymptotic normality results (abstract and theoretical development) are established only under correct specification of the parametric verification model for the entire sample; this assumption is load-bearing for the NI correction yet the manuscript provides no sensitivity analyses or robustness checks when the verification model is misspecified.

Authors: We agree that the consistency and asymptotic normality results are derived under the assumption of correct specification of the parametric verification model, which is standard for such parametric estimators. The manuscript does not include sensitivity analyses for misspecification. In the revised version, we will add simulation experiments that deliberately misspecify the verification model to assess the robustness of the four proposed estimators. revision: yes

Referee: The approach invokes instrumental variables to restore identifiability under NI missingness (abstract), but supplies neither formal conditions on IV validity/strength nor any diagnostic procedures or sensitivity checks for the chosen instruments; this directly affects practical use of the four proposed estimators.

Authors: The manuscript mentions that instrumental variables may be used to help with identifiability under nonignorable missingness but does not provide formal conditions on validity or strength, nor diagnostics or sensitivity checks. We will revise the manuscript to include a dedicated discussion of these conditions (drawing on standard IV literature), along with practical guidance on diagnostics and sensitivity checks for the instruments. revision: yes

Circularity Check

0 steps flagged

No significant circularity; estimators derived from independent parametric models with explicit consistency proof

The paper specifies separate parametric regression models for the verification process (using instrumental variables for identifiability under NI missingness) and the disease process (only on verified subjects). Mean-score equations are solved using these fitted probabilities to produce four VUS estimators. Consistency and asymptotic normality are established directly under the assumption of correct verification-model specification. The method is presented as an alternative to the authors' own prior full-likelihood approach (To Duc et al. 2019), but the new derivation does not reduce to that prior work or to any fitted quantity by construction. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain. The result remains falsifiable via model misspecification checks external to the fitted values.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Review conducted from abstract only; specific model forms and parameter details unavailable. The approach rests on correct specification of parametric models whose parameters are estimated from data.

free parameters (2)

verification model parameters
Parametric regression model for the verification process requires estimation of its coefficients from the observed data.
disease model parameters
Parameters of the disease process model are estimated using only verified subjects.

axioms (1)

Domain assumption: The chosen parametric regression model for the verification process is correctly specified and accommodates nonignorable missingness.
The mean score equation is derived from this model; misspecification would invalidate the bias correction.

pith-pipeline@v0.9.0 · 5726 in / 1364 out tokens · 37775 ms · 2026-05-25T19:05:35.180750+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1] Goeman, J. J. and le Cessie, S. (2006). A goodness-of-fit test for multinomial logistic regression. Biometrics, 62(4):980–985.

[2] Kim, J. K. and Shao, J. (2013). Statistical Methods for Handling Incomplete Data. Chapman and Hall/CRC.

[3] Liu, D. and Zhou, X. H. (2010). A model for adjusting for nonignorable verification bias in estimation of the ROC curve and its area with likelihood-based approach. Biometrics, 66(4):1119–1128.

[4] Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 44(2):226–233.

[5] Mor, G., Visintin, I., Lai, Y., Zhao, H., Schwartz, P., Rutherford, T., Yue, L., Bray-Ward, P., and Ward, D. C. (2005). Serum protein markers for early detection of ovarian cancer. Proceedings of the National Academy of Sciences, 102(21):7677–7682.

[6] Morikawa, K., Kim, J. K., and Kano, Y. (2017). Semiparametric maximum likelihood estimation with data missing not at random. Canadian Journal of Statistics, 45(4):393–409.

[7] Nakas, C. T. and Yiannoutsos, C. T. (2004). Ordered multiple-class ROC analysis with continuous measurements. Statistics in Medicine, 23(22):3437–3449.

[8] Riddles, M. K., Kim, J. K., and Im, J. (2016). A propensity-score-adjustment method for nonignorable nonresponse. Journal of Survey Statistics and Methodology, 4(2):215–245.
Scurfield, B. K. (1996). Multiple-event forced-choice tasks in the theory of signal detectability. Journal of Mathematical Psychology, 40(3):253–269.
To Duc, K., Chiogna, M., and Adi...
doi:10.1007/s10260-019-00451-3

[9] Visintin, I., Feng, Z., Longton, G., Ward, D. C., Alvero, A. B., Lai, Y., Tenthorey, J., Leiser, A., Flores-Saaib, R., Yu, H., et al. (2008). Diagnostic markers for early detection of ovarian cancer. Clinical Cancer Research, 14(4):1065–1072.

[10] Wang, S., Shao, J., and Kim, J. K. (2014). An instrumental variable approach for identification and estimation with nonignorable nonresponse. Statistica Sinica, 24(3):1097–1116.

[11] Yu, W., Kim, J. K., and Park, T. (2018). Estimation of area under the ROC curve under the nonignorable verification bias. Statistica Sinica, 28(4):2149–2166.

[12] Zhang, Y. and Alonzo, T. A. (2018). Estimation of the volume under the receiver-operating characteristic surface adjusting for non-ignorable verification bias. Statistical Methods in Medical Research, 27(3):715–739.
