pith.

arXiv: 2604.27409 · v1 · submitted 2026-04-30 · 📊 stat.ME · stat.AP

Robust inference methods of diagnostic test accuracy meta-analysis for influential outlying studies via density power divergence

Pith reviewed 2026-05-07 10:13 UTC · model grok-4.3

classification 📊 stat.ME stat.AP
keywords diagnostic test accuracy · meta-analysis · robust inference · density power divergence · outliers · bivariate random effects · sensitivity and specificity · Hyvärinen score

The pith

Density power divergence modifies the estimating equations in bivariate meta-analysis to automatically downweight outlying studies when synthesizing diagnostic test accuracy data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops frequentist methods for meta-analyzing pairs of sensitivity and specificity that stay reliable when some primary studies behave as outliers. It incorporates density power divergence into the estimating functions of the bivariate random-effects model, so that studies with low density under the fitted model receive reduced weight through a single tuning parameter. A data-adaptive Hyvärinen-score criterion is supplied to choose that parameter and preserve efficiency when outliers are absent. Simulations show the resulting pooled estimates have lower bias and root-mean-squared error together with better coverage than standard maximum-likelihood fits once outliers appear. The same framework also supplies explicit measures of each study's contribution to the final robust estimates.

Core claim

The proposed methods modify the estimating function of the bivariate random-effects model using density power divergence with a tuning parameter to downweight influential outlying studies in diagnostic test accuracy meta-analysis. Practical strategies, including a Hyvärinen score criterion, select the tuning parameter for robust yet efficient inference. Individual study contributions to the pooled estimates are quantified to interpret outlier effects. Application to the Mini-Mental State Examination meta-analysis and simulations confirm reduced bias, lower root mean squared error, and better coverage in the presence of outliers.

What carries the argument

Density power divergence applied to the score equations of the bivariate random-effects model, which downweights studies whose observed sensitivity-specificity pairs have low density under the current parameter values.
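A minimal pure-Python sketch of that weighting follows. It illustrates the idea under stated assumptions and is not the authors' implementation. It assumes a bivariate normal-normal formulation on the logit scale, in which study i contributes y_i = (logit sensitivity, logit specificity) with marginal covariance V_i = Σ + S_i, and it holds the between-study covariance Σ fixed so that only the pooled mean μ is estimated. For μ, the density-power-divergence estimating equation reduces to sum_i w_i V_i⁻¹ (y_i − μ) = 0 with w_i = f(y_i; μ, V_i)^α, because the DPD correction integral vanishes by symmetry of the normal density. The function names and toy data are hypothetical.

```python
import math


def inv2(m):
    """Return the inverse and determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det


def mvn2_pdf(y, mu, v):
    """Bivariate normal density N(mu, v) evaluated at y."""
    vi, det = inv2(v)
    r = (y[0] - mu[0], y[1] - mu[1])
    q = sum(r[j] * vi[j][k] * r[k] for j in range(2) for k in range(2))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def dpd_location(ys, vs, alpha, tol=1e-10, max_iter=500):
    """Solve sum_i w_i V_i^{-1} (y_i - mu) = 0, w_i = f(y_i; mu, V_i)^alpha, by fixed-point iteration.

    Assumption: each V_i (and hence Sigma) is held fixed. alpha = 0 gives unit
    weights, i.e. the ordinary GLS (ML) mean for fixed V_i.
    """
    # Start at the coordinate-wise median so a gross outlier cannot capture the start.
    mu = [sorted(y[j] for y in ys)[len(ys) // 2] for j in range(2)]
    for _ in range(max_iter):
        a = [[0.0, 0.0], [0.0, 0.0]]  # sum_i w_i V_i^{-1}
        b = [0.0, 0.0]                # sum_i w_i V_i^{-1} y_i
        for y, v in zip(ys, vs):
            wi = mvn2_pdf(y, mu, v) ** alpha  # DPD weight of study i
            vi, _ = inv2(v)
            for j in range(2):
                b[j] += wi * (vi[j][0] * y[0] + vi[j][1] * y[1])
                for k in range(2):
                    a[j][k] += wi * vi[j][k]
        ai, _ = inv2(a)
        new = [ai[j][0] * b[0] + ai[j][1] * b[1] for j in range(2)]
        converged = max(abs(new[j] - mu[j]) for j in range(2)) < tol
        mu = new
        if converged:
            break
    weights = [mvn2_pdf(y, mu, v) ** alpha for y, v in zip(ys, vs)]
    return mu, weights


# Hypothetical toy data on the logit scale: six concordant studies and one outlier.
V = [[0.15, 0.0], [0.0, 0.15]]  # assumed Sigma + S_i, identical for every study
ys = [(1.2, 1.8), (1.0, 2.0), (1.4, 1.6), (1.1, 1.9), (1.3, 1.7), (0.9, 2.1), (-1.5, -0.5)]
vs = [V] * len(ys)

mu_ml, _ = dpd_location(ys, vs, alpha=0.0)   # ordinary ML/GLS mean, pulled toward the outlier
mu_dpd, w = dpd_location(ys, vs, alpha=0.5)  # DPD fit with a fixed tuning parameter
print("ML :", [round(m, 3) for m in mu_ml])
print("DPD:", [round(m, 3) for m in mu_dpd])
print("normalised weights:", [round(x / sum(w), 4) for x in w])
```

Setting α = 0 recovers the ordinary GLS/ML mean, which is dragged toward the seventh study; with α = 0.5 that study's normalised weight is essentially zero and the estimate sits at the centre of the six concordant studies. A full fit would also update Σ, whose DPD estimating equation keeps a non-zero correction term.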

If this is right

  • Pooled sensitivity and specificity estimates exhibit lower bias when outlying studies are present.
  • Confidence-interval coverage probabilities improve relative to standard methods under outlier contamination.
  • Each study's numerical contribution to the final robust estimates can be calculated and reported.
  • The same workflow supports routine sensitivity analyses that test whether standard results are driven by a small number of studies.
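For the third point, one way to compute per-study contributions is sketched below; the definition is an illustrative assumption and may not match the paper's. At convergence, a DPD fit of the pooled mean solves a weighted GLS equation, so μ̂ = sum_i C_i y_i with C_i = (sum_j w_j V_j⁻¹)⁻¹ w_i V_i⁻¹, and the C_i sum to the identity matrix. Reading the diagonal entries of C_i as contribution rates for pooled logit sensitivity and specificity gives shares that sum to 1 across studies. The weights and covariance matrices are hypothetical inputs, not results from the MMSE example.

```python
def inv2(m):
    """Return the inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def contribution_rates(weights, vs):
    """Diagonal entries of C_i = (sum_j w_j V_j^{-1})^{-1} w_i V_i^{-1} for each study i.

    Since sum_i C_i = I, the sensitivity shares (and the specificity shares) each sum to 1.
    """
    invs = [inv2(v) for v in vs]
    a = [[sum(w * vi[j][k] for w, vi in zip(weights, invs)) for k in range(2)] for j in range(2)]
    ai = inv2(a)
    rates = []
    for w, vi in zip(weights, invs):
        c = [[w * sum(ai[j][l] * vi[l][k] for l in range(2)) for k in range(2)] for j in range(2)]
        rates.append((c[0][0], c[1][1]))  # (share for sensitivity, share for specificity)
    return rates


# Hypothetical converged DPD weights and marginal covariances V_i = Sigma + S_i.
weights = [1.0, 0.9, 0.8, 1e-9]  # the last study has been almost entirely downweighted
vs = [
    [[0.20, 0.02], [0.02, 0.30]],
    [[0.25, 0.00], [0.00, 0.20]],
    [[0.15, -0.01], [-0.01, 0.25]],
    [[0.20, 0.00], [0.00, 0.20]],
]
rates = contribution_rates(weights, vs)
for i, (se, sp) in enumerate(rates, start=1):
    print(f"Study {i}: sensitivity {100 * se:5.1f}%, specificity {100 * sp:5.1f}%")
```

With non-diagonal V_i, the diagonal entries of C_i need not all be positive in general, so a published definition may normalise differently.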

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same divergence-weighted estimating equations could be applied to other paired-outcome meta-analyses such as those for agreement measures or change scores.
  • Reviewers conducting diagnostic accuracy syntheses could adopt the method as a default check before reporting standard pooled values.
  • The Hyvärinen-score selection rule may transfer to other robust-divergence estimators outside the diagnostic-test setting.

Load-bearing premise

The bivariate random-effects model remains a reasonable description of the data structure once the density-power-divergence weighting has been applied, and the Hyvärinen-score criterion selects a tuning parameter that keeps the efficiency loss tolerable.
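The tuning-parameter criterion can be sketched as follows. We assume, following the general construction studied by Sugasawa and Yonekura [15], that the criterion is the Hyvärinen score of the unnormalised pseudo-density q_α(y) ∝ exp{f(y; θ̂_α)^α / α} implied by the DPD loss, summed over studies and minimised over a grid of α values; whether the paper uses exactly this form is an assumption. Because the Hyvärinen score uses only derivatives of log q_α in y, no normalising constant is needed. For a bivariate normal f with gradient s = −V⁻¹(y − μ), the per-study score is H_i(α) = 2 f^α (α‖s‖² − tr V⁻¹) + f^{2α} ‖s‖², which reduces to the ordinary Hyvärinen score of f at α = 0. The sketch holds the between-study covariance fixed, and the function names, grid and toy data are hypothetical.

```python
import math


def inv2(m):
    """Return the inverse and determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det


def mvn2_pdf(y, mu, v):
    """Bivariate normal density N(mu, v) evaluated at y."""
    vi, det = inv2(v)
    r = (y[0] - mu[0], y[1] - mu[1])
    q = sum(r[j] * vi[j][k] * r[k] for j in range(2) for k in range(2))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def dpd_location(ys, vs, alpha, tol=1e-10, max_iter=500):
    """DPD-weighted GLS mean with each V_i held fixed (Sigma is not re-estimated here)."""
    mu = [sorted(y[j] for y in ys)[len(ys) // 2] for j in range(2)]  # coordinate-wise median start
    for _ in range(max_iter):
        a, b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
        for y, v in zip(ys, vs):
            wi = mvn2_pdf(y, mu, v) ** alpha  # DPD weight f(y_i; mu, V_i)^alpha
            vi, _ = inv2(v)
            for j in range(2):
                b[j] += wi * (vi[j][0] * y[0] + vi[j][1] * y[1])
                for k in range(2):
                    a[j][k] += wi * vi[j][k]
        ai, _ = inv2(a)
        new = [ai[j][0] * b[0] + ai[j][1] * b[1] for j in range(2)]
        if max(abs(new[j] - mu[j]) for j in range(2)) < tol:
            return new
        mu = new
    return mu


def h_score(ys, vs, mu, alpha):
    """Assumed criterion: summed Hyvärinen score of q(y) ∝ exp(f(y)^alpha / alpha), f = N(mu, V_i).

    Per study: 2 f^a (a |s|^2 - tr V^-1) + f^(2a) |s|^2 with s = grad_y log f;
    at alpha = 0 this is the ordinary Hyvärinen score of f itself.
    """
    total = 0.0
    for y, v in zip(ys, vs):
        vi, _ = inv2(v)
        r = (y[0] - mu[0], y[1] - mu[1])
        s2 = sum((vi[j][0] * r[0] + vi[j][1] * r[1]) ** 2 for j in range(2))  # |s|^2
        fa = mvn2_pdf(y, mu, v) ** alpha
        total += 2 * fa * (alpha * s2 - (vi[0][0] + vi[1][1])) + fa * fa * s2
    return total


def select_alpha(ys, vs, grid):
    """Grid value of alpha whose DPD fit minimises the summed H-score."""
    return min(grid, key=lambda al: h_score(ys, vs, dpd_location(ys, vs, al), al))


V = [[0.15, 0.0], [0.0, 0.15]]  # assumed Sigma + S_i, identical for every study
clean = [(1.2, 1.8), (1.0, 2.0), (1.4, 1.6), (1.1, 1.9), (1.3, 1.7), (0.9, 2.1)]
contaminated = clean + [(-1.5, -0.5)]  # hypothetical outlying study
grid = [round(0.1 * g, 1) for g in range(11)]  # 0.0, 0.1, ..., 1.0
alpha_clean = select_alpha(clean, [V] * len(clean), grid)
alpha_cont = select_alpha(contaminated, [V] * len(contaminated), grid)
print("selected alpha, clean data:", alpha_clean)
print("selected alpha, with outlier:", alpha_cont)
```

In this toy example the criterion keeps α = 0 on the clean studies, so no efficiency is given up, and moves to a positive α once the outlying study is added.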

What would settle it

A controlled simulation in which the robust estimators exhibit higher bias or root-mean-squared error than ordinary maximum-likelihood estimators under the same outlier configurations.
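A small version of such a check is sketched below under deliberately simple, hypothetical settings: within-study and between-study covariances are treated as known and shared by all studies, α is fixed at 0.5 rather than selected from the data, and two of ten studies have their logit sensitivity shifted by −3. This is not the paper's simulation design; it only shows how bias and RMSE of the pooled logit sensitivity would be compared between the ML fit (α = 0) and a DPD fit.

```python
import math
import random


def inv2(m):
    """Return the inverse and determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det


def mvn2_pdf(y, mu, v):
    """Bivariate normal density N(mu, v) evaluated at y."""
    vi, det = inv2(v)
    r = (y[0] - mu[0], y[1] - mu[1])
    q = sum(r[j] * vi[j][k] * r[k] for j in range(2) for k in range(2))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def dpd_location(ys, vs, alpha, tol=1e-8, max_iter=200):
    """DPD-weighted GLS mean with each V_i held fixed; alpha = 0 is the ML (GLS) mean."""
    mu = [sorted(y[j] for y in ys)[len(ys) // 2] for j in range(2)]  # coordinate-wise median start
    for _ in range(max_iter):
        a, b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
        for y, v in zip(ys, vs):
            wi = mvn2_pdf(y, mu, v) ** alpha  # DPD weight f(y_i; mu, V_i)^alpha
            vi, _ = inv2(v)
            for j in range(2):
                b[j] += wi * (vi[j][0] * y[0] + vi[j][1] * y[1])
                for k in range(2):
                    a[j][k] += wi * vi[j][k]
        ai, _ = inv2(a)
        new = [ai[j][0] * b[0] + ai[j][1] * b[1] for j in range(2)]
        if max(abs(new[j] - mu[j]) for j in range(2)) < tol:
            return new
        mu = new
    return mu


def simulate(n_rep=200, k=10, n_out=2, shift=-3.0, alpha=0.5, seed=1):
    """Bias and RMSE of the pooled logit sensitivity for ML (alpha = 0) and DPD fits."""
    rng = random.Random(seed)
    true_mu = (1.0, 2.0)            # assumed true pooled (logit Se, logit Sp)
    v = [[0.15, 0.0], [0.0, 0.15]]  # assumed known marginal covariance, shared by all studies
    sd = math.sqrt(0.15)
    errors = {"ML": [], "DPD": []}
    for _ in range(n_rep):
        ys = []
        for i in range(k):
            se_centre = true_mu[0] + (shift if i < n_out else 0.0)  # first n_out studies are outliers
            ys.append((rng.gauss(se_centre, sd), rng.gauss(true_mu[1], sd)))
        vs = [v] * k
        errors["ML"].append(dpd_location(ys, vs, 0.0)[0] - true_mu[0])
        errors["DPD"].append(dpd_location(ys, vs, alpha)[0] - true_mu[0])
    return {name: (sum(e) / n_rep, math.sqrt(sum(x * x for x in e) / n_rep))
            for name, e in errors.items()}


results = simulate()
for name, (bias, rmse) in results.items():
    print(f"{name:>3}: bias {bias:+.3f}, RMSE {rmse:.3f}")
```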

Figures

Figures reproduced from arXiv: 2604.27409 by Hisashi Noma, Kotaro Sasaki, Theodoros Evrenoglou.

Figure 1. Figure 1a presents the forest plot for this meta-analysis. Eight studies were included, with sensitivity ranging from 27% to 89% and specificity ranging from 33% to 90%. The standardized residuals were relatively large for Studies 1, 2, and 3, suggesting that these studies may be outlying and influential (Figure 1b).
Figure 2. The DPD weights and the contribution rates for the pooled sensitivity and specificity of individual studies under the proposed method using α_GES.
4. Simulations. In this section, we describe the methods for the simulation studies conducted to evaluate the performance of our proposed methods, using the ADEMP framework [24], and then present the results. 4.1 Aim and data generating mechanisms. The aim of the sim…
Figure 3. Simulation results: bias and RMSE for pooled sensitivity and specificity estimates. The five scenarios (A)–(E) correspond to settings of (k₁, k₂) = (0, 0), (2, 0), (0, 2), (2, 1), (1, 2), where k₁ and k₂ are the numbers of studies with outlying sensitivity and outlying specificity, respectively. (BNN, bivariate normal-normal model; BBN, bivariate binomial-normal model; BFM, bivariate finite mixture mo…
Figure 4. Simulation results: coverage probability and mean width of 95% confidence intervals. The five scenarios (A)–(E) correspond to settings of (k₁, k₂) = (0, 0), (2, 0), (0, 2), (2, 1), (1, 2), where k₁ and k₂ are the numbers of studies with outlying sensitivity and outlying specificity, respectively. In panel (a), triangles indicate bars that were truncated for visual clarity, and the corresponding actual …
Original abstract

In diagnostic test accuracy meta-analysis (DTA-MA), standard inference methods using bivariate random-effects models for jointly synthesizing sensitivity and specificity can be sensitive to outlying studies and may yield misleading conclusions. In this article, we propose frequentist outlier-robust statistical inference methods for DTA-MA based on density power divergence. The proposed methods automatically downweight influential outlying studies by modifying the estimating function using the robust divergence with a tuning parameter. To achieve robust yet statistically efficient inference in the presence of outlying studies, the proposed methods incorporate practical strategies for selecting the tuning parameter, including a data-adaptive criterion based on the Hyvärinen score. We also quantify the contributions of individual studies to the robust pooled estimates, facilitating interpretation of how outlying studies affect the results. We illustrate the effectiveness of the proposed methods through an application to a DTA-MA of the Mini-Mental State Examination. Simulation studies showed that the proposed methods reduced bias and root mean squared error relative to existing methods and improved coverage probability in the presence of outliers. The proposed methods enable a sensitivity analysis to assess whether the main results obtained using standard methods are driven by outlying studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes frequentist outlier-robust inference methods for diagnostic test accuracy meta-analysis (DTA-MA) based on density power divergence. These methods modify the estimating functions of bivariate random-effects models to automatically downweight influential outlying studies via a tuning parameter, with data-adaptive selection of the parameter using the Hyvärinen score. The approach also quantifies individual study contributions to the pooled estimates. Effectiveness is shown through an application to a DTA-MA of the Mini-Mental State Examination and simulation studies that report reduced bias, lower root mean squared error, and improved coverage probabilities relative to standard methods when outliers are present.

Significance. If the central claims hold after addressing the asymptotic issues, the work would provide a useful addition to the toolkit for robust meta-analysis in diagnostic accuracy studies, where outlying studies frequently arise and can distort pooled sensitivity and specificity estimates. The combination of automatic downweighting, interpretable study influence measures, and a real-data example supports practical sensitivity analysis, while the simulation results offer initial evidence of improved finite-sample performance under contamination.

major comments (2)
  1. The proposed estimator is a two-step procedure in which the tuning parameter α is selected by minimizing a data-dependent Hyvärinen score before the weighted estimating equations are solved. The manuscript does not derive the additional term in the asymptotic expansion arising from this selection step, nor does it provide an adjusted sandwich or bootstrap variance estimator that accounts for it. Consequently, the nominal coverage of the reported confidence intervals may be optimistic, which directly undermines the simulation claim of improved coverage probability in the presence of outliers.
  2. The target of inference after applying per-study density-power-divergence weights is no longer necessarily the population mean of the bivariate random-effects distribution; the weighted estimator may converge to a different functional (closer to a weighted median). The paper should clarify whether the estimand remains the original bivariate parameters or shifts under the robust weighting, and whether this affects the interpretation of the pooled sensitivity and specificity.
minor comments (2)
  1. The abstract and methods description refer to 'practical strategies' for tuning-parameter selection beyond the Hyvärinen score; explicit comparison of these alternatives (e.g., cross-validation or fixed α grids) in the simulation section would strengthen the practical guidance.
  2. Notation for the modified estimating function and the Hyvärinen score criterion should be introduced with explicit equations early in the methods section to improve readability for readers unfamiliar with density power divergence.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The two major comments identify important gaps in the asymptotic justification and the precise interpretation of the estimand under robust weighting. We address each point below and will incorporate the necessary clarifications and methodological extensions in the revised manuscript.

Point-by-point responses
  1. Referee: The proposed estimator is a two-step procedure in which the tuning parameter α is selected by minimizing a data-dependent Hyvärinen score before the weighted estimating equations are solved. The manuscript does not derive the additional term in the asymptotic expansion arising from this selection step, nor does it provide an adjusted sandwich or bootstrap variance estimator that accounts for it. Consequently, the nominal coverage of the reported confidence intervals may be optimistic, which directly undermines the simulation claim of improved coverage probability in the presence of outliers.

    Authors: We agree that treating the data-adaptively chosen α as fixed omits an additional source of variability in the asymptotic expansion. The current sandwich variance therefore conditions on α and may understate uncertainty, potentially leading to optimistic coverage. In the revision we will derive the extra term arising from the Hyvärinen-score minimization and will replace the reported variance estimator with a bootstrap procedure that re-selects α in every replicate. The simulation study will be updated with these adjusted intervals so that the coverage comparisons are based on a variance estimator that fully accounts for the two-step procedure. revision: yes

  2. Referee: The target of inference after applying per-study density-power-divergence weights is no longer necessarily the population mean of the bivariate random-effects distribution; the weighted estimator may converge to a different functional (closer to a weighted median). The paper should clarify whether the estimand remains the original bivariate parameters or shifts under the robust weighting, and whether this affects the interpretation of the pooled sensitivity and specificity.

    Authors: We thank the referee for highlighting this distinction. When the bivariate random-effects model holds exactly and there are no outliers, the robust estimator (any fixed α > 0) converges to the same population parameters as the standard maximum-likelihood estimator. Under contamination, however, it converges to a different functional that down-weights outlying studies and is closer to a robust location measure. We will revise the manuscript to state this explicitly in the methods and discussion sections, and we will clarify that the reported pooled sensitivity and specificity are to be interpreted as robust summaries of the central tendency rather than as estimates of the mean under a correctly specified uncontaminated model. This does not alter the practical goal of sensitivity analysis but improves the precision of the interpretation. revision: yes
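The bootstrap proposed in the first response can be sketched as a nonparametric bootstrap over studies in which α is re-selected inside every replicate, so that the spread of the replicate estimates includes the variability of the selection step. This is a minimal sketch under assumptions, not the authors' revised procedure: it holds the between-study covariance fixed, scores each candidate α with an assumed Hyvärinen-score form for the DPD pseudo-density q_α(y) ∝ exp{f(y)^α / α}, and reports a simple percentile interval. Function names, the α grid, the replicate count and the toy data are hypothetical.

```python
import math
import random


def inv2(m):
    """Return the inverse and determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det


def mvn2_pdf(y, mu, v):
    """Bivariate normal density N(mu, v) evaluated at y."""
    vi, det = inv2(v)
    r = (y[0] - mu[0], y[1] - mu[1])
    q = sum(r[j] * vi[j][k] * r[k] for j in range(2) for k in range(2))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def dpd_location(ys, vs, alpha, tol=1e-8, max_iter=200):
    """DPD-weighted GLS mean with each V_i held fixed (assumption: Sigma is not re-estimated)."""
    mu = [sorted(y[j] for y in ys)[len(ys) // 2] for j in range(2)]  # coordinate-wise median start
    for _ in range(max_iter):
        a, b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
        for y, v in zip(ys, vs):
            wi = mvn2_pdf(y, mu, v) ** alpha  # DPD weight f(y_i; mu, V_i)^alpha
            vi, _ = inv2(v)
            for j in range(2):
                b[j] += wi * (vi[j][0] * y[0] + vi[j][1] * y[1])
                for k in range(2):
                    a[j][k] += wi * vi[j][k]
        ai, _ = inv2(a)
        new = [ai[j][0] * b[0] + ai[j][1] * b[1] for j in range(2)]
        if max(abs(new[j] - mu[j]) for j in range(2)) < tol:
            return new
        mu = new
    return mu


def h_score(ys, vs, mu, alpha):
    """Assumed criterion: summed Hyvärinen score of q(y) ∝ exp(f(y)^alpha / alpha)."""
    total = 0.0
    for y, v in zip(ys, vs):
        vi, _ = inv2(v)
        r = (y[0] - mu[0], y[1] - mu[1])
        s2 = sum((vi[j][0] * r[0] + vi[j][1] * r[1]) ** 2 for j in range(2))  # |grad_y log f|^2
        fa = mvn2_pdf(y, mu, v) ** alpha
        total += 2 * fa * (alpha * s2 - (vi[0][0] + vi[1][1])) + fa * fa * s2
    return total


def select_and_fit(ys, vs, grid):
    """Return (alpha, mu) for the grid value with the smallest summed H-score."""
    fits = [(al, dpd_location(ys, vs, al)) for al in grid]
    return min(fits, key=lambda fit: h_score(ys, vs, fit[1], fit[0]))


def bootstrap_ci(ys, vs, grid, n_boot=100, level=0.95, seed=2026):
    """Percentile interval for pooled logit sensitivity, re-selecting alpha in each replicate."""
    rng = random.Random(seed)
    k = len(ys)
    reps, alphas = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(k) for _ in range(k)]  # resample whole studies with replacement
        al, mu = select_and_fit([ys[i] for i in idx], [vs[i] for i in idx], grid)
        reps.append(mu[0])
        alphas.append(al)
    reps.sort()
    lo = reps[math.floor((1 - level) / 2 * n_boot)]
    hi = reps[math.ceil((1 + level) / 2 * n_boot) - 1]
    return lo, hi, alphas


V = [[0.15, 0.0], [0.0, 0.15]]  # assumed Sigma + S_i, identical for every study
ys = [(1.2, 1.8), (1.0, 2.0), (1.4, 1.6), (1.1, 1.9), (1.3, 1.7), (0.9, 2.1), (-1.5, -0.5)]
vs = [V] * len(ys)
grid = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
alpha_hat, mu_hat = select_and_fit(ys, vs, grid)
lo, hi, alphas = bootstrap_ci(ys, vs, grid)
print(f"alpha_hat = {alpha_hat}, pooled logit sensitivity = {mu_hat[0]:.3f}")
print(f"95% percentile interval with alpha re-selected: ({lo:.3f}, {hi:.3f})")
print("alpha values chosen across replicates:", sorted(set(alphas)))
```

The design choice that matters is that `select_and_fit` runs inside the resampling loop; fixing α at its full-data value would reproduce the conditional variance the referee objected to.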

Circularity Check

0 steps flagged

No circularity: robustness claims rest on external simulation evidence and data-adaptive external criterion

full rationale

The paper defines a density-power-divergence weighted estimating function for the bivariate random-effects model and selects the tuning parameter via the Hyvärinen score, an independent criterion. No equation equates the target estimator to a quantity defined from the same fitted values, no prediction is obtained by construction from a subset fit, and no uniqueness theorem or ansatz is imported solely via self-citation. Simulation results are presented as separate empirical checks rather than algebraic identities. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The proposal rests on the standard bivariate random-effects model for paired sensitivity and specificity plus a single tuning parameter whose value is chosen by a data-adaptive score.

free parameters (1)
  • tuning parameter (alpha)
    Controls the degree of downweighting of outlying studies; selected via Hyvärinen score criterion.
axioms (1)
  • domain assumption: bivariate random-effects model for jointly modeling sensitivity and specificity across studies
    Invoked as the base model that is then robustified.



Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

  1. [1]

    Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests

    Deeks JJ. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ 2001; 323: 157-162

  2. [2]

    Systematic reviews of diagnostic test accuracy

    Leeflang MM, Deeks JJ, Gatsonis C, et al. Systematic reviews of diagnostic test accuracy. Ann Intern Med 2008; 149: 889-897

  3. [3]

    Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews

    Reitsma JB, Glas AS, Rutjes AWS, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005; 58: 982-990

  4. [4]

    Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach

    Chu H and Cole SR. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. J Clin Epidemiol 2006; 59: 1331-1332

  5. [5]

    A bivariate finite mixture random effects model for identifying and accommodating outliers in diagnostic test accuracy meta-analyses

    Negeri ZF. A bivariate finite mixture random effects model for identifying and accommodating outliers in diagnostic test accuracy meta-analyses. Biom J 2025; 67: e70062

  6. [6]

    Robust bivariate random-effects model for accommodating outlying and influential studies in meta-analysis of diagnostic test accuracy studies

    Negeri ZF and Beyene J. Robust bivariate random-effects model for accommodating outlying and influential studies in meta-analysis of diagnostic test accuracy studies. Stat Methods Med Res 2020; 29: 3308-3325

  7. [7]

    A new approach to outliers in meta-analysis

    Baker R and Jackson D. A new approach to outliers in meta-analysis. Health Care Manag Sci 2008; 11: 121-131

  8. [8]

    Outlier and influence diagnostics for meta-analysis

    Viechtbauer W and Cheung MW. Outlier and influence diagnostics for meta-analysis. Res Synth Methods 2010; 1: 112-125

  9. [9]

    Influence diagnostics and outlier detection for meta-analysis of diagnostic test accuracy

    Matsushima Y, Noma H, Yamada T, et al. Influence diagnostics and outlier detection for meta-analysis of diagnostic test accuracy. Res Synth Methods 2020; 11: 237-247

  10. [10]

    Statistical methods for detecting outlying and influential studies in meta-analysis of diagnostic test accuracy studies

    Negeri ZF and Beyene J. Statistical methods for detecting outlying and influential studies in meta-analysis of diagnostic test accuracy studies. Stat Methods Med Res 2020; 29: 1227-1242

  11. [11]

    Robust inference methods for meta-analysis involving influential outlying studies

    Noma H, Sugasawa S and Furukawa TA. Robust inference methods for meta-analysis involving influential outlying studies. Stat Med 2024; 43: 3778-3791

  12. [12]

    Meta-analysis models relaxing the random-effects normality assumption: methodological systematic review and simulation study

    Panagiotopoulou K, Evrenoglou T, Schmid CH, et al. Meta-analysis models relaxing the random-effects normality assumption: methodological systematic review and simulation study. BMC Med Res Methodol 2025; 25: 231

  13. [13]

    Robust and efficient estimation by minimising a density power divergence

    Basu A, Harris IR, Hjort NL, et al. Robust and efficient estimation by minimising a density power divergence. Biometrika 1998; 85: 549-559

  14. [14]

    Robust estimation for non-homogeneous data and the selection of the optimal tuning parameter: the density power divergence approach

    Ghosh A and Basu A. Robust estimation for non-homogeneous data and the selection of the optimal tuning parameter: the density power divergence approach. J Appl Stat 2015; 42: 2056-2072

  15. [15]

    On selection criteria for the tuning parameter in robust divergence

    Sugasawa S and Yonekura S. On selection criteria for the tuning parameter in robust divergence. Entropy 2021; 23: 1147

  16. [16]

    Robust estimation for independent non-homogeneous observations using density power divergence with applications to linear regression

    Ghosh A and Basu A. Robust estimation for independent non-homogeneous observations using density power divergence with applications to linear regression. Electron J Stat 2013; 7: 2420-2456

  17. [17]

    Robust and efficient estimation in the parametric proportional hazards model under random censoring

    Ghosh A and Basu A. Robust and efficient estimation in the parametric proportional hazards model under random censoring. Stat Med 2019; 38: 5283-5299

  18. [18]

    Robust estimation of fixed effect parameters and variances of linear mixed models: the minimum density power divergence approach

    Saraceno G, Ghosh A, Basu A, et al. Robust estimation of fixed effect parameters and variances of linear mixed models: the minimum density power divergence approach. AStA Adv Stat Anal 2023; 108: 127-157

  19. [19]

    A refined method for the meta-analysis of controlled clinical trials with binary outcome

    Hartung J and Knapp G. A refined method for the meta-analysis of controlled clinical trials with binary outcome. Stat Med 2001; 20: 3875-3889

  20. [20]

    A simple confidence interval for meta-analysis

    Sidik K and Jonkman JN. A simple confidence interval for meta-analysis. Stat Med 2002; 21: 3153-3159

  21. [21]

    A refined method for multivariate meta-analysis and meta-regression

    Jackson D and Riley RD. A refined method for multivariate meta-analysis and meta-regression. Stat Med 2014; 33: 541-554

  22. [22]

    The influence curve and its role in robust estimation

    Hampel FR. The influence curve and its role in robust estimation. J Am Stat Assoc 1974; 69: 383-393

  23. [23]

    Mini-Mental State Examination (MMSE) for the detection of Alzheimer's disease and other dementias in people with mild cognitive impairment (MCI)

    Arevalo-Rodriguez I, Smailagic N, Roque IFM, et al. Mini-Mental State Examination (MMSE) for the detection of Alzheimer's disease and other dementias in people with mild cognitive impairment (MCI). Cochrane Database Syst Rev 2015; 2015: CD010783

  24. [24]

    Using simulation studies to evaluate statistical methods

    Morris TP, White IR and Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med 2019; 38: 2074-2102

  25. [25]

    Influential observations, high leverage points, and outliers in linear regression

    Chatterjee S and Hadi AS. Influential observations, high leverage points, and outliers in linear regression. Stat Sci 1986; 1: 379-393

  26. [26]

    A score based approach to wild bootstrap inference

    Kline P and Santos A. A score based approach to wild bootstrap inference. J Econom Methods 2012; 1: 23-41

  27. [27]

    Generalized bootstrap for estimating equations

    Chatterjee S and Bose A. Generalized bootstrap for estimating equations. Ann Stat 2005; 33: 414-436

  28. [28]

    Outlier detection and influence diagnostics in network meta-analysis

    Noma H, Gosho M, Ishii R, et al. Outlier detection and influence diagnostics in network meta-analysis. Res Synth Methods 2020; 11: 891-902

  29. [29]

    Influence analyses of "designs" for evaluating inconsistency in network meta-analysis

    Sasaki K and Noma H. Influence analyses of "designs" for evaluating inconsistency in network meta-analysis. arXiv preprint arXiv:2406.16485, 2025

  30. [30]

    Minimizing robust density power-based divergences for general parametric density models

    Okuno A. Minimizing robust density power-based divergences for general parametric density models. Ann Inst Stat Math 2024; 76: 851-875