Meta-analysis of networks of diagnostic tests with binary and continuous results
Pith reviewed 2026-05-10 01:35 UTC · model grok-4.3
The pith
A hierarchical model for network meta-analysis of diagnostic tests incorporates all thresholds from continuous biomarkers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This is a hierarchical model that incorporates multinomial likelihoods for studies reporting results across multiple thresholds and a parametric structure for the relationship between the probability of testing positive and threshold within each disease class. This approach enables us to obtain accuracy estimates of tests across the whole range of observed thresholds, while it retains all the useful properties of standard NMA-DTA methods.
What carries the argument
Hierarchical model using multinomial likelihoods for multi-threshold data and a parametric curve linking test-positive probability to threshold within each disease class.
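A minimal sketch of how these two pieces fit together for one disease class in one study. The logistic curve, the parameter names `mu` and `sigma`, and the function names are illustrative assumptions; the summary above does not state which parametric family the paper uses.

```python
import math

def prob_positive(threshold, mu, sigma):
    """P(test value > threshold) under an ASSUMED logistic curve (hypothetical form)."""
    return 1.0 / (1.0 + math.exp(-(mu - threshold) / sigma))

def cell_probs(thresholds, mu, sigma):
    """Multinomial cell probabilities implied by the curve.

    K thresholds split the test scale into K + 1 cells: below the lowest
    threshold, between consecutive thresholds, and above the highest one.
    """
    ts = sorted(thresholds)
    p_pos = [prob_positive(t, mu, sigma) for t in ts]
    upper = [1.0] + p_pos   # P(value > lower edge of each cell)
    lower = p_pos + [0.0]   # P(value > upper edge of each cell)
    return [u - l for u, l in zip(upper, lower)]

def multinomial_loglik(counts, probs):
    """Log-likelihood of the observed cell counts given the cell probabilities."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    # Empty cells contribute nothing, which also avoids log(0).
    return log_coef + sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)

# Usage with made-up counts: three thresholds give four cells.
probs = cell_probs([1.0, 2.0, 3.0], mu=2.0, sigma=1.0)
loglik = multinomial_loglik([12, 9, 10, 14], probs)
```

Because every cell probability is a difference of two points on the same curve, a study reporting many thresholds informs `mu` and `sigma` with all of its counts rather than with one selected cut-off.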
If this is right
- Accuracy estimates become available for the full range of observed thresholds instead of only selected points.
- A larger number of tests can be included in a single network analysis.
- Sensitivity and specificity at different thresholds are estimated with greater precision.
- Model variations with different covariance structures or added random effects can be compared directly.
Where Pith is reading between the lines
- Threshold-specific estimates could feed directly into decision models that choose the best cut-off for a given clinical context.
- The framework might later accommodate non-parametric curves when the parametric assumption is too restrictive.
- Networks built this way could support more stable comparisons when new tests or thresholds are added over time.
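The first point above could look roughly like the following sketch, which picks the cut-off with the lowest expected misclassification cost. Everything here is a hypothetical illustration: the logistic curves, the `(mu, sigma)` pairs, the prevalence, and the cost weights are assumptions, and higher test values are assumed to indicate disease.

```python
import math

def logistic_tail(threshold, mu, sigma):
    """P(test value > threshold) under an assumed logistic curve."""
    return 1.0 / (1.0 + math.exp(-(mu - threshold) / sigma))

def best_threshold(grid, diseased, healthy, prevalence, fn_cost, fp_cost):
    """Return the grid threshold with the lowest expected misclassification cost.

    diseased and healthy are (mu, sigma) pairs for the two disease classes.
    """
    def expected_cost(t):
        sens = logistic_tail(t, *diseased)        # positives among the diseased
        spec = 1.0 - logistic_tail(t, *healthy)   # negatives among the healthy
        return (prevalence * (1.0 - sens) * fn_cost
                + (1.0 - prevalence) * (1.0 - spec) * fp_cost)
    return min(grid, key=expected_cost)

# Usage: candidate cut-offs from 0.0 to 4.0 in steps of 0.5.
grid = [0.5 * k for k in range(9)]
balanced = best_threshold(grid, (3.0, 1.0), (1.0, 1.0), 0.5, 1.0, 1.0)
cautious = best_threshold(grid, (3.0, 1.0), (1.0, 1.0), 0.5, 5.0, 1.0)
```

Raising the cost of a false negative moves the chosen cut-off downward, trading specificity for sensitivity. A full decision model would also propagate posterior uncertainty in the curves, which this sketch ignores.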
Load-bearing premise
The chosen parametric form for how the probability of a positive test changes with threshold holds across every study and every test in the network.
What would settle it
If external validation data at a held-out threshold show that the model's predicted sensitivity and specificity curves deviate systematically from the observed proportions, the parametric assumption fails.
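One simple way to run such a check is sketched below, assuming a logistic curve and point estimates of its parameters (both are assumptions made for illustration). It compares the observed positive proportion at a held-out threshold with the model's prediction on a standardized scale.

```python
import math

def predicted_positive_rate(threshold, mu, sigma):
    """Model-predicted P(test value > threshold) under an assumed logistic curve."""
    return 1.0 / (1.0 + math.exp(-(mu - threshold) / sigma))

def standardized_residual(n_positive, n_total, predicted):
    """(observed - predicted) / binomial standard error at the predicted rate.

    Uses a plug-in prediction, so it ignores uncertainty in the fitted curve.
    """
    observed = n_positive / n_total
    se = math.sqrt(predicted * (1.0 - predicted) / n_total)
    return (observed - predicted) / se

# Usage with made-up numbers: 60 of 100 diseased participants test positive
# at a held-out threshold where the fitted curve predicts 50%.
pred = predicted_positive_rate(2.0, mu=2.0, sigma=1.0)
z = standardized_residual(60, 100, pred)
```

A single large residual is weak evidence. The pattern that would count against the parametric form is residuals sharing the same sign across many studies at similar thresholds. A Bayesian posterior predictive check would be the fuller version of this idea.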
Original abstract
Network meta-analysis of diagnostic test accuracy (NMA-DTA) is a relatively new field, involving combining evidence across studies to evaluate and compare the accuracy of different tests for a given condition. However, the methods proposed to date cannot always capture complex aspects of the data. In fact, many commonly used diagnostic tests are continuous biomarkers, whose accuracy is evaluated at multiple thresholds within a study. Using current NMA-DTA methods we are feasibly able to include in our analysis only a few thresholds per study, discarding this way a big amount of data which could have provided us with useful information. We introduce an approach that can efficiently encompass all available data. This is a hierarchical model that incorporates multinomial likelihoods for studies reporting results across multiple thresholds and a parametric structure for the relationship between the probability of testing positive and threshold within each disease class. This approach enables us to obtain accuracy estimates of tests across the whole range of observed thresholds, while it retains all the useful properties of standard NMA-DTA methods. We explore different variations of this model based on different covariance structures, the inclusion of study-level random effects, and the addition of a further hierarchical structure on the test-level variance components. This framework is applied to data from two systematic reviews, allowing the inclusion of a larger number of tests (compared to alternative approaches) and estimation of sensitivity and specificity at different thresholds with increased precision.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a hierarchical model extending network meta-analysis of diagnostic test accuracy (NMA-DTA) to handle continuous biomarkers reported at multiple thresholds. It employs multinomial likelihoods for multi-threshold data and imposes a parametric structure linking threshold to the probability of a positive result within each disease class, enabling sensitivity/specificity estimation across the full observed threshold range while retaining standard NMA-DTA borrowing of strength. Variations explore different covariance structures, study-level random effects, and hierarchical variance components. The model is applied to two systematic reviews, claiming inclusion of more tests and increased precision.
Significance. If the parametric assumption holds and is validated, the approach could meaningfully expand usable data in NMA-DTA by avoiding arbitrary threshold selection or data discarding for continuous tests, yielding more precise network-wide accuracy estimates. The retention of established NMA-DTA properties and exploration of covariance structures are positive features; however, the absence of reported quantitative model comparisons, fit statistics, or sensitivity analyses limits assessment of practical gains.
major comments (3)
- [Methods (parametric link)] Methods, parametric structure for P(positive|threshold, disease class): The central claim of obtaining estimates 'across the whole range of observed thresholds' depends on a single parametric form holding uniformly across all included studies and tests. No sensitivity analyses, alternative functional forms, or misspecification diagnostics are described; violation would systematically bias the interpolated curves and all downstream network comparisons.
- [Results] Results, application to two reviews: Claims of 'increased precision' and 'larger number of tests' are made without quantitative support such as credible interval widths, effective sample size comparisons, or direct model fit metrics (e.g., DIC/WAIC) against standard NMA-DTA. This leaves the magnitude of improvement unverified.
- [Model variations] Model variations section: While covariance structures and random effects are explored, the core parametric assumption itself is not relaxed or tested (e.g., via non-parametric alternatives or study-specific shape parameters), making it the load-bearing unexamined component for the 'whole range' estimates.
minor comments (2)
- [Methods] Notation for the parametric threshold-probability function should be given an explicit equation number and clearly distinguished from the multinomial likelihood parameters.
- [Results/Figures] Figure captions for the estimated curves should state the exact parametric family used and any constraints imposed on monotonicity or range.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the strengths and limitations of our proposed hierarchical model for network meta-analysis of diagnostic tests with continuous results. We address each major comment below, agreeing where revisions are needed to provide stronger empirical support and robustness checks.
- Referee: Methods (parametric link): The central claim of obtaining estimates 'across the whole range of observed thresholds' depends on a single parametric form holding uniformly across all included studies and tests. No sensitivity analyses, alternative functional forms, or misspecification diagnostics are described; violation would systematically bias the interpolated curves and all downstream network comparisons.
  Authors: We agree that the parametric structure is central to interpolating across thresholds and that its uniform applicability is an assumption requiring scrutiny. The form was selected based on established relationships in diagnostic biomarker literature (e.g., monotonicity in disease classes). In the revised manuscript we will add sensitivity analyses using alternative forms (linear, quadratic, and fractional polynomial) applied to both case studies, along with posterior predictive checks and comparison of model fit (DIC/WAIC) to assess robustness to misspecification.
  Revision: yes
- Referee: Results, application to two reviews: Claims of 'increased precision' and 'larger number of tests' are made without quantitative support such as credible interval widths, effective sample size comparisons, or direct model fit metrics (e.g., DIC/WAIC) against standard NMA-DTA. This leaves the magnitude of improvement unverified.
  Authors: We acknowledge that the current results section relies on qualitative statements. The revised version will include direct quantitative comparisons: tables reporting average credible interval widths for sensitivity/specificity, effective sample size estimates, and DIC/WAIC values for our model versus standard NMA-DTA (with threshold selection) on the same two datasets. These additions will allow readers to evaluate the practical gains in precision and data inclusion.
  Revision: yes
- Referee: Model variations section: While covariance structures and random effects are explored, the core parametric assumption itself is not relaxed or tested (e.g., via non-parametric alternatives or study-specific shape parameters), making it the load-bearing unexamined component for the 'whole range' estimates.
  Authors: The explored variations target heterogeneity and borrowing of strength across the network, which are the primary extensions beyond standard NMA-DTA. We recognize that the parametric link itself was not varied in the main analysis. In revision we will add a dedicated sensitivity subsection that relaxes the assumption via study-specific shape parameters and a non-parametric alternative (e.g., monotonic splines) for at least one dataset, reporting how network-level estimates change.
  Revision: partial
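The fit comparisons promised in these responses could be computed along the following lines. This is a sketch of the standard WAIC formula applied to pointwise log-likelihood values from posterior draws; the function name and the input layout `loglik_draws[s][i]` are assumptions, not the authors' code.

```python
import math
from statistics import fmean, variance

def waic(loglik_draws):
    """WAIC on the deviance scale (lower is better).

    loglik_draws[s][i] is the log-likelihood of observation i under
    posterior draw s. At least two draws are required.
    """
    n_obs = len(loglik_draws[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        column = [draw[i] for draw in loglik_draws]
        peak = max(column)  # subtract the maximum for numerical stability
        lppd += peak + math.log(fmean([math.exp(v - peak) for v in column]))
        p_waic += variance(column)  # sample variance across draws
    return -2.0 * (lppd - p_waic)

# Usage with made-up values: two draws, two observations.
stable = waic([[-1.0, -2.0], [-1.0, -2.0]])
variable = waic([[-1.0, -2.0], [-3.0, -2.0]])
```

For a like-for-like comparison, both models would need to be scored on the same observations, which is not automatic when one model uses every threshold and the other uses only selected ones.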
Circularity Check
No circularity: hierarchical model derives continuous estimates from parametric assumptions and data
The paper presents a hierarchical model using multinomial likelihoods for multi-threshold data combined with a parametric link between threshold and positive-test probability within disease classes. Accuracy estimates across the full range of thresholds are obtained by fitting this model and then evaluating the parametric curves; this is a standard model-based interpolation, not a reduction of the output to the inputs by definition or by renaming a fitted quantity as a prediction. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the derivation chain. The approach retains properties of standard NMA-DTA by construction of the hierarchy, but the continuous curves themselves are not tautological with the discrete data points. The central modeling choice (parametric form) is an explicit assumption whose validity is separate from circularity.
Axiom & Free-Parameter Ledger
free parameters (2)
- parameters of the parametric threshold-probability function
- covariance parameters for random effects
axioms (2)
- standard math: Results at multiple thresholds within a study follow a multinomial distribution.
- domain assumption: A parametric functional form adequately describes the monotonic or smooth relationship between threshold and test-positive probability within disease classes.
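Written out, the two axioms might take the following form. The notation is assumed for illustration and is not taken from the paper. The logit-linear curve is one possible parametric family, and higher test values are assumed to indicate disease.

```latex
% Assumed notation: study i, test j, disease class d in {0,1},
% thresholds C_{ij1} < ... < C_{ijK}.
% x_{ijdk}: number in class d with a test value above C_{ijk}; n_{ijd}: class total.
\begin{align*}
\bigl(n_{ijd}-x_{ijd1},\; x_{ijd1}-x_{ijd2},\; \dots,\; x_{ijd,K-1}-x_{ijdK},\; x_{ijdK}\bigr)
  &\sim \operatorname{Multinomial}\bigl(n_{ijd},\; \boldsymbol{\pi}_{ijd}\bigr), \\
\boldsymbol{\pi}_{ijd}
  &= \bigl(1-p_{ijd1},\; p_{ijd1}-p_{ijd2},\; \dots,\; p_{ijd,K-1}-p_{ijdK},\; p_{ijdK}\bigr), \\
\operatorname{logit}(p_{ijdk})
  &= \frac{\mu_{ijd}-C_{ijk}}{\sigma_{ijd}}, \qquad \sigma_{ijd} > 0 .
\end{align*}
```

Under this notation, sensitivity at threshold $C_{ijk}$ is $p_{ij1k}$ and specificity is $1 - p_{ij0k}$. The first line corresponds to the standard-math axiom and the last line to the domain assumption.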