A theory of ROC analysis of rule-out and rule-in diagnostics with applications to mammography data
Pith reviewed 2026-05-08 07:38 UTC · model grok-4.3
The pith
A bivariate-copula analysis shows that higher radiologist-AI correlation among diseased cases raises the AUC of rule-out ROC curves, while the opposite pattern holds for rule-in.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When two tests are combined under a rule-out strategy, the AUC of the resulting ROC curve increases when the correlation between the tests rises among diseased cases or falls among non-diseased cases. The opposite correlation pattern increases the AUC under a rule-in strategy. The paper proves these relations with bivariate copulas, which separate the marginal distribution of each test from the dependence between the tests.
What carries the argument
Bivariate copulas that model dependence between the two tests independently of their marginal distributions, allowing explicit derivation of combined ROC curves and AUCs for rule-out and rule-in rules.
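To make the mechanism concrete, here is a minimal sketch under assumptions that are not taken from the paper: binormal marginals (non-diseased scores N(0, 1), diseased scores shifted by the hypothetical separations `mu_rad` and `mu_ai`), a Gaussian copula in each class (so each class is a bivariate normal with correlation `rho_d` or `rho_n`), a fixed AI threshold, and a swept radiologist threshold. Rule-out ("believe the negative") calls a case positive only if both readers are positive; rule-in ("believe the positive") calls it positive if either is. Joining the swept curve to (0, 0) and (1, 1) with straight segments is our convention and may differ from the paper's. All function names are hypothetical, and the joint probabilities use a one-dimensional integral to keep the sketch dependency-free.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def joint_survival(a, b, rho, n=400, upper=8.0):
    """P(X > a, Y > b) for a standard bivariate normal pair with correlation rho.

    Writes Y = rho*X + sqrt(1 - rho^2)*Z with Z independent of X, then
    integrates phi(x) * P(Z > (b - rho*x)/s) over x by the midpoint rule.
    Assumes |rho| < 1; mass beyond |x| > upper is ignored."""
    lo = max(a, -upper)
    if lo >= upper:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (upper - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += norm_pdf(x) * (1.0 - norm_cdf((b - rho * x) / s))
    return total * h


def combined_auc(rule, rho_d, rho_n, t_ai, mu_rad=1.5, mu_ai=2.0):
    """AUC of a radiologist-AI curve: AI threshold t_ai fixed, radiologist
    threshold swept. Non-diseased scores ~ N(0, 1); diseased scores ~ N(mu, 1)
    with hypothetical separations mu_rad and mu_ai."""
    pts = [(0.0, 0.0), (1.0, 1.0)]  # closing segments (an assumption)
    for k in range(241):
        t = -6.0 + 0.05 * k  # radiologist threshold
        if rule == "rule-out":  # positive only if AI AND radiologist positive
            tpr = joint_survival(t_ai - mu_ai, t - mu_rad, rho_d)
            fpr = joint_survival(t_ai, t, rho_n)
        elif rule == "rule-in":  # positive if AI OR radiologist positive
            # 1 - P(both below threshold); P(X <= a, Y <= b) = P(-X > -a, -Y > -b)
            tpr = 1.0 - joint_survival(mu_ai - t_ai, mu_rad - t, rho_d)
            fpr = 1.0 - joint_survival(-t_ai, -t, rho_n)
        else:
            raise ValueError("rule must be 'rule-out' or 'rule-in'")
        pts.append((fpr, tpr))
    pts.sort()  # order the operating points along the FPR axis
    # Trapezoidal area under the piecewise-linear curve.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    for rule, t_ai in (("rule-out", 0.5), ("rule-in", 1.5)):
        for rho_d, rho_n in ((0.3, 0.3), (0.7, 0.3), (0.3, 0.7)):
            auc = combined_auc(rule, rho_d, rho_n, t_ai)
            print(f"{rule}: rho_d={rho_d}, rho_n={rho_n}, AUC={auc:.4f}")
```

With these hypothetical settings the rule-out AUC rises when `rho_d` increases and falls when `rho_n` increases, and the rule-in AUC moves the other way, which is the pattern stated above. A library bivariate-normal CDF (for example one based on Genz's algorithm) could replace `joint_survival` in practice.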
Load-bearing premise
The joint behavior of the two diagnostic tests can be described by a bivariate copula separating marginal distributions from dependence.
What would settle it
Observe whether increasing radiologist-AI correlation specifically on diseased cases in mammography data raises the empirical AUC of the rule-out curve as predicted.
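One way to compute the empirical rule-out (or rule-in) AUC for such a check is sketched below. The function names, the "score at or above threshold is positive" convention, and the straight closing segments to (0, 0) and (1, 1) are assumptions for illustration; the paper's projection procedure may differ. The AI threshold is fixed, the radiologist threshold sweeps over the observed radiologist scores, and the area is taken with the trapezoidal rule.

```python
def empirical_combined_roc(ai, rad, truth, t_ai, rule="rule-out"):
    """Empirical (FPR, TPR) points for a combined radiologist-AI rule.

    ai, rad: paired scores (higher = more suspicious); truth: 1 = diseased,
    0 = non-diseased. A score at or above its threshold is a positive call.
    The AI threshold t_ai is fixed; the radiologist threshold sweeps over
    every observed radiologist score plus +infinity."""
    if rule not in ("rule-out", "rule-in"):
        raise ValueError("rule must be 'rule-out' or 'rule-in'")
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both diseased and non-diseased cases")
    pts = {(0.0, 0.0), (1.0, 1.0)}  # closing points (an assumption)
    for t in sorted(set(rad)) + [float("inf")]:
        tp = fp = 0
        for a, r, d in zip(ai, rad, truth):
            ai_pos, rad_pos = a >= t_ai, r >= t
            # Rule-out: AI negatives are never recalled, so both must be positive.
            # Rule-in: AI positives are always recalled, so either suffices.
            call = (ai_pos and rad_pos) if rule == "rule-out" else (ai_pos or rad_pos)
            if call:
                tp += d
                fp += 1 - d
        pts.add((fp / n_neg, tp / n_pos))
    return sorted(pts)


def trapezoid_auc(pts):
    """Area under the piecewise-linear curve through points sorted by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    # Made-up toy scores, for illustration only.
    ai = [0.9, 0.8, 0.2, 0.7, 0.1, 0.3]
    rad = [0.8, 0.4, 0.9, 0.3, 0.2, 0.6]
    truth = [1, 1, 1, 0, 0, 0]
    for rule in ("rule-out", "rule-in"):
        pts = empirical_combined_roc(ai, rad, truth, 0.5, rule)
        print(rule, trapezoid_auc(pts))
```

Splitting the cases by truth label and comparing this AUC across readers or datasets with different radiologist-AI correlations is one way to probe the predicted direction.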
Original abstract
Multiple diagnostic tests are frequently used to determine the presence of a disease condition in patients. In this paper, we use bivariate copulas to examine the properties of receiver operating characteristic (ROC) curves formed when two correlated diagnostic tests are used together to rule out ("believe the negative") and rule in ("believe the positive") patients for disease. We use this theory to analyze three mammography data sets where AI devices are applied to reduce radiologists' workload or improve diagnostic performance. Our analysis shows in general that increasing the radiologist-AI correlation for diseased cases enhances the area under the ROC curve (AUC) of a radiologist-AI rule-out curve, and that decreasing the correlation for non-diseased cases has a similar effect. The opposite trends hold for rule-in scenarios. Applications to clinical mammography data show that projected empirical radiologist performance under a rule-out or rule-in scenario is consistent with the theory.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a bivariate copula framework to derive properties of ROC curves for combined rule-out and rule-in diagnostics built from two correlated tests. Under rule-out ("believe the negative"), a case is called positive only if both tests are positive; under rule-in ("believe the positive"), it is called positive if either test is positive. The manuscript claims general results that increasing radiologist-AI correlation among diseased cases (or decreasing it among non-diseased cases) raises the AUC of the rule-out ROC curve, with the opposite effects for rule-in, and illustrates them on three mammography datasets whose empirical results are consistent with the theory.
Significance. If the derivations hold, the work supplies a flexible, copula-based toolkit for quantifying how test dependence modulates combined diagnostic performance in rule-out and rule-in settings. This is directly relevant to AI-assisted mammography, where the theory can inform workload-reduction strategies. The separation of marginals from dependence via copulas and the data applications are clear strengths.
major comments (2)
- [§3] §3 (Theoretical results on correlation and AUC): The generality claim, that varying the copula parameter produces the stated monotonic AUC effects for rule-out and rule-in, does not automatically extend to all bivariate copulas. For families with asymmetric tail dependence (Clayton in the lower tail, Gumbel in the upper tail), raising the dependence parameter by an amount that produces the same change in Kendall’s tau can alter joint survival probabilities near the relevant thresholds differently than it does for symmetric copulas (Gaussian, Frank). The resulting change in the AUC integral may therefore have a different sign for some threshold choices; the derivation should state the required copula properties or include explicit checks for asymmetric families.
- [§4] §4 (Mammography applications): The reported consistency between theory and the three datasets depends on the chosen copula family and marginal estimators. No sensitivity analysis to alternative copulas (or to the specific parametric forms) is provided, leaving open whether the observed agreement is robust or an artifact of the selected dependence structure.
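The tail asymmetry raised in the §3 comment can be probed numerically. The sketch below assumes the standard closed forms of the Clayton and Gumbel copulas and their Kendall's tau relations (Clayton: tau = theta/(theta + 2); Gumbel: tau = 1 - 1/theta); the helper names are hypothetical. It compares the two families at matched tau near the lower and upper corners, and checks on a grid whether C_theta(u, v) is non-decreasing in theta. With the marginals fixed, that ordering makes both P(both negative) = C(u, v) and P(both positive) = 1 - u - v + C(u, v) non-decreasing in theta, which is the property the rule-out and rule-in arguments rely on. A grid check is evidence, not a proof.

```python
import math


def clayton(u, v, theta):
    """Clayton copula C(u, v), theta > 0 (lower-tail dependence)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def gumbel(u, v, theta):
    """Gumbel copula C(u, v), theta >= 1 (upper-tail dependence)."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


def clayton_theta(tau):
    return 2.0 * tau / (1.0 - tau)  # inverts tau = theta / (theta + 2)


def gumbel_theta(tau):
    return 1.0 / (1.0 - tau)  # inverts tau = 1 - 1/theta


def joint_upper(copula, u, v, theta):
    """P(U > u, V > v) = 1 - u - v + C(u, v)."""
    return 1.0 - u - v + copula(u, v, theta)


def is_concordance_ordered(copula, thetas, grid):
    """True if C_theta(u, v) is non-decreasing in theta at every grid point."""
    for u in grid:
        for v in grid:
            vals = [copula(u, v, th) for th in thetas]
            if any(b < a - 1e-12 for a, b in zip(vals, vals[1:])):
                return False
    return True


if __name__ == "__main__":
    tc, tg = clayton_theta(0.5), gumbel_theta(0.5)
    print("C(0.05, 0.05):", clayton(0.05, 0.05, tc), gumbel(0.05, 0.05, tg))
    print("P(U>0.95, V>0.95):", joint_upper(clayton, 0.95, 0.95, tc),
          joint_upper(gumbel, 0.95, 0.95, tg))
    grid = [i / 20 for i in range(1, 20)]
    print(is_concordance_ordered(clayton, [0.5, 1, 2, 4, 8], grid),
          is_concordance_ordered(gumbel, [1, 1.5, 2, 4, 8], grid))
```

At tau = 0.5 (theta = 2 for both families) the Clayton copula puts more mass in the lower corner and the Gumbel copula more in the upper corner, while both pass the ordering check on this grid.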
minor comments (2)
- [Notation] The notation for the combined rule-out and rule-in thresholds (around Eq. (5)–(7)) would be clearer if accompanied by a small numerical example showing how the joint probability is computed from the copula.
- [Figures] Figures displaying the empirical and theoretical ROC curves lack confidence bands or variability measures, making it hard to judge how closely the data track the predicted curves.
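A small numerical example of the kind the notation comment asks for could look like the sketch below. It applies Sklar's theorem within one disease class: if p_neg_ai and p_neg_rad are each reader's marginal CDF at its threshold (the probability of a negative call), then P(both negative) = C(p_neg_ai, p_neg_rad). Passing 1 - sensitivity gives the combined true-positive rates; passing the specificities gives the combined false-positive rates. The Clayton copula, theta = 2, and the operating points are illustrative choices, not values from the paper, and the function names are hypothetical.

```python
def clayton(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 (an example dependence model)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def independence(u, v, theta=None):
    """Independence copula C(u, v) = u * v (theta is ignored)."""
    return u * v


def positive_rates(p_neg_ai, p_neg_rad, copula, theta):
    """P(called positive) under rule-out and rule-in within one disease class.

    p_neg_ai, p_neg_rad: each reader's marginal CDF at its threshold, i.e.
    P(score below threshold) in that class. By Sklar's theorem,
    P(both negative) = C(p_neg_ai, p_neg_rad)."""
    both_neg = copula(p_neg_ai, p_neg_rad, theta)
    rule_out = 1.0 - p_neg_ai - p_neg_rad + both_neg  # both readers positive
    rule_in = 1.0 - both_neg                          # at least one positive
    return rule_out, rule_in


if __name__ == "__main__":
    # Illustrative operating points: AI sensitivity 0.95, radiologist 0.85.
    for name, cop, th in (("independence", independence, None),
                          ("Clayton", clayton, 2.0)):
        tpr_out, tpr_in = positive_rates(0.05, 0.15, cop, th)
        print(f"{name}: rule-out TPR={tpr_out:.4f}, rule-in TPR={tpr_in:.4f}")
```

Here independence gives rule-out and rule-in sensitivities of 0.8075 and 0.9925, while the positively dependent Clayton copula gives about 0.8475 and 0.9525: more dependence among diseased cases helps rule-out and hurts rule-in.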
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments highlight important points about the scope of the theoretical results and the robustness of the empirical applications. We address each major comment below and will revise the manuscript to incorporate clarifications and additional analyses.
Point-by-point responses
- Referee: [§3] §3 (Theoretical results on correlation and AUC): The generality claim, that varying the copula parameter produces the stated monotonic AUC effects for rule-out and rule-in, does not automatically extend to all bivariate copulas. For families with asymmetric tail dependence (Clayton in the lower tail, Gumbel in the upper tail), raising the dependence parameter by an amount that produces the same change in Kendall’s tau can alter joint survival probabilities near the relevant thresholds differently than it does for symmetric copulas (Gaussian, Frank). The resulting change in the AUC integral may therefore have a different sign for some threshold choices; the derivation should state the required copula properties or include explicit checks for asymmetric families.
Authors: We agree that the monotonicity results in §3 are derived under the assumption that increasing the copula parameter strengthens positive dependence in a manner that monotonically affects the relevant joint survival probabilities for rule-out (lower tail) and rule-in (upper tail) thresholds. The proofs rely on the copula being increasing in concordance and on the survival copula preserving the direction of the effect for the AUC integral. While this holds for the symmetric families (Gaussian, Frank) emphasized in the paper, we acknowledge that asymmetric tail dependence in families such as Clayton or Gumbel could, in principle, produce non-monotonic behavior for certain threshold configurations. We will revise §3 to explicitly state the required copula properties (positive quadrant dependence together with monotonicity of the survival function with respect to the dependence parameter) and add numerical verification for Clayton and Gumbel copulas across a range of Kendall’s tau values and clinically relevant thresholds to confirm that the stated AUC directions remain valid in the mammography context. revision: yes
- Referee: [§4] §4 (Mammography applications): The reported consistency between theory and the three datasets depends on the chosen copula family and marginal estimators. No sensitivity analysis to alternative copulas (or to the specific parametric forms) is provided, leaving open whether the observed agreement is robust or an artifact of the selected dependence structure.
Authors: The Gaussian copula was selected for the applications because it permits straightforward control of dependence via a single parameter while remaining consistent with the marginal ROC curves estimated from the data. Marginal distributions were fitted using both parametric (beta) and nonparametric (empirical) approaches, with results shown to be insensitive to this choice. Nevertheless, we accept that a sensitivity check is warranted. In the revised manuscript we will add a dedicated subsection in §4 that repeats the rule-out and rule-in AUC calculations for the three mammography datasets under the Clayton and Frank copulas (matched to the same Kendall’s tau values), as well as under a nonparametric rank-based dependence estimator. We will report the resulting AUC values and their agreement with the theoretical predictions to demonstrate robustness. revision: yes
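The proposed sensitivity analysis needs a rank-based dependence estimate for each class and a way to map it to each family's parameter at matched Kendall's tau. The rebuttal does not name its estimator; the sketch below uses Kendall's tau-a as one simple choice, the standard closed-form tau relations for the Gaussian and Clayton copulas, and numerical inversion for the Frank copula through the Debye function. All helper names are hypothetical.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; ties add 0."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def gaussian_rho(tau):
    """Gaussian copula: tau = (2/pi) * arcsin(rho)."""
    return math.sin(math.pi * tau / 2.0)


def clayton_theta(tau):
    """Clayton copula: tau = theta / (theta + 2), valid for 0 < tau < 1."""
    return 2.0 * tau / (1.0 - tau)


def frank_tau(theta, n=2000):
    """Frank copula (theta > 0): tau = 1 - (4/theta) * (1 - D1(theta)), where
    D1(theta) = (1/theta) * integral_0^theta t / (e^t - 1) dt (midpoint rule)."""
    h = theta / n
    integral = h * sum((k + 0.5) * h / math.expm1((k + 0.5) * h) for k in range(n))
    return 1.0 - 4.0 / theta * (1.0 - integral / theta)


def frank_theta(tau, lo=1e-6, hi=100.0, iters=80):
    """Invert frank_tau by bisection; the bracket covers roughly 0 < tau < 0.96."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if frank_tau(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Made-up paired scores for one disease class, for illustration only.
    ai = [0.9, 0.7, 0.8, 0.2, 0.4, 0.6]
    rad = [0.8, 0.6, 0.9, 0.3, 0.2, 0.7]
    tau = kendall_tau(ai, rad)
    print(tau, gaussian_rho(tau), clayton_theta(tau), frank_theta(tau))
```

Running the rule-out and rule-in calculations once per family with these matched parameters, separately for diseased and non-diseased cases, is the kind of robustness check the referee requested.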
Circularity Check
Derivation is self-contained via standard copula mathematics with no reduction to inputs or self-citations
Full rationale
The paper's core results on how radiologist-AI correlation affects rule-out and rule-in AUC are obtained by modeling the joint distribution of two tests with bivariate copulas, separating marginals from dependence structure, and integrating the resulting joint survival or distribution functions to obtain the combined ROC curves. This is a direct mathematical derivation from copula properties (a standard external tool) rather than a fit to the mammography data or a self-referential definition. The general claims follow from varying the copula parameter while holding the marginals fixed and examining the sign of the AUC change; no equation reduces to a tautology, and no fitted parameter is relabeled as a prediction. Applications to the three data sets are presented only as consistency checks after the theory is derived. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from the authors' prior work appear in the derivation chain. The derivation therefore rests on standard, externally established statistical results rather than on the paper's own inputs.
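A compact sketch of why the sign of the AUC change can follow from the copula alone is given below, under the assumption (stated in the rebuttal) that the copula family is ordered by concordance; it is an illustration, not necessarily the paper's proof. Notation is introduced here: c_A is the fixed AI threshold, c_R the swept radiologist threshold, u_k = F_{A,k}(c_A) and v_k = F_{R,k}(c_R) are the marginal CDFs in class k (D for diseased, N for non-diseased), and C_{theta_k} is the class-specific copula.

```latex
\begin{aligned}
\text{Rule-out (BN): } & \mathrm{TPR} = 1 - u_D - v_D + C_{\theta_D}(u_D, v_D), \quad
  \mathrm{FPR} = 1 - u_N - v_N + C_{\theta_N}(u_N, v_N) \\
\text{Rule-in (BP): } & \mathrm{TPR} = 1 - C_{\theta_D}(u_D, v_D), \quad
  \mathrm{FPR} = 1 - C_{\theta_N}(u_N, v_N) \\
\text{Ordering: } & \theta \le \theta' \implies C_{\theta}(u, v) \le C_{\theta'}(u, v)
  \ \text{ for all } (u, v) \in [0,1]^2 \\
\text{Area: } & \mathrm{AUC} = \int \mathrm{TPR}\; d\,\mathrm{FPR}
  \ \text{ along the curve traced by } c_R
\end{aligned}
```

With the marginals (hence u_k and v_k) held fixed, raising theta_D raises the rule-out TPR at every c_R and lowers the rule-in TPR; raising theta_N raises the rule-out FPR and lowers the rule-in FPR. The end points of each curve (c_R at minus or plus infinity) do not depend on theta, so these pointwise shifts fix the sign of the AUC change. For the Gaussian copula the ordering follows from Plackett's identity, d Phi_2(x, y; rho) / d rho = phi_2(x, y; rho) > 0.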
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Bivariate copulas can model the dependence between two diagnostic test results while preserving their marginal distributions.