Evaluating the role of correlation among markers in prediction models

Francisco J. Jurado; Luis Mariano Esteban; Núria Malats; Sergio Sabroso-Lasa; Tomás Alcalá-Nalvaiz

arxiv: 2606.02062 · v1 · pith:PCWCE3SRnew · submitted 2026-06-01 · 📊 stat.ME

Pith reviewed 2026-06-28 13:18 UTC · model grok-4.3

classification 📊 stat.ME

keywords biomarkers · AUC · correlations · ROC curve · predictive models · multivariate normality · pancreatic cancer · metabolites

The pith

Negative correlations between biomarkers maximize the combined AUC in predictive models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives an expression for the maximum achievable AUC when combining biomarkers and shows how their correlations affect this value. It demonstrates that negative correlations between markers lead to the highest combined discrimination ability, particularly when the individual markers have similar predictive power. This finding is illustrated with graphical surfaces and confirmed through simulations with normal and skewed distributions as well as analysis of real metabolite data for pancreatic cancer detection. The work highlights that the sign and strength of inter-marker correlations should be considered when building or extending predictive models.

Core claim

Under the assumption of multivariate normality, the maximum AUC for a linear combination of biomarkers is a function of the correlations between them, with negative correlations yielding the highest values and positive correlations the lowest. This holds for markers with equal or differing predictive abilities, though the benefit is greatest when abilities are equal. Simulations and real data on lipid metabolites reinforce that negative correlations optimize model performance.

What carries the argument

An expression for the maximum AUC derived as a function of the correlations between markers under multivariate normality.
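
This page does not reproduce the paper's expression. As a hedged reference point, the block below states the standard binormal result of Su and Liu (1993) for the best linear combination, which may be parameterized differently from the paper's own formula. The symbols are notation assumed here: \delta is the mean-difference vector, \Sigma_D and \Sigma_H are the covariance matrices in cases and controls, and the two-marker form further assumes unit variances, standardized mean differences d_1 and d_2, and the same correlation \rho in both groups.

```latex
\[
\mathrm{AUC}_{\max}
  = \Phi\!\left(\sqrt{\delta^{\top}(\Sigma_D+\Sigma_H)^{-1}\,\delta}\right),
\qquad
\delta = \mu_D - \mu_H .
\]
% Two markers, unit variances, common correlation \rho in both groups:
\[
\mathrm{AUC}_{\max}(\rho)
  = \Phi\!\left(\sqrt{\frac{d_1^{2} + d_2^{2} - 2\rho\, d_1 d_2}{2\,(1-\rho^{2})}}\right).
\]
```

With d_1 = d_2 = d the argument of \Phi simplifies to d/\sqrt{1+\rho}, which decreases as \rho increases, consistent with the claim that negative correlations give the largest combined AUC.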

If this is right

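When adding a new biomarker, one that is negatively correlated with the existing markers improves discrimination more than one that is positively correlated.
For markers of equal strength, negative correlation gives the greatest AUC gain.
Positive correlations give the least favorable combined AUC, although they can still bring slight benefits when the markers differ in strength.
The pattern persists under skewed distributions, where the asymmetry of the markers also matters for marker selection.
In the PDAC lipid-metabolite data, correlations between metabolites influence the AUC attainable by combining them.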
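
A minimal sketch of how these consequences arise in the simplest setting. It assumes two markers with unit variances, the same correlation in cases and controls, and the standard binormal formula of Su and Liu (1993) rather than the paper's own expression, which this page does not reproduce. The function name `max_auc` and its arguments `d1`, `d2` (standardized mean differences) and `rho` are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def max_auc(d1: float, d2: float, rho: float) -> float:
    """Max AUC of a linear combination of two binormal markers.

    Assumes unit variances and the same correlation rho in cases and
    controls; d1 and d2 are the standardized mean differences.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # Quadratic form delta' Sigma^{-1} delta for a 2x2 correlation matrix
    q = (d1**2 + d2**2 - 2.0 * rho * d1 * d2) / (1.0 - rho**2)
    # Sigma_D + Sigma_H = 2 * Sigma, hence the division by 2
    return NormalDist().cdf(sqrt(q / 2.0))


if __name__ == "__main__":
    rhos = [-0.8, -0.4, 0.0, 0.4, 0.8]
    print("equal strengths (d1 = d2 = 1):")
    for rho in rhos:
        print(f"  rho = {rho:+.1f}  AUC = {max_auc(1.0, 1.0, rho):.3f}")
    print("unequal strengths (d1 = 1, d2 = 0.5):")
    for rho in rhos:
        print(f"  rho = {rho:+.1f}  AUC = {max_auc(1.0, 0.5, rho):.3f}")
```

Under these assumptions the equal-strength curve falls steadily as rho rises. For unequal strengths the minimum sits at rho = d2/d1 (0.5 in the example), where the combination is no better than the stronger marker alone; beyond that point positive correlation helps again, in line with the slight benefits noted in the abstract.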

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Model builders could screen candidate markers for negative correlation with those already in the model to maximize the gain in AUC.
Marker selection criteria in high-dimensional settings might be redesigned to account for the sign of inter-marker correlations.
Extensions to non-linear combinations or other performance metrics could be explored.
The finding may apply to other diagnostic fields beyond cancer.

Load-bearing premise

The biomarkers follow a multivariate normal distribution.

What would settle it

A dataset where combining negatively correlated markers does not yield higher AUC than positively correlated ones, after controlling for individual marker strengths.
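
A rough sketch of what such a check could look like on synthetic data, holding individual marker strengths fixed and changing only the sign of the correlation. Everything here is an assumption for illustration: the bivariate-normal generator, the sample size, the seed, and the helper names `simulate_group`, `empirical_auc` and `combined_auc`. The combination weights use the known simulation parameters rather than estimates, so a real-data check would need an estimation step and validation on top of this.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_group(n, mean, rho, rng):
    """Draw n bivariate-normal pairs with unit variances and correlation rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((mean[0] + z1, mean[1] + rho * z1 + sqrt(1.0 - rho**2) * z2))
    return pairs


def empirical_auc(case_scores, control_scores):
    """Mann-Whitney estimate: P(case > control) + 0.5 * P(tie)."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            wins += 1.0 if c > h else 0.5 if c == h else 0.0
    return wins / (len(case_scores) * len(control_scores))


def combined_auc(d1, d2, rho, n=800, seed=1):
    """Empirical AUC of the linear score with weights proportional to Sigma^{-1} delta."""
    rng = random.Random(seed)
    cases = simulate_group(n, (d1, d2), rho, rng)
    controls = simulate_group(n, (0.0, 0.0), rho, rng)
    w1, w2 = d1 - rho * d2, d2 - rho * d1  # known parameters, not estimated

    def score(x):
        return w1 * x[0] + w2 * x[1]

    return empirical_auc([score(x) for x in cases], [score(x) for x in controls])


if __name__ == "__main__":
    for rho in (-0.5, 0.0, 0.5):
        theory = NormalDist().cdf(1.0 / sqrt(1.0 + rho))  # equal strengths, d = 1
        print(f"rho = {rho:+.1f}  empirical = {combined_auc(1.0, 1.0, rho):.3f}  "
              f"binormal theory = {theory:.3f}")
```

A counterexample in the sense described above would be a setting where the negatively correlated pair, at matched individual strengths, does not come out ahead.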

Original abstract

Different methods have been employed to estimate models maximizing the area under the receiver operating characteristic curve (ROC-AUC). Once a model is developed, integrating novel biomarkers may improve its diagnostic ability. However, the discrimination improvement from adding a new biomarker is not always evident, even if the marker itself has good discriminatory power. The sign and magnitude of correlations between biomarkers may impact model performance. In this paper, we assess the effect of such correlations on the discrimination ability of predictive models. Under multivariate normality, we derive an expression for the maximum AUC as a function of the correlations between markers, illustrated graphically using surfaces. Logarithmic folded bivariate normal and Gamma simulations address skewed data cases. Additionally, AUC improvement was assessed combining 1934 blood lipid metabolites determined by liquid chromatography in 44 pancreatic cancer cases and 38 controls from the PanGenMic Study. Our results show that negative correlations consistently maximize the combined AUC, offering the greatest improvements when markers have equal predictive ability, while positive correlations yield the least favorable results. Negative correlations remain optimal for markers with differing abilities, though positive correlations show slight benefits. Simulations with skewed distributions confirm these trends, emphasizing the role of asymmetry in marker selection. Real-world analysis of serum lipid-derived metabolites for detecting pancreatic ductal adenocarcinoma (PDAC) reinforces the influence of correlations on AUC optimization. These findings suggest that the sign and magnitude of inter-biomarker correlations should be considered when incorporating new markers into predictive algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper derives a closed-form max AUC under MVN that increases with negative marker correlations, and the simulations plus small real-data check mostly back the pattern.

The main point is that under multivariate normality the maximum AUC for an optimal linear combination of markers has an explicit expression in the pairwise correlations, and that expression is largest when correlations are negative. The paper derives this, shows the surfaces, runs skewed-distribution simulations to test robustness, and applies the idea to 1934 lipid metabolites in a 44-case/38-control pancreatic cancer dataset.

What works is the direct derivation from standard MVN properties and the fact that the simulations target the non-normality worry raised in the abstract. The real-data example at least illustrates that the correlation sign matters in practice. The central claim does not collapse into circularity; it starts from the MVN model and produces a usable expression.

The soft spots are modest but real. The analytic result is tied to normality even though the simulations suggest the negative-correlation preference survives moderate skewness. The real-data sample is small, so it functions more as a sanity check than strong confirmation. No code or step-by-step derivation is mentioned in the abstract, which slows verification. The paper also does not explore how a practitioner would actually use the expression when choosing markers.

This is for statisticians and methodologists who build or evaluate multi-marker diagnostic models, especially in clinical biomarker work. A reader who needs to think about correlation structure when adding markers will find the surfaces and the negative-correlation result directly useful.

It deserves peer review. The result is concrete enough and the simulations address the main limitation, so referees can check the derivation and the robustness claims without starting from scratch.

Referee Report

2 major / 2 minor

Summary. The paper derives a closed-form expression for the maximum AUC achievable by a linear combination of biomarkers under the assumption of multivariate normality, as a function of the pairwise correlations among markers. It concludes that negative correlations maximize the combined AUC (with largest gains when markers have equal individual predictive strength), illustrates this with surfaces, checks robustness via simulations under logarithmic folded bivariate normal and Gamma distributions, and applies the idea to 1934 lipid metabolites in a pancreatic cancer case-control study.

Significance. If the derivation is correct, the result supplies a simple, interpretable rule for biomarker selection that is directly actionable in model building. The use of exact MVN properties for the closed-form result, together with targeted simulations for non-normality and a real-data corroboration, gives the work concrete practical value beyond purely theoretical claims.

major comments (2)

[Derivation (abstract and main text)] The central derivation of the maximum-AUC expression (mentioned in the abstract and presumably in the Methods/Results) is stated to follow from standard multivariate-normal properties, yet the manuscript supplies neither the explicit formula nor the algebraic steps that produce it. This omission is load-bearing because the claim that negative correlations maximize AUC rests entirely on that expression.
[Real-data application] A table or figure reporting the real-data AUC values (PanGenMic lipid-metabolite analysis) is needed to quantify the claimed improvement under negative versus positive correlations; without it the empirical support for the main conclusion remains qualitative.

minor comments (2)

[Simulation section] The abstract refers to 'logarithmic folded bivariate normal' simulations; the precise parameterization and how the correlation is preserved under the transformation should be stated explicitly for reproducibility.
[Graphical illustration] Notation for the linear combination coefficients and the resulting AUC expression should be introduced once and used consistently; currently the link between the MVN parameters and the plotted surfaces is not fully transparent.
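
To illustrate why the first minor comment matters, the sketch below uses a plain bivariate log-normal as a stand-in; the paper's "logarithmic folded bivariate normal" parameterization is not given on this page, so this is not the authors' construction. It relies on the standard closed form for the Pearson correlation of exponentiated bivariate-normal variables. The function name `lognormal_pearson` and its arguments are illustrative.

```python
from math import exp, sqrt


def lognormal_pearson(rho: float, s1: float = 1.0, s2: float = 1.0) -> float:
    """Pearson correlation of (exp(X), exp(Y)) when (X, Y) is bivariate normal
    with standard deviations s1, s2 and correlation rho."""
    return (exp(rho * s1 * s2) - 1.0) / sqrt((exp(s1**2) - 1.0) * (exp(s2**2) - 1.0))


if __name__ == "__main__":
    for rho in (-0.9, -0.5, 0.0, 0.5, 0.9):
        print(f"latent rho = {rho:+.1f} -> Pearson after exp = "
              f"{lognormal_pearson(rho):+.3f}")
```

In this stand-in the transformation shrinks negative correlations far more than positive ones, so a simulation that sets the correlation on the latent normal scale and one that sets it on the observed scale are not comparing the same thing. That is the kind of detail the comment asks the authors to pin down.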

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation of minor revision. We address each major comment below.

Referee: [Derivation (abstract and main text)] The central derivation of the maximum-AUC expression (mentioned in the abstract and presumably in the Methods/Results) is stated to follow from standard multivariate-normal properties, yet the manuscript supplies neither the explicit formula nor the algebraic steps that produce it. This omission is load-bearing because the claim that negative correlations maximize AUC rests entirely on that expression.

Authors: We agree that the explicit formula and algebraic steps were omitted and should be supplied. The maximum AUC under MVN follows from the fact that the optimal linear combination yields an AUC determined by the square root of a quadratic form in the mean difference vector and the inverse covariance matrix; the sign of the off-diagonal elements of the correlation matrix then determines whether this quantity is maximized or minimized. In the revised manuscript we will insert the closed-form expression together with the derivation steps in the Methods section. revision: yes
Referee: [Real-data application] A table or figure reporting the real-data AUC values (PanGenMic lipid-metabolite analysis) is needed to quantify the claimed improvement under negative versus positive correlations; without it the empirical support for the main conclusion remains qualitative.

Authors: We agree that quantitative AUC values are required to make the empirical claim concrete. In the revised manuscript we will add a table (or figure) in the Results section that reports the observed AUCs for representative metabolite pairs and small combinations stratified by the sign and magnitude of their correlations. revision: yes
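
The quadratic-form description in the first response can be sketched for any number of markers. This is a stdlib-only illustration of the Su and Liu (1993) binormal result, not the authors' code; the helper names `solve` and `max_auc_mvn` and the argument names are assumptions, and the inputs are taken as known population parameters rather than estimates.

```python
from math import sqrt
from statistics import NormalDist


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def max_auc_mvn(delta, sigma_cases, sigma_controls):
    """Return Phi(sqrt(delta' (Sigma_D + Sigma_H)^{-1} delta)) and the weights."""
    p = len(delta)
    pooled = [[sigma_cases[i][j] + sigma_controls[i][j] for j in range(p)]
              for i in range(p)]
    weights = solve(pooled, delta)  # (Sigma_D + Sigma_H)^{-1} delta
    q = sum(w * d for w, d in zip(weights, delta))
    return NormalDist().cdf(sqrt(q)), weights


if __name__ == "__main__":
    sigma = [[1.0, -0.5], [-0.5, 1.0]]
    auc, w = max_auc_mvn([1.0, 1.0], sigma, sigma)
    print(f"AUC = {auc:.3f}, weights = {w}")
```

With estimated means and covariances plugged in, the resulting AUC would be an optimistic in-sample figure, so a table like the one promised in the second response would presumably need some form of validation.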

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained from standard MVN properties

The paper derives a closed-form expression for maximum AUC under multivariate normality as a function of pairwise correlations, which follows directly from standard properties of the MVN distribution without reducing to any fitted input, self-defined quantity, or self-citation chain within the paper itself. Simulations under logarithmic folded bivariate normal and Gamma distributions, plus the real-data lipid metabolite example, serve as independent robustness checks rather than tautological confirmations. No load-bearing step equates a prediction to its own construction or imports uniqueness via author-overlapping citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The load-bearing premise is the multivariate normality assumption required for the closed-form AUC expression; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption: Biomarkers are jointly multivariate normal
The abstract states that the expression for maximum AUC is derived under multivariate normality.
