Multi-omic Enriched Blood-Derived Digital Signatures Reveal Mechanistic and Confounding Disease Clusters for Differential Diagnosis

Abicumaran Uthamacumaran; Alexander Fulton; Bolin Liu; Hector Zenil

arxiv: 2511.10888 · v2 · submitted 2025-11-14 · 🧬 q-bio.OT

Pith reviewed 2026-05-17 22:56 UTC · model grok-4.3

keywords: digital blood twin, disease clustering, blood biomarkers, hierarchical clustering, cytokine signaling, hematological diseases, mechanistic overlaps, precision diagnostics

The pith

Blood-derived digital signatures recover clinically meaningful disease clusters and reveal shared inflammatory mechanisms across categories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper constructs a digital blood twin computational model from blood analyte profiles across 103 diseases. Profiles are standardized into a disease-analyte matrix and pairwise Pearson correlations are used to build a hierarchical clustering tree that is cut into 16 groups. The largest cluster shows enrichment for cytokine-signaling pathways, indicating shared inflammatory mechanisms that cross traditional disease boundaries, while hematological conditions form a tight group and metabolic or respiratory ones are more scattered. Random Forest analysis flags neutrophils, mean corpuscular volume, red blood cell count, and platelet count as the strongest separating features. A reader would care because the work suggests that everyday lab blood tests contain enough structure to help reorganize how diseases are grouped for diagnosis and to spot overlapping biology.

Core claim

The authors construct a digital blood twin from longitudinal hematological and biochemical analytes across 103 disease signatures. They standardize these into a unified disease-analyte matrix and use pairwise Pearson correlations to measure similarity, followed by hierarchical clustering that partitions the tree into 16 groups at a stringent threshold. Enrichment analysis on the largest heterogeneous cluster points to cytokine-signaling pathways as a common mechanism. PCA and UMAP confirm the separation of hematological diseases, and Random Forest identifies neutrophils, mean corpuscular volume, red blood cell count, and platelet count as top discriminative features.

What carries the argument

The digital blood twin, a computational model based on a standardized disease-analyte matrix and pairwise Pearson correlations followed by hierarchical clustering.
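
The similarity step named above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes that "standardized" means a column-wise z-score of each analyte across diseases, and the toy values and analyte order are hypothetical. It uses only the standard library so it runs anywhere.

```python
from math import sqrt

def zscore_columns(matrix):
    """Standardize each analyte (column) to mean 0 and unit variance across diseases.
    Assumption: the paper's standardization is a column-wise z-score."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        col = [row[j] for row in matrix]
        mean = sum(col) / n_rows
        sd = sqrt(sum((v - mean) ** 2 for v in col) / n_rows)
        for i in range(n_rows):
            out[i][j] = (matrix[i][j] - mean) / sd if sd > 0 else 0.0
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def correlation_distance(profiles):
    """Pairwise distance 1 - rho_ij between disease profiles (rows)."""
    n = len(profiles)
    return [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical toy matrix: rows are diseases, columns are
# neutrophils, MCV, RBC, platelets (values invented for illustration).
raw = [
    [4.0, 70.0, 3.5, 250.0],
    [4.5, 72.0, 3.6, 240.0],
    [9.0, 89.0, 4.8, 260.0],
    [8.5, 90.0, 4.7, 245.0],
]
standardized = zscore_columns(raw)
dist = correlation_distance(standardized)
for row in dist:
    print([round(d, 3) for d in row])
```

The resulting matrix is symmetric with a zero diagonal and values between 0 and 2, which is the form a hierarchical clustering routine expects as input.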

If this is right

Hematopoietic disorders form a consistent and distinct cluster.
Metabolic, endocrine, and respiratory diseases display weaker internal cohesion and more heterogeneous placement.
The largest cluster converges on cytokine-signaling pathways that transcend conventional clinical categories.
Neutrophils, mean corpuscular volume, red blood cell count, and platelet count emerge as the most discriminative analytes.
Routine laboratory data combined with this network physiology approach can refine disease ontology and map comorbidities.
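
The cytokine-signaling item in the list above rests on a pathway enrichment test. Over-representation of a pathway among a cluster's genes is commonly scored with a hypergeometric tail probability; whether the paper used exactly this test is not stated on this page, so the sketch below is an assumption, and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, pathway_size, selected_size, overlap):
    """Upper-tail probability P(X >= overlap) when `selected_size` genes are
    drawn without replacement from `universe_size` genes, of which
    `pathway_size` belong to the pathway."""
    total = comb(universe_size, selected_size)
    upper = min(pathway_size, selected_size)
    favourable = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, selected_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total

# Hypothetical counts: 20 genes in the universe, 5 in the pathway,
# 4 genes linked to the cluster, 3 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(p_value)
```

A small p-value says the overlap is larger than expected by chance for gene sets of that size; it says nothing about why the genes were linked to the cluster in the first place.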

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same correlation-and-clustering pipeline could be applied to patient-level longitudinal blood data to test whether it improves differential diagnosis in clinical settings.
Extending the matrix to include additional omics layers might expose further mechanistic overlaps not visible from blood analytes alone.
The identified clusters could serve as a starting point for predicting co-occurrence risks between diseases that share biomarker profiles.

Load-bearing premise

That pairwise Pearson correlations computed on standardized disease-analyte profiles accurately capture true mechanistic similarities rather than being driven by confounding variables or data collection biases.

What would settle it

Re-running the clustering and enrichment steps on an independent collection of disease profiles or with a different similarity measure such as Spearman correlation and finding that the 16-group partition or the cytokine pathway enrichment disappears.
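
To make the Spearman alternative concrete, the sketch below computes both coefficients on one hypothetical pair of profiles. Spearman correlation is Pearson correlation applied to ranks, so it responds to monotone agreement and is less sensitive to a single extreme value. The data are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical profiles: perfectly monotone, but one extreme value in y.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

If a distance matrix built from `1 - spearman` produced a materially different tree from the Pearson one, that would suggest the clusters depend on a few extreme analyte values rather than on overall profile shape.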

Figures

Figures reproduced from arXiv: 2511.10888 by Abicumaran Uthamacumaran, Alexander Fulton, Bolin Liu, Hector Zenil.

**Figure 1.** Complete phylogenetic tree of 103 disease profiles constructed using Pearson correlation distance (1 − ρ_ij) and UPGMA hierarchical clustering. The y-axis represents clustering distance.

**Figure 2.** Phylogenetic tree of 103 disease profiles.

**Figure 4.** A) Reactome enrichment dotplot of Cluster 9 diseases. The top pathways include Signaling by interleukins, IL-4/IL-13 signaling, Extracellular matrix organization, Platelet activation/degranulation, and IL-10 signaling. B) Reactome cnetplot of the top 8 pathways, highlighting shared genes linking cytokine signaling, platelet function, and extracellular matrix remodeling. C) Reactome emapplot (cutoff = 0.35)…

**Figure 6.** Random Forest analysis of feature importance.

Original abstract

Understanding disease relationships through blood biomarkers offers a pathway toward data-driven taxonomy and precision medicine. In this study, we constructed a digital blood twin, a computational model derived from 103 disease signatures comprising longitudinal hematological and biochemical analytes. Profiles were standardized into a unified disease-analyte matrix, and pairwise Pearson correlations were computed to assess similarity across conditions. Hierarchical clustering revealed consistent grouping of hematopoietic disorders, while metabolic, endocrine, and respiratory diseases were more heterogeneous, reflecting weaker internal cohesion. To evaluate cluster structure, the tree was partitioned at a stringent distance threshold, yielding 16 groups. Enrichment analysis of the largest and most heterogeneous cluster demonstrated convergence on cytokine-signaling pathways, indicating shared inflammatory mechanisms that transcend conventional clinical boundaries. PCA and UMAP corroborated the correlation-based results, consistently separating hematological diseases as a distinct cluster. Random Forest feature selection identified neutrophils, mean corpuscular volume, red blood cell count, and platelet count as the most discriminative analytes, reinforcing the role of hematopoietic markers as key drivers of disease stratification. Collectively, these findings show that blood-derived digital signatures can recover clinically meaningful disease clusters while uncovering mechanistic overlaps across categories. This network physiology framework highlights the potential of integrating routine laboratory data with computational methods to refine disease ontology, map comorbidities, and advance precision diagnostics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies standard correlation clustering and feature selection to 103 compiled disease blood profiles to recover hematopoietic groupings and cytokine enrichment in mixed clusters, but unadjusted confounders make the mechanistic claims shaky.

The main point is that this work pulls together hematological and biochemical profiles for 103 diseases, computes pairwise Pearson correlations, runs hierarchical clustering, and cuts the tree at a fixed distance to get 16 groups. One large mixed cluster shows cytokine-signaling enrichment, while blood disorders separate cleanly. PCA, UMAP, and random forest all line up on the same story and flag neutrophils, red cell count, and platelets as top discriminators. That multi-view consistency is the strongest part of the execution and gives the groupings more weight than a single clustering run would have.

Referee Report

2 major / 2 minor

Summary. The paper constructs a 'digital blood twin' computational model from 103 disease signatures using longitudinal hematological and biochemical analytes. Disease profiles are standardized into a unified matrix, pairwise Pearson correlations are computed to measure similarity, and hierarchical clustering with a fixed distance threshold partitions the data into 16 groups. Enrichment analysis on the largest heterogeneous cluster identifies convergence on cytokine-signaling pathways. PCA, UMAP, and Random Forest feature selection corroborate separation of hematological diseases and highlight neutrophils, mean corpuscular volume, red blood cell count, and platelet count as key discriminative analytes. The central claim is that blood-derived signatures recover clinically meaningful clusters and uncover mechanistic overlaps transcending conventional disease categories.

Significance. If the correlations and clusters reflect true mechanistic similarities rather than artifacts, the work could provide a data-driven network physiology framework for refining disease ontology, mapping comorbidities, and advancing precision diagnostics using routine laboratory data. It would demonstrate the utility of unsupervised methods on standardized analyte profiles for identifying shared inflammatory mechanisms across metabolic, endocrine, respiratory, and other categories.

major comments (2)

[Abstract (pipeline description) and Results (clustering and enrichment)] The pipeline (standardization, Pearson correlations, hierarchical clustering at a fixed distance threshold to yield 16 groups, followed by enrichment) provides no evidence of covariate adjustment, matching, or sensitivity analyses for potential confounders such as age, sex, BMI, medications, or batch effects. This directly undermines the claim that observed clusters and cytokine-signaling enrichment represent mechanistic overlaps rather than data collection biases or unmodeled variables.
[Abstract and Methods] No sample sizes, data sources, statistical thresholds for enrichment, multiple-testing corrections, or error estimates are reported for the 103 signatures or the 16 groups. Without these, it is impossible to evaluate the robustness of the hematological separation or the cross-category convergence.
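
The confounding concern in the first major comment can be illustrated with a toy patient-level example. The sketch below is hypothetical throughout (variable names, values, and the choice of age as the confounder are assumptions): two analytes that both drift with age look strongly correlated until age is regressed out of each.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def residualize(values, covariate):
    """Residuals of the least-squares line values ~ intercept + slope * covariate."""
    n = len(values)
    mc, mv = sum(covariate) / n, sum(values) / n
    slope = (sum((c - mc) * (v - mv) for c, v in zip(covariate, values))
             / sum((c - mc) ** 2 for c in covariate))
    intercept = mv - slope * mc
    return [v - (intercept + slope * c) for c, v in zip(covariate, values)]

# Hypothetical data: each analyte is an age trend plus analyte-specific variation.
# The two variation terms are constructed to be unrelated to age and to each other.
age = [20, 30, 40, 50, 60, 70]
variation_a = [1, -2, 1, 1, -2, 1]
variation_b = [1, 0, -1, -1, 0, 1]
analyte_a = [0.1 * g + e for g, e in zip(age, variation_a)]
analyte_b = [0.2 * g + e for g, e in zip(age, variation_b)]

raw_r = pearson(analyte_a, analyte_b)
adjusted_r = pearson(residualize(analyte_a, age), residualize(analyte_b, age))
print(round(raw_r, 3), abs(round(adjusted_r, 3)))
```

Here the raw correlation is driven entirely by the shared age trend. With aggregate disease signatures this kind of adjustment may not be possible, which is why the provenance and demographic metadata of each signature matter.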

minor comments (2)

[Abstract] The term 'digital blood twin' is introduced without a precise mathematical definition or comparison to existing digital twin concepts in the literature.
[Results (hierarchical clustering)] Clarify whether the distance threshold for tree partitioning was chosen a priori or post hoc, and report sensitivity of the 16-group structure to small changes in this threshold.
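
The threshold-sensitivity check requested in the last minor comment amounts to re-cutting the same tree at several heights and counting groups. The sketch below is a small pure-Python stand-in for average-linkage (UPGMA) clustering on a hypothetical five-item distance matrix; it is not the authors' implementation. Because average-linkage merge heights never decrease, stopping the merges once the closest pair exceeds the threshold gives the same groups as cutting the finished tree at that height.

```python
def upgma_cut(dist, threshold):
    """Average-linkage agglomeration that stops when the closest pair of
    clusters is farther apart than `threshold`. Returns clusters as index lists."""
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = average_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best[0] > threshold:
            break
        _, p, q = best
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return clusters

# Hypothetical pairwise distances between five profiles (invented values).
pairs = {
    (0, 1): 0.01, (2, 3): 0.015,
    (0, 2): 0.30, (0, 3): 0.32, (1, 2): 0.31, (1, 3): 0.33,
    (0, 4): 0.90, (1, 4): 0.92, (2, 4): 0.80, (3, 4): 0.82,
}
n = 5
dist = [[0.0] * n for _ in range(n)]
for (i, j), d in pairs.items():
    dist[i][j] = dist[j][i] = d

thresholds = [0.005, 0.012, 0.02, 0.4, 1.0]
counts = [len(upgma_cut(dist, t)) for t in thresholds]
for t, c in zip(thresholds, counts):
    print(t, c)
```

Reporting such a table for the real tree would show whether the 16-group partition is stable over a range of thresholds or appears only at one specific value.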

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and indicate revisions to be incorporated in the next version of the manuscript.

Referee: The pipeline (standardization, Pearson correlations, hierarchical clustering at a fixed distance threshold to yield 16 groups, followed by enrichment) provides no evidence of covariate adjustment, matching, or sensitivity analyses for potential confounders such as age, sex, BMI, medications, or batch effects. This directly undermines the claim that observed clusters and cytokine-signaling enrichment represent mechanistic overlaps rather than data collection biases or unmodeled variables.

Authors: We agree that explicit treatment of potential confounders is necessary to support mechanistic interpretations. The 103 signatures were compiled from published aggregate data and public repositories rather than raw individual-level records, which limits direct covariate adjustment. In the revised manuscript we will add a new subsection in Methods describing the provenance of each signature and any available metadata on demographics or batch information. We will also report sensitivity analyses (e.g., re-clustering after excluding signatures with known age or sex imbalance where metadata permit) and expand the Discussion to quantify how unmodeled variables could affect cluster stability and enrichment results. These additions will make the limitations transparent while preserving the core finding that blood-analyte patterns recover reproducible groupings. revision: yes
Referee: No sample sizes, data sources, statistical thresholds for enrichment, multiple-testing corrections, or error estimates are reported for the 103 signatures or the 16 groups. Without these, it is impossible to evaluate the robustness of the hematological separation or the cross-category convergence.

Authors: We acknowledge that these quantitative details were omitted from the initial submission. The revised Methods section will now list, for each of the 103 signatures, the source study or database, the number of independent samples or patients contributing to the signature, and the number of analytes measured. For the enrichment analysis we will specify the exact statistical test, the FDR threshold employed (e.g., Benjamini-Hochberg FDR < 0.05), and any multiple-testing correction applied across the 16 clusters. Cluster stability will be quantified by reporting bootstrap or permutation-based error estimates on the cophenetic distances and on the Random Forest feature importances. These additions will allow readers to assess the statistical support for the reported separations and convergences. revision: yes
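
For reference, the Benjamini-Hochberg adjustment mentioned in this response can be written in a few lines. The p-values below are hypothetical and stand in for per-pathway enrichment results; the 0.05 cut-off mirrors the example given in the response.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw p-values for five pathways.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in adjusted_p]
print(adjusted_p, significant)
```

In this toy case two raw p-values that are below 0.05 no longer pass after adjustment, which is the kind of difference the referee's request is meant to expose.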

Circularity Check

0 steps flagged

No significant circularity; standard data-analysis pipeline

The paper constructs a disease-analyte matrix from 103 signatures, standardizes it, computes pairwise Pearson correlations, applies hierarchical clustering (partitioned at a fixed distance threshold to yield 16 groups), runs enrichment, PCA, UMAP, and Random Forest feature selection. All steps are direct, off-the-shelf applications of established algorithms to the input data; no equations reduce outputs to fitted parameters by construction, no self-definitional loops, and no load-bearing self-citations or uniqueness theorems are invoked. The central claims are empirical results of this pipeline rather than tautological restatements of the inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the assumption that correlation-based similarity in blood analyte profiles reflects underlying biology, plus the post-hoc choice of a stringent distance threshold to define 16 clusters and the interpretation of enrichment as mechanistic convergence.

free parameters (1)

distance threshold for tree partitioning
Used to yield exactly 16 groups; described as stringent but no numerical value or selection criterion provided.

axioms (1)

domain assumption: Pearson correlation on standardized analyte profiles measures biologically meaningful disease similarity
Invoked when constructing the similarity matrix and interpreting clusters as mechanistic overlaps.

invented entities (1)

digital blood twin (no independent evidence)
purpose: Computational model derived from longitudinal hematological and biochemical disease signatures
New term introduced for the unified disease-analyte matrix and derived signatures; no independent falsifiable prediction or external validation supplied.

pith-pipeline@v0.9.0 · 5543 in / 1088 out tokens · 47039 ms · 2026-05-17T22:56:03.499895+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "pairwise Pearson correlations were computed to assess similarity across conditions. Hierarchical clustering revealed consistent grouping... partitioned at a stringent distance threshold, yielding 16 groups. Enrichment analysis... cytokine-signaling pathways"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "Profiles were standardized into a unified disease-analyte matrix"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

[1]

The role of blood testing in prevention, diagnosis, and management of chronic diseases: A review

Cabalar I, Le TH, Silber A, O’Hara M, Abdallah B, Parikh M, et al. The role of blood testing in prevention, diagnosis, and management of chronic diseases: A review. The American Journal of the Medical Sciences. 2024;368(4):274-86

work page 2024
[2]

Hoffbrand’s essential haematology

Hoffbrand AV. Hoffbrand’s essential haematology. John Wiley & Sons; 2024

work page 2024
[3]

Clinical proteomics: written in blood

Liotta LA, Ferrari M, Petricoin E. Clinical proteomics: written in blood. Nature. 2003;425(6961):905- 5

work page 2003
[4]

Digital twin in healthcare: Recent updates and challenges

Sun T, He X, Li Z. Digital twin in healthcare: Recent updates and challenges. Digital health. 2023;9:20552076221149651

work page 2023
[5]

Digital twin for healthcare systems

Vallée A. Digital twin for healthcare systems. Frontiers in Digital Health. 2023;5:1253050

work page 2023
[6]

The digital twin revolution in healthcare

Erol T, Mendi AF, Doğan D. The digital twin revolution in healthcare. In: 2020 4th international symposium on multidisciplinary studies and innovative technologies (ISMSIT). IEEE; 2020. p. 1-7

work page 2020
[7]

Digital twins as global learning health and disease models for preventive and personalized medicine

Li X, Loscalzo J, Mahmud AF, Aly DM, Rzhetsky A, Zitnik M, et al. Digital twins as global learning health and disease models for preventive and personalized medicine. Genome Medicine. 2025;17(1):11

work page 2025
[8]

International statistical classification of diseases and related health problems: 10th revision (ICD-10)

Organization WH, et al. International statistical classification of diseases and related health problems: 10th revision (ICD-10). http://www.who.int/classifications/apps/icd/icd. 1992

work page 1992
[9]

International classification of diseases

WHO O. International classification of diseases. WHO [Internet]. 1992

work page 1992
[10]

The human disease network

Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL. The human disease network. Proceedings of the National Academy of Sciences. 2007;104(21):8685-90

work page 2007
[11]

A dynamic network approach for the study of human phenotypes

Hidalgo CA, Blumm N, Barabási AL, Christakis NA. A dynamic network approach for the study of human phenotypes. PLoS computational biology. 2009;5(4):e1000353

work page 2009
[12]

The potential of the Medical Digital Twin in diabetes management: a review

Chu Y, Li S, Tang J, Wu H. The potential of the Medical Digital Twin in diabetes management: a review. Frontiers in Medicine. 2023;10:1178912

work page 2023
[13]

Mastering regular expressions

Friedl J. Mastering regular expressions. O’Reilly Media, Inc.; 2006

work page 2006
[14]

Pearson K. VII. Mathematical contributions to the theory of evolution.—III. Regression, heredity, and panmixia. Philosophical Transactions of the Royal Society of London Series A, containing papers of a mathematical or physical character. 1896;(187):253-318

work page
[15]

A statistical method for evaluating systematic relationships

Sokal RR, Michener CD, et al. A statistical method for evaluating systematic relationships. 1958

work page 1958
[16]

Disease Ontology: a backbone for disease semantic integration

Schriml LM, Arze C, Nadendla S, Chang YWW, Mazaitis M, Felix V, et al. Disease Ontology: a backbone for disease semantic integration. Nucleic acids research. 2012;40(D1):D940-6

work page 2012
[17]

Disease Ontology; 2025

Project TDO. Disease Ontology; 2025. Accessed: 2025-07. https://disease-ontology.org/

work page 2025
[18]

The DisGeNET knowledge platform for disease genomics: 2019 update

Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, Ronzano F, Centeno E, Sanz F, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic acids research. 2020;48(D1):D845-55

work page 2019
[19]

KEGG: kyoto encyclopedia of genes and genomes

Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research. 2000;28(1):27-30

work page 2000
[20]

The reactome pathway knowledgebase 2022

Gillespie M, Jassal B, Stephan R, Milacic M, Rothfels K, Senff-Ribeiro A, et al. The reactome pathway knowledgebase 2022. Nucleic acids research. 2022;50(D1):D687-92

work page 2022
[21]

Using clusterProfiler to characterize multiomics data

Xu S, Hu E, Cai Y, Xie Z, Luo X, Zhan L, et al. Using clusterProfiler to characterize multiomics data. Nature protocols. 2024;19(11):3292-320

work page 2024
[22]

ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization

Yu G, He QY. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Molecular BioSystems. 2016;12(2):477-9

work page 2016
[23]

Controlling the false discovery rate: a practical and powerful approach to multiple testing

Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological). 1995;57(1):289-300

work page 1995
[24]

DisGeNET; 2025

project TD. DisGeNET; 2025. Accessed: 2025-07. https://www.disgenet.org/

work page 2025
[25]

Statistical analysis with missing data

Little RJ, Rubin DB. Statistical analysis with missing data. John Wiley & Sons; 2019

work page 2019
[26]

Principal component analysis: a review and recent developments

Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philosophical transactions of the royal society A: Mathematical, Physical and Engineering Sciences. 2016;374(2065):20150202

work page 2016
[27]

Umap: Uniform manifold approximation and projection for dimension reduction

McInnes L, Healy J, Melville J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426. 2018

work page 2018
[28]

Random forests

Breiman L. Random forests. Machine learning. 2001;45(1):5-32

work page 2001
[29]

Thyroid disorders and diabetes mellitus

Hage M, Zantout MS, Azar ST. Thyroid disorders and diabetes mellitus. Journal of thyroid research. 2011;2011(1):439463

work page 2011
[30]

Anemia of chronic disease

Weiss G, Goodnough LT. Anemia of chronic disease. New England Journal of Medicine. 2005;352(10):1011-23

work page 2005
[31]

Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data

De Winter JC, Gosling SD, Potter J. Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data. Psychological methods. 2016;21(3):273

work page 2016
[32]

Learning similarity with cosine similarity ensemble

Xia P, Zhang L, Li F. Learning similarity with cosine similarity ensemble. Information sciences. 2015;307:39-52

work page 2015
[33]

Visualizing data using t-SNE

Maaten Lvd, Hinton G. Visualizing data using t-SNE. Journal of machine learning research. 2008;9(Nov):2579-605

work page 2008
[34]

A unified approach to interpreting model predictions

Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in neural information processing systems. 2017;30

work page 2017
[35]

“Why should I trust you?” Explaining the predictions of any classifier

Ribeiro MT, Singh S, Guestrin C. “Why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining; 2016. p. 1135-44

work page 2016
[36]

Accessed: 2025-08-29

Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation); 2016. Accessed: 2025-08-29. https://gdpr-info.eu/

work page 2016
[37]

An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems

Zenil H, Kiani NA, Marabita F, Deng Y, Elias S, Schmidt A, Ball G, Tegnér J. An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems. iScience. 2019;19:1160–1172. doi:10.1016/j.isci.2019.07.043

work page doi:10.1016/j.isci.2019.07.043 2019
[38]

A Review of Mathematical and Computational Methods in Cancer Dynamics

Uthamacumaran A, Zenil H. A Review of Mathematical and Computational Methods in Cancer Dynamics. Frontiers in Oncology. 2022;12:850731. doi:10.3389/fonc.2022.850731

work page doi:10.3389/fonc.2022.850731 2022
[39]

Algorithmic Information Dynamics: A Computational Approach to Causality with Applications to Living Systems

Zenil H, Kiani NA, Tegnér J. Algorithmic Information Dynamics: A Computational Approach to Causality with Applications to Living Systems. Cambridge University Press; 2023

work page 2023
[40]

Emergence and algorithmic information dynamics of systems and observers

Abrahão FS, Zenil H. Emergence and algorithmic information dynamics of systems and observers. Philosophical Transactions of the Royal Society A. 2022;380:20200429. doi:10.1098/rsta.2020.0429

Supplementary Material Table 2: Disease profiles grouped by hierarchical clustering (cut at distance = 0.02). Profile IDs are omitted for clarity; only disease na...

work page doi:10.1098/rsta.2020.0429 2022

[1] [1]

The role of blood testing in prevention, diagnosis, and management of chronic diseases: A review

Cabalar I, Le TH, Silber A, O’Hara M, Abdallah B, Parikh M, et al. The role of blood testing in prevention, diagnosis, and management of chronic diseases: A review. The American Journal of the Medical Sciences. 2024;368(4):274-86

work page 2024

[2] [2]

Hoffbrand’s essential haematology

Hoffbrand AV. Hoffbrand’s essential haematology. John Wiley & Sons; 2024

work page 2024

[3] [3]

Clinical proteomics: written in blood

Liotta LA, Ferrari M, Petricoin E. Clinical proteomics: written in blood. Nature. 2003;425(6961):905- 5

work page 2003

[4] [4]

Digital twin in healthcare: Recent updates and challenges

Sun T, He X, Li Z. Digital twin in healthcare: Recent updates and challenges. Digital health. 2023;9:20552076221149651

work page 2023

[5] [5]

Digital twin for healthcare systems

Vall´ee A. Digital twin for healthcare systems. Frontiers in Digital Health. 2023;5:1253050

work page 2023

[6] [6]

The digital twin revolution in healthcare

Erol T, Mendi AF, Do ˘gan D. The digital twin revolution in healthcare. In: 2020 4th international symposium on multidisciplinary studies and innovative technologies (ISMSIT). IEEE; 2020. p. 1-7

work page 2020

[7] [7]

Digital twins as global learning health and disease models for preventive and personalized medicine

Li X, Loscalzo J, Mahmud AF, Aly DM, Rzhetsky A, Zitnik M, et al. Digital twins as global learning health and disease models for preventive and personalized medicine. Genome Medicine. 2025;17(1):11

work page 2025

[8] [8]

International statistical classification of diseases and related health prob- lems: 10th revision (ICD -10)

Organization WH, et al. International statistical classification of diseases and related health prob- lems: 10th revision (ICD -10). http://www who int/classifications/apps/icd/icd. 1992

work page 1992

[9] [9]

International classification of diseases

WHO O. International classification of diseases. WHO [Internet]. 1992

work page 1992

[10] [10]

The human disease network

Goh KI, Cusick ME, V alle D, Childs B, Vidal M, Barab ´asi AL. The human disease network. Pro- ceedings of the National Academy of Sciences. 2007;104(21):8685-90

work page 2007

[11] [11]

A dynamic network approach for the study of human phenotypes

Hidalgo CA, Blumm N, Barab ´asi AL, Christakis NA. A dynamic network approach for the study of human phenotypes. PLoS computational biology. 2009;5(4):e1000353

work page 2009

[12] [12]

The potential of the Medical Digital Twin in diabetes management: a review

Chu Y , Li S, Tang J, Wu H. The potential of the Medical Digital Twin in diabetes management: a review. Frontiers in Medicine. 2023;10:1178912

work page 2023

[13] [13]

Mastering regular expressions

Friedl J. Mastering regular expressions. ” O’Reilly Media, Inc.”; 2006

work page 2006

[14] [14]

Pearson K. VII. Mathematical contributions to the theory of evolution. —III. Regression, heredity, and panmixia. Philosophical Transactions of the Royal Society 20 of London Series A, containing papers of a mathematical or physical character. 1896;(187):253-318

work page

[15] [15]

A statistical method for evaluating systematic relationships

Sokal RR, Michener CD, et al. A statistical method for evaluating systematic relationships. 1958

work page 1958

[16] [16]

Disease Ontology: a backbone for disease semantic integration

Schriml LM, Arze C, Nadendla S, Chang YWW, Mazaitis M, Felix V , et al. Disease Ontology: a backbone for disease semantic integration. Nucleic acids research. 2012;40(D1):D940-6

work page 2012

[17] [17]

Disease Ontology; 2025

Project TDO. Disease Ontology; 2025. Accessed: 2025-07. https://disease- ontology.org/

work page 2025

[18] [18]

The DisGeNET knowledge platform for disease genomics: 2019 update

Pin˜ero J, Ram´ırez-Anguita JM, Sa u¨ch-Pitarch J, Ronzano F, Centeno E, Sanz F, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic acids research. 2020;48(D1):D845-55

work page 2019

[19] [19]

KEGG: kyoto encyclopedia of genes and genomes

Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research. 2000;28(1):27-30

work page 2000

[20] [20]

The reactome pathway knowledgebase 2022

Gillespie M, Jassal B, Stephan R, Milacic M, Rothfels K, Senff -Ribeiro A, et al. The reactome pathway knowledgebase 2022. Nucleic acids research. 2022;50(D1):D687-92

work page 2022

[21] [21]

Using clusterProfiler to characterize multiomics data

Xu S, Hu E, Cai Y , Xie Z, Luo X, Zhan L, et al. Using clusterProfiler to characterize multiomics data. Nature protocols. 2024;19(11):3292-320

work page 2024

[22] Yu G, He QY. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Molecular BioSystems. 2016;12(2):477-9

[23] Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological). 1995;57(1):289-300

[24] project TD. DisGeNET; 2025. Accessed: 2025-07. https://www.disgenet.org/

[25] Little RJ, Rubin DB. Statistical analysis with missing data. John Wiley & Sons; 2019

[26] Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philosophical transactions of the royal society A: Mathematical, Physical and Engineering Sciences. 2016;374(2065):20150202

[27] McInnes L, Healy J, Melville J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426. 2018

[28] Breiman L. Random forests. Machine learning. 2001;45(1):5-32

[29] Hage M, Zantout MS, Azar ST. Thyroid disorders and diabetes mellitus. Journal of thyroid research. 2011;2011(1):439463

[30] Weiss G, Goodnough LT. Anemia of chronic disease. New England Journal of Medicine. 2005;352(10):1011-23

[31] De Winter JC, Gosling SD, Potter J. Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data. Psychological methods. 2016;21(3):273

[32] Xia P, Zhang L, Li F. Learning similarity with cosine similarity ensemble. Information sciences. 2015;307:39-52

[33] Maaten Lvd, Hinton G. Visualizing data using t-SNE. Journal of machine learning research. 2008;9(Nov):2579-605

[34] Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in neural information processing systems. 2017;30

[35] Ribeiro MT, Singh S, Guestrin C. "Why should I trust you?" Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining; 2016. p. 1135-44

[36] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation); 2016. Accessed: 2025-08-29. https://gdpr-info.eu/

[37] Zenil H, Kiani NA, Marabita F, Deng Y, Elias S, Schmidt A, Ball G, Tegnér J. An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems. iScience. 2019;19:1160–1172. doi:10.1016/j.isci.2019.07.043

[38] Uthamacumaran A, Zenil H. A Review of Mathematical and Computational Methods in Cancer Dynamics. Frontiers in Oncology. 2022;12:850731. doi:10.3389/fonc.2022.850731

[39] Zenil H, Kiani NA, Tegnér J. Algorithmic Information Dynamics: A Computational Approach to Causality with Applications to Living Systems. Cambridge University Press; 2023

[40] Abrahão FS, Zenil H. Emergence and algorithmic information dynamics of systems and observers. Philosophical Transactions of the Royal Society A. 2022;380:20200429. doi:10.1098/rsta.2020.0429

Supplementary Material

Table 2: Disease profiles grouped by hierarchical clustering (cut at distance = 0.02). Profile IDs are omitted for clarity; only disease na...