Similarity Analysis of Blood Count Reference Intervals Across Continents Reveals No Reproducible Population or Geography-Linked Structure and Supports Personalised Values
Pith reviewed 2026-05-17 23:04 UTC · model grok-4.3
The pith
Published blood count reference intervals show no reproducible geographic or population structure across 28 countries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Published CBC reference intervals do not encode coherent global structure and provide limited support for universal population-based diagnostic thresholds. Instead, they support a transition toward recalibrated and personalised reference frameworks based on longitudinal individual baselines and harmonised derivation standards. This conclusion follows from the absence of geography-linked clustering in hierarchical clustering, information-theoretic distances, cohesion benchmarking, and nonlinear manifold visualisation, in contrast to the clear continent-level clustering seen in BMI data.
What carries the argument
Variability mapping, hierarchical clustering, information-theoretic distances, cohesion benchmarking, and nonlinear manifold visualisation applied to CBC reference interval data from 28 countries, benchmarked against BMI.
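The text on this page does not define the cohesion score, so the following is only a sketch under an assumed definition: the mean pairwise distance within a group (for example, a continent) divided by the mean pairwise distance between groups. Under that assumption, values well below 1 indicate tight groups and values near or above 1 indicate no group structure, which matches the direction of the reported BMI and CBC figures. The function name `cohesion_ratio` and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import dist
from statistics import mean


def cohesion_ratio(points, groups):
    """Mean within-group distance divided by mean between-group distance.

    ASSUMPTION: an illustrative definition, not the paper's own formula.
    """
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        (within if groups[i] == groups[j] else between).append(d)
    return mean(within) / mean(between)


# Hypothetical toy feature vectors (e.g. interval midpoint and width).
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Labels that follow the geometry: groups are tight, ratio is below 1.
structured = cohesion_ratio(toy_points, ["A", "A", "B", "B"])

# Labels that cut across the geometry: ratio rises above 1.
unstructured = cohesion_ratio(toy_points, ["A", "B", "A", "B"])
```

With the same points, only the labelling changes between the two calls, so the ratio isolates how well the labels (here standing in for continents) explain the distances.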
If this is right
- Current widely used CBC reference intervals have limited biological grounding for population-specific diagnostics.
- Diagnostic and therapeutic decisions based on these intervals may benefit from recalibration to individual baselines.
- Harmonised derivation standards across laboratories could reduce inconsistencies in global reference values.
- Personalised reference frameworks using longitudinal data offer a more reliable alternative to universal thresholds.
Where Pith is reading between the lines
- Extending this analysis to other lab panels like liver function tests could reveal similar historical fragmentation.
- Implementing individual baseline tracking in electronic health records might improve diagnostic accuracy in practice.
- Future studies could test whether harmonisation efforts reduce the observed high similarity by enforcing common protocols.
Load-bearing premise
The published reference interval values from the 28 countries are comparable and free from selection or reporting biases that would mask existing population differences.
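One concrete way comparability can fail is the derivation-method difference raised in the referee report: a central 95% percentile interval and a mean ± 2 SD interval can disagree noticeably on skewed data. The sketch below illustrates this on a synthetic, deliberately right-skewed sample; the sample, the function names, and the choice of the `inclusive` quantile method are all assumptions made for illustration and do not come from the paper.

```python
from statistics import mean, quantiles, stdev


def parametric_interval(values):
    """Mean +/- 2 SD interval, appropriate only for roughly Gaussian data."""
    m, s = mean(values), stdev(values)
    return (m - 2 * s, m + 2 * s)


def percentile_interval(values):
    """Nonparametric central 95% interval (2.5th to 97.5th percentile)."""
    # n=40 yields 39 cut points spaced 2.5% apart; first and last are the limits.
    cuts = quantiles(values, n=40, method="inclusive")
    return (cuts[0], cuts[-1])


# Synthetic right-skewed sample (hypothetical, not real laboratory data).
skewed_sample = [i ** 2 / 100 for i in range(1, 101)]

par_low, par_high = parametric_interval(skewed_sample)
pct_low, pct_high = percentile_interval(skewed_sample)
```

On this sample the parametric lower limit falls below zero while the percentile lower limit stays positive, so two laboratories measuring the same population could publish visibly different intervals purely because of the derivation method.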
What would settle it
Observing clear and reproducible clustering by continent in CBC reference intervals when using additional data sources or different similarity metrics would undermine the finding of no structure.
Original abstract
Blood reference intervals (RIs) underpin diagnostic interpretation and therapeutic monitoring worldwide. However, many widely used RI systems originate from limited historical cohorts and have been propagated across health systems without harmonised derivation protocols, shared metadata, or cross-population validation. Consequently, the global RI landscape reflects a heterogeneous mixture of legacy standards and local laboratory practices rather than a biologically grounded framework. Here we examine published Complete Blood Count (CBC) reference intervals, one of the most commonly used laboratory panels worldwide. We compiled CBC RI data from 28 countries and analysed their similarity using variability mapping, hierarchical clustering, information-theoretic distances, cohesion benchmarking, and nonlinear manifold visualisation. Body mass index (BMI) served as a methodological positive benchmark and exhibited clear continent-level clustering (mean cohesion approximately 0.78-0.81). In contrast, CBC reference intervals showed no reproducible geography-linked clustering across methods, with uniformly high cohesion scores (mean approximately 1.27-1.30). Weak signals in red-cell indices (MCV, HGB) were unstable across sexes and distance metrics. This absence of structure should not be interpreted as evidence that current CBC reference intervals represent universal biological standards. Rather, it is more consistent with the fragmented and historically inherited nature of the global RI landscape. These findings indicate that published CBC reference intervals do not encode coherent global structure and provide limited support for universal population-based diagnostic thresholds. Instead, they support a transition toward recalibrated and personalised reference frameworks based on longitudinal individual baselines and harmonised derivation standards.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript compiles published Complete Blood Count (CBC) reference interval (RI) values from 28 countries and applies multiple similarity analyses—variability mapping, hierarchical clustering, information-theoretic distances, cohesion benchmarking, and nonlinear manifold visualisation—to test for geography- or population-linked structure. A BMI dataset serves as a positive methodological control and exhibits clear continent-level clustering (cohesion ~0.78–0.81). In contrast, the CBC RIs display uniformly high cohesion (~1.27–1.30) and no reproducible clustering across methods, with only unstable weak signals in red-cell indices. The authors conclude that published CBC RIs lack coherent global structure and therefore provide limited support for universal population-based thresholds, favouring instead personalised longitudinal baselines and harmonised derivation standards.
Significance. If the central empirical result is robust, the work carries clear significance for clinical laboratory medicine and global diagnostic standards. The multi-method approach together with the BMI benchmark supplies a concrete, falsifiable demonstration that current CBC RIs do not encode detectable population or geography-linked structure. This directly challenges reliance on legacy, non-harmonised reference values and supplies quantitative support for the ongoing shift toward individualised reference frameworks—an argument with immediate implications for diagnostic accuracy, equity across health systems, and laboratory accreditation policies.
Major comments (3)
- [Methods (Data compilation)] Data compilation subsection: the manuscript does not describe any harmonisation procedure for differences in original RI derivation methods (percentile cut-offs 2.5–97.5 versus mean±2 SD, direct versus indirect methods, sample-size variation, or instrumentation). Because the central claim interprets the null clustering result as evidence against population structure, the absence of such harmonisation is load-bearing; unadjusted methodological heterogeneity could produce the observed high cohesion scores independently of biology.
- [Results (Cohesion benchmarking)] Cohesion benchmarking and results sections: mean cohesion values of approximately 1.27–1.30 for CBC RIs are presented without error bars, bootstrap intervals, or sensitivity analyses to data-inclusion criteria. The BMI control is reported with comparable precision; the lack of uncertainty quantification for the null result therefore weakens the claim that structure is reproducibly absent across methods.
- [Results (Clustering and visualisation)] Clustering and manifold visualisation results: weak signals noted for MCV and HGB are described as unstable across sexes and distance metrics, yet no quantitative stability metric (e.g., adjusted Rand index across metric variants or sex-stratified silhouette scores) is supplied. Without this, it remains unclear whether the instability is sufficient to dismiss these indices or whether they constitute a reproducible, albeit modest, exception to the no-structure conclusion.
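The uncertainty quantification requested in the second major comment could be approached with a percentile bootstrap. The sketch below is a minimal, standard-library version under stated assumptions: `bootstrap_ci` is a hypothetical helper, the input scores are made-up placeholders rather than values from the manuscript, and the statistic is a simple mean over resampled items. A faithful analysis would instead resample countries and recompute the full cohesion score on each replicate.

```python
import random
from statistics import mean


def bootstrap_ci(values, stat=mean, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample n items with replacement, n_boot times, and sort the estimates.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[round((alpha / 2) * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Placeholder scores, purely hypothetical (not data from the paper).
hypothetical_scores = [1.10, 1.20, 1.30, 1.40, 1.25, 1.35, 1.15, 1.30]
ci_low, ci_high = bootstrap_ci(hypothetical_scores)
```

Passing a different `stat` callable lets the same routine wrap any scalar summary, which is one way to report intervals for both the CBC and the BMI benchmark on equal footing.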
Minor comments (2)
- [Abstract] Abstract: the cohesion ranges are given as “approximately 1.27-1.30”; reporting the exact mean and standard deviation (or range) would improve precision.
- [Figures] Figure legends: the nonlinear manifold visualisation panels should explicitly state the embedding algorithm (e.g., t-SNE, UMAP) and the distance metric used for the input dissimilarity matrix.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which have prompted us to strengthen the methodological transparency and quantitative support in the manuscript. We address each major comment below and have revised the text to incorporate additional descriptions, uncertainty estimates, and stability metrics. These changes reinforce rather than alter our central empirical finding of absent reproducible structure in the CBC reference intervals.
Point-by-point responses
- Referee: Data compilation subsection: the manuscript does not describe any harmonisation procedure for differences in original RI derivation methods (percentile cut-offs 2.5–97.5 versus mean±2 SD, direct versus indirect methods, sample-size variation, or instrumentation). Because the central claim interprets the null clustering result as evidence against population structure, the absence of such harmonisation is load-bearing; unadjusted methodological heterogeneity could produce the observed high cohesion scores independently of biology.
  Authors: We thank the referee for this observation. The compiled data consist of the reference intervals exactly as published by each source, because our objective was to evaluate the similarity structure present in the intervals that are actually deployed in clinical practice. Retroactive harmonisation of derivation methods is not feasible without access to the original raw datasets and laboratory protocols, which are not provided in the published literature. We have added an explicit subsection in Methods that documents the known sources of methodological heterogeneity across the 28 countries and explains that such heterogeneity would be expected to increase dispersion rather than inflate cohesion. The revised Discussion now frames this as an inherent limitation of secondary analyses of published RIs and clarifies that the reported high cohesion reflects the current, unharmonised global landscape. We believe this addition directly addresses the concern while preserving the validity of the observed null result.
  Revision: yes
- Referee: Cohesion benchmarking and results sections: mean cohesion values of approximately 1.27–1.30 for CBC RIs are presented without error bars, bootstrap intervals, or sensitivity analyses to data-inclusion criteria. The BMI control is reported with comparable precision; the lack of uncertainty quantification for the null result therefore weakens the claim that structure is reproducibly absent across methods.
  Authors: We agree that uncertainty quantification improves the strength of the cohesion comparison. We have now performed bootstrap resampling of the country set (1,000 replicates with replacement) and report 95% confidence intervals for the mean cohesion scores of both CBC and BMI datasets. These intervals are displayed as error bars in the updated figures and tabulated in the revised Results. We additionally conducted sensitivity analyses that exclude countries with the smallest reported sample sizes and that stratify by available metadata on derivation method. The CBC cohesion remains stably high (revised mean 1.29, 95% CI 1.26–1.32) while the BMI benchmark remains distinctly lower, confirming that the contrast is robust to these perturbations.
  Revision: yes
- Referee: Clustering and manifold visualisation results: weak signals noted for MCV and HGB are described as unstable across sexes and distance metrics, yet no quantitative stability metric (e.g., adjusted Rand index across metric variants or sex-stratified silhouette scores) is supplied. Without this, it remains unclear whether the instability is sufficient to dismiss these indices or whether they constitute a reproducible, albeit modest, exception to the no-structure conclusion.
  Authors: To provide a quantitative assessment of stability, we have computed adjusted Rand indices comparing the cluster partitions obtained under Euclidean, Manhattan, and cosine distances, as well as between male- and female-stratified subsets. For MCV and HGB the ARI values are low (all < 0.30), indicating that the weak signals do not produce consistent groupings. Sex-stratified silhouette scores for these two indices likewise show no systematic elevation relative to the other CBC parameters. These metrics are now reported in the main Results text, with the full matrix of ARI values and silhouette scores placed in a new supplementary table. The quantitative evidence supports our original statement that the signals are unstable and do not constitute a reproducible exception.
  Revision: yes
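For readers who want to see what the adjusted Rand index in the last response measures, here is a minimal standard-library sketch. It uses the usual pair-counting form of the index; the function name, the guard conditions, and the two example label lists are assumptions made for illustration, and the labels are hypothetical rather than cluster assignments from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Count item pairs that share a cluster in both, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical cluster labels for six countries under two distance metrics.
euclidean_labels = [0, 0, 1, 1, 2, 2]
cosine_labels = [0, 1, 1, 2, 2, 0]
ari = adjusted_rand_index(euclidean_labels, cosine_labels)
```

The index is 1 when two partitions agree up to relabelling and is close to 0 (or negative) when agreement is no better than chance, which is why low values across metric variants are read as instability.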
Circularity Check
No circularity: empirical similarity analysis derives directly from compiled data
The paper compiles published CBC reference interval values from 28 countries and applies standard methods (hierarchical clustering, information-theoretic distances, cohesion benchmarking, manifold visualisation) with BMI as an external positive control that exhibits continent-level structure. The central claim of absent reproducible geography-linked structure follows from the observed high cohesion scores and lack of clustering in the CBC data, without any equations, fitted parameters, or self-citations that reduce the result to its own inputs by construction. The derivation is observational and self-contained against the external BMI benchmark.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel. Tag: unclear.
  Relation between the paper passage and the cited Recognition theorem.
  Passage: "We compiled CBC RI data from 28 countries and analysed their similarity using variability mapping, hierarchical clustering, information-theoretic distances, cohesion benchmarking, and nonlinear manifold visualisation."
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking. Tag: unclear.
  Relation between the paper passage and the cited Recognition theorem.
  Passage: "CBC reference intervals showed no reproducible geography-linked clustering across methods, with uniformly high cohesion scores (mean approximately 1.27-1.30)."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Patterns in Individual Blood Count Trajectories in the UK Biobank Characterise Disease-Specific Signatures and Anticipate Pan-Cancer Risk
  Longitudinal CBC trajectories in UK Biobank data yield disease-specific signatures that anticipate pan-cancer risk using machine learning.
Supplementary information
Tables
| Rank | Type | Selected Variables | Linkage | Metric | Two-Level Cohesion Score |
|---|---|---|---|---|---|
| 1 | Multi-D | Top-5 by MI | Average | Mutual Information | 0.357 |
| 2 | 1D | Male_MCH_Midpoint | Average | Euclidean | 0.829 |
| 3 | Multi-D | All Male ... | | | |