pith. machine review for the scientific record.

arxiv: 2604.22890 · v1 · submitted 2026-04-24 · 🧬 q-bio.OT

Recognition: unknown

AI-Derived Reproductive Phenotypes and Explainable ML for Concurrent Early Multimorbidity in U.S. Women: NHANES 2017-March 2020


Pith reviewed 2026-05-08 08:38 UTC · model grok-4.3

classification 🧬 q-bio.OT
keywords: reproductive history · multimorbidity · NHANES · machine learning · phenotyping · explainable AI · women's health · early onset chronic disease

The pith

Adverse reproductive life-course patterns strongly cluster with concurrent early multimorbidity in U.S. women aged 20-44.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper uses national survey data to group women by reproductive history, chronic-condition indicators, and depressive symptoms. It then tests whether certain groups show much higher rates of multiple early-onset conditions. It finds four distinct phenotypes; one of them concentrates women with a heavy adverse reproductive burden and shows 77.5 percent multimorbidity. The work compares logistic regression and XGBoost models and stresses calibration and explainability via SHAP values. It argues that reproductive history can serve as a practical signal of concurrent risk, rather than being studied one adverse outcome at a time. A sympathetic reader would care because early identification of such clusters could shift screening and prevention toward younger women before chronic diseases fully develop.

Core claim

Principal components analysis and k-means phenotyping revealed that adverse reproductive life-course structure is strongly clustered with concurrent early multimorbidity in U.S. women aged 20-44 years. Although XGBoost improved discrimination, calibration and feature attribution remained essential for reliable translation into practice.

What carries the argument

Principal components analysis reduces the reproductive-history, chronic-condition, and PHQ-9 features. K-means clustering then groups women into four phenotypes. SHAP values explain feature contributions in the logistic regression and XGBoost models.
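The pipeline above can be sketched in plain Python: standardize the predictors, project them onto the leading principal components, then run k-means. This is a minimal illustration, not the author's code. The paper's exact feature list, number of retained components, and k-means settings are not given here. The power-iteration PCA, the deterministic farthest-point initialization, and the toy data in the check are therefore all assumptions. A production analysis would more likely call scikit-learn's `PCA` and `KMeans`.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit sample variance (z-scores)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)
        sds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def principal_components(z, n_components, iters=500):
    """Leading covariance eigenvectors via power iteration with deflation."""
    n, p = len(z), len(z[0])
    cov = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]
    comps = []
    for _ in range(n_components):
        v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary non-uniform start vector
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        eig = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
        comps.append(v)
        # Deflate: remove the found direction so the next pass finds the next one.
        cov = [[cov[a][b] - eig * v[a] * v[b] for b in range(p)] for a in range(p)]
    return comps


def project(z, comps):
    """Principal-component scores: dot product of each row with each component."""
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps] for row in z]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda pt: min(sq_dist(pt, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: sq_dist(pt, centers[c])) for pt in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

In the paper's setting, `rows` would be the 1,602-row predictor matrix and `k` would be 4. Because k-means is sensitive to initialization, library implementations typically run several restarts and keep the solution with the lowest within-cluster sum of squares.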

If this is right

  • In one latent phenotype, 77.5 percent of women met the multimorbidity definition: at least two of hypertension, hypercholesterolemia, cardiovascular disease, kidney disease, and kidney stones.
  • XGBoost achieved higher discrimination (ROC-AUC 0.766) than logistic regression (0.667) but worse probability accuracy and calibration (Brier score 0.069 versus 0.059; expected calibration error 0.113 versus 0.037).
  • By SHAP attribution, the dominant drivers of predicted multimorbidity were age, PHQ-9 depression score, income-to-poverty ratio, race/ethnicity, education level, and the adverse reproductive index.
  • Adverse reproductive burden was common (58 percent had at least one adverse reproductive factor and 12.6 percent had three or more) and was strongly represented in the fragile cluster.
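The restricted multimorbidity definition reduces to counting binary condition flags per participant. The sketch below is minimal and rests on three assumptions. Each participant is a dict of 0/1 indicators under hypothetical key names (not NHANES variable names). Missing keys are treated as absent. No survey weights are applied.

```python
# Hypothetical key names for the five listed conditions.
CONDITIONS = ("hypertension", "hypercholesterolemia", "cvd", "kidney_disease", "kidney_stones")


def condition_count(person):
    """Number of the five listed conditions flagged 1 (assumption: missing key = absent)."""
    return sum(1 for c in CONDITIONS if person.get(c, 0) == 1)


def is_multimorbid(person, threshold=2):
    """Restricted multimorbidity: at least `threshold` of the listed conditions."""
    return condition_count(person) >= threshold


def prevalence(people, threshold=2):
    """Unweighted share of participants meeting the definition."""
    return sum(is_multimorbid(p, threshold) for p in people) / len(people)
```

The `threshold` argument makes this free parameter explicit. As a check of the arithmetic, 106 of 1,602 participants meeting the definition gives 106 / 1602 ≈ 0.066, the reported 6.6 percent.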

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Reproductive history variables could be added to routine young-adult health checks as an inexpensive early warning for multimorbidity risk.
  • The same phenotyping pipeline might be tested on longitudinal cohorts to assess whether the clusters predict future disease incidence beyond cross-sectional association.
  • Calibration issues in tree-based models suggest that hybrid or post-hoc recalibration steps would be needed before any phenotype-based triage enters clinical guidelines.
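One common form of post-hoc recalibration is logistic (Platt-style) recalibration. It refits an intercept and slope on the logit of a model's predicted probabilities, using held-out data. The sketch below illustrates that generic technique; it is not a step the paper reports. It uses plain gradient descent. The step size comes from an upper bound on the log loss's curvature, so no update can increase the loss.

```python
import math


def logit(p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)  # clip to avoid infinite logits
    return math.log(p / (1 - p))


def log_loss(y, p, eps=1e-12):
    """Mean binary cross-entropy."""
    return -sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
                for yi, pi in zip(y, p)) / len(y)


def fit_recalibration(y, p, iters=5000):
    """Fit q = sigmoid(a * logit(p) + b) by gradient descent on the mean log loss.

    Starts from the identity map (a=1, b=0). The step 1/L uses
    L = 0.25 * (mean(x^2) + 1), an upper bound on the Hessian's largest eigenvalue.
    """
    x = [logit(pi) for pi in p]
    n = len(y)
    step = 1.0 / (0.25 * (sum(xi * xi for xi in x) / n + 1.0))
    a, b = 1.0, 0.0
    for _ in range(iters):
        q = [1.0 / (1.0 + math.exp(-(a * xi + b))) for xi in x]
        grad_a = sum((qi - yi) * xi for qi, yi, xi in zip(q, y, x)) / n
        grad_b = sum(qi - yi for qi, yi in zip(q, y)) / n
        a -= step * grad_a
        b -= step * grad_b
    return a, b


def apply_recalibration(p, a, b):
    return [1.0 / (1.0 + math.exp(-(a * logit(pi) + b))) for pi in p]
```

The fit starts from the identity map (slope 1, intercept 0), so the recalibrated log loss on the fitting data can only stay the same or fall. It should still be evaluated on separate data, because recalibrating and scoring on the same sample gives an optimistic estimate.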

Load-bearing premise

The chosen reproductive and chronic-condition variables, after PCA reduction and k-means clustering with k=4, produce clinically meaningful phenotypes instead of artifacts of variable coding or the specific multimorbidity definition.

What would settle it

An independent dataset or alternative clustering method in which the high-burden reproductive phenotype shows multimorbidity rates no higher than the other groups would falsify the claimed clustering.

Figures

Figures reproduced from arXiv: 2604.22890 by Sunday A. Adetunji.

Figure 1. Latent phenotypes of reproductive-chronic disease-mental-health profiles among U.S. women aged 20–44 years. Participants are projected onto the first two principal components of the standardized predictor matrix, and colors denote the four k-means-derived phenotypes. The purpose of the display is descriptive summarization of heterogeneity rather than causal subtype discovery.
Figure 2. Receiver-operating characteristic curves for restricted early multimorbidity classification comparing logistic regression with gradient-boosted trees (XGBoost). The diagonal line denotes chance performance. The key interpretation is not only the separation of the curves but also the mismatch between discrimination and calibration documented in …
Figure 3. Top ten features associated with restricted early multimorbidity in the XGBoost model. Bars represent mean …
Original abstract

Background: Adverse reproductive history is a multisystemic risk factor, but evidence is constrained by isolated outcome studies, limited adjustment, and non-interpretable algorithmic models. We reframe the estimand from prediction to concurrent risk classification and emphasize calibration, interpretability, and systematic error.

Methods: We analyzed 1,602 U.S. women aged 20-44 years from NHANES 2017-March 2020 with reproductive-history variables, chronic-condition indicators, and PHQ-9 data. Restricted multimorbidity was defined as at least two of hypertension, hypercholesterolemia, cardiovascular disease, kidney disease, and kidney stones. Features were summarized using principal components analysis and k-means clustering. We compared multivariable logistic regression with XGBoost and used SHAP values to quantify contributions.

Results: Early multimorbidity occurred in 6.6% (106/1,602); 71.0% had no chronic condition and 22.4% had one. Adverse reproductive burden was common: 58% had at least one adverse reproductive factor and 12.6% had three or more. Four latent phenotypes emerged (n=398, 508, 102, 594), including a fragile subgroup in which 77.5% met the multimorbidity definition. In holdout evaluation, XGBoost improved discrimination relative to logistic regression (ROC-AUC 0.766 vs 0.667) but showed worse probability accuracy and calibration (Brier 0.069 vs 0.059; expected calibration error 0.113 vs 0.037). Dominant drivers were age, PHQ-9 score, income-to-poverty ratio, race/ethnicity, education, and the adverse reproductive index.

Conclusions: Principal components analysis and k-means phenotyping revealed that adverse reproductive life-course structure is strongly clustered with concurrent early multimorbidity in U.S. women aged 20-44 years. Although XGBoost improved discrimination, calibration and feature attribution remained essential for reliable translation into practice.
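The three holdout metrics reported above can be computed directly from outcomes and predicted probabilities. This is a minimal plain-Python sketch with two assumptions of its own. The paper's ECE binning is not stated here, so 10 equal-width bins is assumed. Tied scores count as one half in the AUC.

```python
def brier_score(y, p):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def expected_calibration_error(y, p, n_bins=10):
    """Size-weighted mean |observed rate - mean prediction| over equal-width bins.

    Assumes every p lies in [0, 1].
    """
    bins = [[] for _ in range(n_bins)]
    for yi, pi in zip(y, p):
        idx = min(int(pi * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((yi, pi))
    ece = 0.0
    for b in bins:
        if b:
            observed = sum(yi for yi, _ in b) / len(b)
            predicted = sum(pi for _, pi in b) / len(b)
            ece += len(b) / len(y) * abs(observed - predicted)
    return ece


def roc_auc(y, p):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

The discrimination/calibration mismatch in the abstract is possible because ROC-AUC depends only on how predictions are ranked. The Brier score and ECE depend on the probability values themselves. A strictly increasing transformation of a model's probabilities therefore leaves its AUC unchanged but can change its Brier score and ECE substantially.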

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes NHANES 2017-March 2020 data for 1,602 U.S. women aged 20-44 years. It applies principal components analysis and k-means clustering to reproductive-history variables, chronic-condition indicators (hypertension, hypercholesterolemia, CVD, kidney disease, kidney stones), and PHQ-9 scores to derive four latent phenotypes. Multimorbidity is defined as ≥2 of the chronic conditions. A 'fragile' phenotype (n=102) shows 77.5% multimorbidity prevalence. XGBoost is compared to logistic regression for prediction, with SHAP for interpretability, reporting AUC 0.766 vs 0.667 but poorer calibration.

Significance. If the phenotypes are not artifacts of the feature selection, the work provides evidence linking adverse reproductive life-course factors to concurrent early multimorbidity, with potential for improved risk stratification in clinical practice. The emphasis on calibration and explainability is a strength, but the clustering approach requires validation to confirm the association is driven by reproductive variables rather than the outcome indicators themselves.

major comments (2)
  1. [Methods] Methods section: The feature set for PCA and k-means clustering includes the same chronic-condition indicators used to define multimorbidity (≥2 conditions). This setup makes the separation of a high-multimorbidity 'fragile' cluster (77.5% prevalence) expected whenever these indicators have variance, potentially rendering the reported association with reproductive structure partly definitional. No ablation removing chronic indicators or reporting of PC loadings and variable contributions within clusters is provided to demonstrate that reproductive variables drive the phenotypes.
  2. [Results] Results/Abstract: The manuscript reports concrete metrics (AUC 0.766 vs 0.667, Brier scores, calibration error) and phenotype prevalences, but lacks details on cross-validation strategy, missing-data handling, exact feature list, sensitivity to choice of k=4, or how the multimorbidity threshold affects clustering. These omissions limit assessment of robustness for the central phenotyping claim.
minor comments (2)
  1. [Abstract] Abstract conclusion: The phrasing that PCA and k-means 'revealed' the clustering overstates the discovery without supporting diagnostics such as loadings or ablation, given the feature overlap with the outcome definition.
  2. Consider adding a table of PC loadings or cluster centroids to allow readers to evaluate variable importance and assess whether reproductive factors, rather than chronic indicators, dominate the latent structure.
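Two of the referee's requests reduce to grouping participants by cluster label. One is the table of cluster centroids in minor comment 2. The other is the cluster-by-multimorbidity cross-tabulation that the ablation in major comment 1 would report. The plain-Python sketch below is minimal. `labels` could come from any clustering run, with or without the chronic-condition indicators. `outcome` is a 0/1 multimorbidity flag, and the feature names in the check are hypothetical.

```python
from collections import defaultdict


def cluster_profiles(labels, outcome, features, feature_names):
    """Per-cluster size, outcome prevalence, and feature means (centroids)."""
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    table = {}
    for lab, idx in sorted(groups.items()):
        table[lab] = {
            "n": len(idx),
            "prevalence": sum(outcome[i] for i in idx) / len(idx),
            "means": {
                name: sum(features[i][j] for i in idx) / len(idx)
                for j, name in enumerate(feature_names)
            },
        }
    return table
```

Run this on labels from a clustering that excludes the chronic-condition indicators. The `prevalence` column then shows whether a high-multimorbidity group still emerges once the definitional overlap is removed.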

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight key areas for improving the transparency and robustness of our phenotyping and predictive analyses. We address each major comment point by point below, indicating revisions where appropriate.

Point-by-point responses
  1. Referee: [Methods] Methods section: The feature set for PCA and k-means clustering includes the same chronic-condition indicators used to define multimorbidity (≥2 conditions). This setup makes the separation of a high-multimorbidity 'fragile' cluster (77.5% prevalence) expected whenever these indicators have variance, potentially rendering the reported association with reproductive structure partly definitional. No ablation removing chronic indicators or reporting of PC loadings and variable contributions within clusters is provided to demonstrate that reproductive variables drive the phenotypes.

    Authors: We acknowledge that this concern is valid: because the chronic-condition indicators are included in the feature set and also define the multimorbidity outcome, the high prevalence in the 'fragile' cluster is partly by construction. Our aim was to identify integrated latent structures linking reproductive history, chronic conditions, and depressive symptoms rather than to isolate reproductive effects. To address the referee's point directly, we will add an ablation analysis repeating PCA and k-means using only reproductive-history variables and PHQ-9 scores (excluding chronic indicators), report the resulting cluster-multimorbidity associations, and include principal component loadings plus per-variable contributions to cluster membership. These additions will appear in the revised Methods and Results sections. revision: yes

  2. Referee: [Results] Results/Abstract: The manuscript reports concrete metrics (AUC 0.766 vs 0.667, Brier scores, calibration error) and phenotype prevalences, but lacks details on cross-validation strategy, missing-data handling, exact feature list, sensitivity to choice of k=4, or how the multimorbidity threshold affects clustering. These omissions limit assessment of robustness for the central phenotyping claim.

    Authors: We agree that these methodological details are necessary for reproducibility and to evaluate robustness. In the revision we will expand the Methods section to specify the exact feature list, missing-data handling procedures, cross-validation strategy for the XGBoost and logistic regression models, sensitivity analyses for the choice of k (including silhouette scores and comparisons for k=3 to 6), and the effect of alternative multimorbidity thresholds on cluster stability. These additions will allow readers to assess the central phenotyping results more rigorously. revision: yes
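The silhouette criterion named in the response (Rousseeuw, 1987) can be computed as below. This is a minimal O(n²) plain-Python sketch using Euclidean distance; scikit-learn's `silhouette_score` would normally be used instead. Comparing k = 3 to 6 means re-clustering at each k and comparing the mean silhouettes, which is not shown.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b) with Euclidean distance.

    a = mean distance to the other members of the point's own cluster;
    b = smallest mean distance to the members of any other cluster.
    Points in singleton clusters get s(i) = 0, following Rousseeuw (1987).
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (pt, lab) in enumerate(zip(points, labels)):
        mean_dist = {}
        for c in clusters:
            others = [q for j, (q, l) in enumerate(zip(points, labels)) if l == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(pt, q) for q in others) / len(others)
        if lab not in mean_dist:  # point is alone in its cluster
            scores.append(0.0)
            continue
        a = mean_dist[lab]
        b = min(d for c, d in mean_dist.items() if c != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    s = silhouette_scores(points, labels)
    return sum(s) / len(s)
```

Values near 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own.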

Circularity Check

0 steps flagged

No significant circularity in the phenotyping or ML analysis

Full rationale

The paper conducts unsupervised PCA and k-means on NHANES features that include both reproductive-history variables and chronic-condition indicators, then reports cluster compositions with respect to a post-hoc multimorbidity definition (≥2 chronic conditions). This is a descriptive grouping exercise on external public data using standard methods; the observed co-clustering of reproductive burden with multimorbidity is an empirical pattern in the data rather than a quantity forced by construction or by any fitted parameter. No equations, self-citations, or uniqueness theorems are invoked to derive the central claim. The analysis remains self-contained against the external benchmark data without tautological reduction of outputs to inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The central claim rests on standard epidemiological assumptions about survey data and data-driven clustering choices rather than new mathematical axioms or unproven physical entities.

free parameters (2)
  • Number of clusters k
    Selected to identify latent phenotypes after PCA; value 4 reported in results
  • Multimorbidity definition threshold
    Set as at least two of five listed conditions; directly determines the 6.6% prevalence and phenotype rates
axioms (2)
  • domain assumption: NHANES 2017-March 2020 sample is representative of U.S. women aged 20-44 for the variables analyzed
    Invoked to support generalization of phenotype and multimorbidity findings
  • domain assumption: Adverse reproductive factors can be meaningfully summarized into a single index for clustering and attribution
    Used as a dominant driver in SHAP and phenotype interpretation
invented entities (1)
  • Fragile subgroup phenotype (no independent evidence)
    purpose: To label the high-multimorbidity cluster emerging from k-means
    Derived entirely from the clustering procedure on this dataset; no external validation or independent measurement provided
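The second axiom treats adverse reproductive factors as summarizable into a single index. The paper's exact construction is not given in this review, so the sketch below assumes the simplest version: an unweighted count of 0/1 adverse-factor flags. It reports the share of participants at the abstract's cut-points (at least one factor, three or more). Factor names are hypothetical placeholders.

```python
def adverse_index(flags):
    """Unweighted count of adverse reproductive factors flagged 1 for one participant."""
    return sum(1 for v in flags.values() if v == 1)


def burden_summary(participants, cutpoints=(1, 3)):
    """Share of participants whose index is at or above each cut-point."""
    idx = [adverse_index(p) for p in participants]
    return {c: sum(i >= c for i in idx) / len(idx) for c in cutpoints}
```

An unweighted count treats every factor as equally important. That is one reason per-variable contributions matter when the index is read as a SHAP driver.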

pith-pipeline@v0.9.0 · 5686 in / 1680 out tokens · 52537 ms · 2026-05-08T08:38:05.309971+00:00 · methodology


Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

  1. [1]

    TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods

    Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:q902. doi:10.1136/bmj.q902

  2. [2]

    PROBAST+AI: an updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods

    Moons KGM, Damen JAA, Kaul T, Hooft L, Andaur Navarro C, Dhiman P, et al. PROBAST+AI: an updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods. BMJ. 2025;388:e082505. doi:10.1136/bmj-2024-082505

  3. [3]

    Evaluation of clinical prediction models (part 1): from development to external validation

    Collins GS, Dhiman P, Ma J, Schlussel MM, Archer L, Van Calster B, et al. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ. 2024;384:e074819. doi:10.1136/bmj-2023-074819

  4. [4]

    Developing clinical prediction models: a step-by-step guide

    Efthimiou O, Seo M, Chalkou K, Debray TPA, Egger M, Salanti G. Developing clinical prediction models: a step-by-step guide. BMJ. 2024;386:e078276. doi:10.1136/bmj-2023-078276

  5. [5]

    Uncertainty of risk estimates from clinical prediction models: rationale, challenges, and approaches

    Riley RD, Collins GS, Kirton L, Snell KIE, Ensor J, Whittle R, et al. Uncertainty of risk estimates from clinical prediction models: rationale, challenges, and approaches. BMJ. 2025;388:e080749. doi:10.1136/bmj-2024-080749

  6. [6]

    Calibration: the Achilles heel of predictive analytics

    Van Calster B, McLernon DJ, Van Smeden M, Wynants L, Steyerberg EW, Bossuyt P, et al. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17(1):230. doi:10.1186/s12916-019-1466-7

  7. [7]

    Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression Refereed (Peer-Reviewed) Conference Paper: CS016 LLM and Agent Applications II | 2026 Symposium on Data Science and Statistics | American Statistical Association for clinical prediction...

  8. [8]

    Reporting and interpreting decision curve analysis: a guide for investigators

    Van Calster B, Wynants L, Verbeek JFM, Verbakel JY, Christodoulou E, Vickers AJ, et al. Reporting and interpreting decision curve analysis: a guide for investigators. Eur Urol. 2018;74(6):796-804. doi:10.1016/j.eururo.2018.08.038

  9. [9]

    From local explanations to global understanding with explainable AI for trees

    Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2:56-67. doi:10.1038/s42256-019-0138-9

  10. [10]

    Association between the reproductive health of young women and cardiovascular disease in later life: umbrella review

    Okoth K, Chandan JS, Marshall T, Thangaratinam S, Thomas GN, Nirantharakumar K, Adderley NJ. Association between the reproductive health of young women and cardiovascular disease in later life: umbrella review. BMJ. 2020;371:m3502. doi:10.1136/bmj.m3502

  11. [11]

    Parikh NI, Gonzalez JM, Anderson CAM, Judd SE, Rexrode KM, Hlatky MA, et al. Adverse pregnancy outcomes and cardiovascular disease risk: unique opportunities for cardiovascular disease prevention in women: a scientific statement from the American Heart Association. Circulation. 2021;143(18):e902-e916. doi:10.1161/CIR.0000000000000961

  12. [12]

    Incidence and long-term outcomes of hypertensive disorders of pregnancy

    Garovic VD, White WM, Vaughan L, Saiki M, Parashuram S, Garcia-Valencia O, et al. Incidence and long-term outcomes of hypertensive disorders of pregnancy. J Am Coll Cardiol. 2020;75(18):2323-2334. doi:10.1016/j.jacc.2020.03.028

  13. [13]

    Pregnancy and reproductive risk factors for cardiovascular disease in women

    O’Kelly AC, Michos ED, Shufelt CL, Vermunt JV, Minissian MB, Quesada O, et al. Pregnancy and reproductive risk factors for cardiovascular disease in women. Circ Res. 2022;130(4):652-672. doi:10.1161/CIRCRESAHA.121.319895

  14. [14]

    Pregnancy complications and later life women’s health

    McNestry C, Killeen SL, Crowley RK, McAuliffe FM. Pregnancy complications and later life women’s health. Acta Obstet Gynecol Scand. 2023;102(5):523-531. doi:10.1111/aogs.14523

  15. [15]

    Miscarriage matters: the epidemiological, physical, psychological and economic burden of early pregnancy loss

    Quenby S, Gallos ID, Dhillon-Smith RK, Podesek M, Stephenson MD, Fisher J, et al. Miscarriage matters: the epidemiological, physical, psychological and economic burden of early pregnancy loss. Lancet. 2021;397(10285):1658-1667. doi:10.1016/S0140-6736(21)00682-6

  16. [16]

    National Health and Nutrition Examination Survey 2017–March 2020 prepandemic data files: development of files and prevalence estimates for selected health outcomes

    Stierman B, Afful J, Carroll MD, Chen TC, Davy O, Fink S, et al. National Health and Nutrition Examination Survey 2017–March 2020 prepandemic data files: development of files and prevalence estimates for selected health outcomes. Natl Health Stat Report. 2021;(158). doi:10.15620/cdc:106273

  17. [17]

    National Health and Nutrition Examination Survey, 2017–March 2020 prepandemic file: sample design, estimation, and analytic guidelines

    Akinbami LJ, Chen TC, Davy O, Ogden CL, Fink S, Clark J, et al. National Health and Nutrition Examination Survey, 2017–March 2020 prepandemic file: sample design, estimation, and analytic guidelines. Vital Health Stat 2. 2022;(190):1-36. doi:10.15620/cdc:115434

  18. [18]

    XGBoost: A Scalable Tree Boosting System

    Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY: ACM; 2016:785-794. doi:10.1145/2939672.2939785

  19. [19]

    Silhouettes: a graphical aid to the interpretation and validation of cluster analysis

    Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53-65. doi:10.1016/0377-0427(87)90125-7

  20. [20]

    Estimating the number of clusters in a data set via the gap statistic

    Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Series B Stat Methodol. 2001;63(2):411-423. doi:10.1111/1467-9868.00293

  21. [21]

    Cluster-wise assessment of cluster stability

    Hennig C. Cluster-wise assessment of cluster stability. Comput Stat Data Anal. 2007;52(1):258-271. doi:10.1016/j.csda.2006.11.025