AI-Derived Reproductive Phenotypes and Explainable ML for Concurrent Early Multimorbidity in U.S. Women: NHANES 2017-March 2020
Pith reviewed 2026-05-08 08:38 UTC · model grok-4.3
The pith
Adverse reproductive life-course patterns strongly cluster with concurrent early multimorbidity in U.S. women aged 20-44.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Principal components analysis and k-means phenotyping revealed that adverse reproductive life-course structure is strongly clustered with concurrent early multimorbidity in U.S. women aged 20-44 years. Although XGBoost improved discrimination, calibration and feature attribution remained essential for reliable translation into practice.
What carries the argument
Principal components analysis to reduce reproductive-history and multimorbidity features, followed by k-means clustering into four phenotypes, with SHAP values to explain contributions in logistic regression and XGBoost models.
If this is right
- In one latent phenotype, 77.5 percent of women met the multimorbidity definition: at least two of hypertension, hypercholesterolemia, cardiovascular disease, kidney disease, and kidney stones.
- XGBoost achieved higher discrimination (ROC-AUC 0.766) than logistic regression (0.667) but worse probability accuracy and calibration (Brier score 0.069 versus 0.059; expected calibration error 0.113 versus 0.037).
- Dominant drivers, as quantified with SHAP values, were age, PHQ-9 depression score, income-to-poverty ratio, race/ethnicity, education level, and the adverse reproductive index.
- Adverse reproductive burden was common: 58 percent of the sample had at least one adverse reproductive factor, and the burden was strongly represented in the fragile cluster.
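The pattern of higher discrimination with a worse Brier score can look paradoxical, so a small illustration may help. The sketch below uses made-up labels and probabilities (not the paper's data) and two hypothetical models, `model_a` and `model_b`, to show that a model can rank cases better while producing less accurate probabilities.

```python
def roc_auc(y_true, y_prob):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data (hypothetical): 2 positives among 10 cases.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Model A ranks every positive above every negative but inflates all probabilities.
model_a = [0.30, 0.30, 0.35, 0.35, 0.40, 0.40, 0.45, 0.45, 0.90, 0.95]
# Model B misranks one pair but keeps probabilities close to the base rate.
model_b = [0.05, 0.05, 0.05, 0.05, 0.10, 0.10, 0.10, 0.40, 0.30, 0.60]

for name, probs in [("A", model_a), ("B", model_b)]:
    print(name, round(roc_auc(y, probs), 4), round(brier_score(y, probs), 4))
```

In this toy case model A has the higher AUC and also the higher (worse) Brier score, which is the same qualitative trade-off reported for XGBoost against logistic regression.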
Where Pith is reading between the lines
- Reproductive history variables could be added to routine young-adult health checks as an inexpensive early warning for multimorbidity risk.
- The same phenotyping pipeline might be tested on longitudinal cohorts to assess whether the clusters predict future disease incidence beyond cross-sectional association.
- Calibration issues in tree-based models suggest that hybrid or post-hoc recalibration steps would be needed before any phenotype-based triage enters clinical guidelines.
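For readers unfamiliar with the calibration metric behind these remarks, here is a minimal sketch of expected calibration error. It assumes ten equal-width probability bins; the paper's abstract does not state its binning scheme, so the function name, the bin count, and the toy inputs are all illustrative assumptions.

```python
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean gap between observed event rate and mean prediction per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # a probability of 1.0 joins the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / n * abs(observed - predicted)
    return ece


# Toy data (hypothetical): one event in ten cases, so the true rate is 0.1.
y = [0] * 9 + [1]
ece_matched = expected_calibration_error(y, [0.1] * 10)  # predictions equal the rate
ece_inflated = expected_calibration_error(y, [0.3] * 10)  # predictions three times too high
```

A model whose predictions sit 0.2 above the observed rate in every occupied bin gets an ECE of 0.2, regardless of how well it ranks cases. That is why a recalibration step can matter even when discrimination is good.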
Load-bearing premise
The chosen reproductive and chronic-condition variables, after PCA reduction and k-means clustering with k=4, produce clinically meaningful phenotypes instead of artifacts of variable coding or the specific multimorbidity definition.
What would settle it
An independent dataset or alternative clustering method in which the high-burden reproductive phenotype shows multimorbidity rates no higher than the other groups would falsify the claimed clustering.
Original abstract
Background: Adverse reproductive history is a multisystemic risk factor, but evidence is constrained by isolated outcome studies, limited adjustment, and non-interpretable algorithmic models. We re-frame the estimand from prediction to concurrent risk classification and emphasize calibration, interpretability, and systematic error.
Methods: We analyzed 1,602 U.S. women aged 20-44 years from NHANES 2017-March 2020 with reproductive-history variables, chronic-condition indicators, and PHQ-9 data. Restricted multimorbidity was defined as at least two of hypertension, hypercholesterolemia, cardiovascular disease, kidney disease, and kidney stones. Features were summarized using principal components analysis and k-means clustering. We compared multivariable logistic regression with XGBoost and used SHAP values to quantify contributions.
Results: Early multimorbidity occurred in 6.6% (106/1,602); 71.0% had no chronic condition and 22.4% had one. Adverse reproductive burden was common: 58% had at least one adverse reproductive factor and 12.6% had three or more. Four latent phenotypes emerged (n=398, 508, 102, 594), including a fragile subgroup in which 77.5% met the multimorbidity definition. In holdout evaluation, XGBoost improved discrimination relative to logistic regression (ROC-AUC 0.766 vs 0.667), but showed worse probability accuracy and calibration (Brier 0.069 vs 0.059; expected calibration error 0.113 vs 0.037). Dominant drivers were age, PHQ-9 score, income-to-poverty ratio, race/ethnicity, education, and the adverse reproductive index.
Conclusions: Principal components analysis and k-means phenotyping revealed that adverse reproductive life-course structure is strongly clustered with concurrent early multimorbidity in U.S. women aged 20-44 years. Although XGBoost improved discrimination, calibration and feature attribution remained essential for reliable translation into practice.
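The counts and percentages in the abstract can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the abstract. The number of multimorbid women in the fragile cluster is not reported, so `fragile_cases` is an inference from rounding, not a published value.

```python
n_total = 1602
n_multimorbid = 106
cluster_sizes = [398, 508, 102, 594]

# Reported prevalence: 106 / 1,602 should round to 6.6%.
prevalence_pct = round(100 * n_multimorbid / n_total, 1)

# The four phenotype sizes should account for every participant.
clusters_cover_sample = sum(cluster_sizes) == n_total

# Shares with zero, one, and two or more conditions should total 100%.
condition_shares_total = round(71.0 + 22.4 + 6.6, 1)

# Inferred, not reported: 77.5% of the 102-woman fragile cluster is about 79 women.
fragile_cases = round(0.775 * 102)
fragile_share_of_cases_pct = round(100 * fragile_cases / n_multimorbid, 1)

print(prevalence_pct, clusters_cover_sample, condition_shares_total)
print(fragile_cases, fragile_share_of_cases_pct)
```

If the inference holds, roughly three quarters of all multimorbid women sit in one cluster of 102, which is worth keeping in mind when reading the referee's concern about outcome indicators being used as clustering features.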
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes NHANES 2017-March 2020 data for 1,602 U.S. women aged 20-44 years. It applies principal components analysis and k-means clustering to reproductive-history variables, chronic-condition indicators (hypertension, hypercholesterolemia, CVD, kidney disease, kidney stones), and PHQ-9 scores to derive four latent phenotypes. Multimorbidity is defined as ≥2 of the chronic conditions. A 'fragile' phenotype (n=102) shows 77.5% multimorbidity prevalence. XGBoost is compared to logistic regression for prediction, with SHAP for interpretability, reporting AUC 0.766 vs 0.667 but poorer calibration.
Significance. If the phenotypes are not artifacts of the feature selection, the work provides evidence linking adverse reproductive life-course factors to concurrent early multimorbidity, with potential for improved risk stratification in clinical practice. The emphasis on calibration and explainability is a strength, but the clustering approach requires validation to confirm the association is driven by reproductive variables rather than the outcome indicators themselves.
major comments (2)
- [Methods] Methods section: The feature set for PCA and k-means clustering includes the same chronic-condition indicators used to define multimorbidity (≥2 conditions). This setup makes the separation of a high-multimorbidity 'fragile' cluster (77.5% prevalence) expected whenever these indicators have variance, potentially rendering the reported association with reproductive structure partly definitional. No ablation removing chronic indicators or reporting of PC loadings and variable contributions within clusters is provided to demonstrate that reproductive variables drive the phenotypes.
- [Results] Results/Abstract: The manuscript reports concrete metrics (AUC 0.766 vs 0.667, Brier scores, calibration error) and phenotype prevalences, but lacks details on cross-validation strategy, missing-data handling, exact feature list, sensitivity to choice of k=4, or how the multimorbidity threshold affects clustering. These omissions limit assessment of robustness for the central phenotyping claim.
minor comments (2)
- [Abstract] Abstract conclusion: The phrasing that PCA and k-means 'revealed' the clustering overstates the discovery without supporting diagnostics such as loadings or ablation, given the feature overlap with the outcome definition.
- Consider adding a table of PC loadings or cluster centroids to allow readers to evaluate variable importance and assess whether reproductive factors, rather than chronic indicators, dominate the latent structure.
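The first major comment can be illustrated with a toy simulation. The sketch below is not the authors' pipeline: it uses synthetic data, a hand-written two-cluster k-means with fixed starting centroids, and an inflated per-condition prevalence of 20 percent so that the toy clusters are populated. The reproductive index is generated as pure noise, so any enrichment of multimorbidity in one cluster comes only from the condition flags being clustering inputs.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


rng = random.Random(0)
n = 2000
# Hypothetical data: five independent condition flags and an unrelated noise index.
conditions = [[1 if rng.random() < 0.2 else 0 for _ in range(5)] for _ in range(n)]
repro_index = [rng.random() for _ in range(n)]
multimorbid = [sum(row) >= 2 for row in conditions]

# The outcome-defining flags are included among the clustering features.
features = [row + [r] for row, r in zip(conditions, repro_index)]
labels = kmeans(features, [[0.0] * 5 + [0.5], [1.0] * 5 + [0.5]])


def prevalence(cluster):
    """Share of a cluster's members meeting the two-or-more-conditions definition."""
    flags = [m for m, lab in zip(multimorbid, labels) if lab == cluster]
    return sum(flags) / len(flags)


print(round(prevalence(0), 3), round(prevalence(1), 3))
```

Even with a reproductive index that carries no information, one cluster ends up enriched for multimorbidity. An ablation that drops the condition flags from the feature set is the natural way to tell this effect apart from a real reproductive signal.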
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight key areas for improving the transparency and robustness of our phenotyping and predictive analyses. We address each major comment point by point below, indicating revisions where appropriate.
- Referee: [Methods] Methods section: The feature set for PCA and k-means clustering includes the same chronic-condition indicators used to define multimorbidity (≥2 conditions). This setup makes the separation of a high-multimorbidity 'fragile' cluster (77.5% prevalence) expected whenever these indicators have variance, potentially rendering the reported association with reproductive structure partly definitional. No ablation removing chronic indicators or reporting of PC loadings and variable contributions within clusters is provided to demonstrate that reproductive variables drive the phenotypes.
  Authors: We acknowledge that this concern is valid: because the chronic-condition indicators are included in the feature set and also define the multimorbidity outcome, the high prevalence in the 'fragile' cluster is partly by construction. Our aim was to identify integrated latent structures linking reproductive history, chronic conditions, and depressive symptoms rather than to isolate reproductive effects. To address the referee's point directly, we will add an ablation analysis repeating PCA and k-means using only reproductive-history variables and PHQ-9 scores (excluding chronic indicators), report the resulting cluster-multimorbidity associations, and include principal component loadings plus per-variable contributions to cluster membership. These additions will appear in the revised Methods and Results sections.
  Revision: yes
- Referee: [Results] Results/Abstract: The manuscript reports concrete metrics (AUC 0.766 vs 0.667, Brier scores, calibration error) and phenotype prevalences, but lacks details on cross-validation strategy, missing-data handling, exact feature list, sensitivity to choice of k=4, or how the multimorbidity threshold affects clustering. These omissions limit assessment of robustness for the central phenotyping claim.
  Authors: We agree that these methodological details are necessary for reproducibility and to evaluate robustness. In the revision we will expand the Methods section to specify the exact feature list, missing-data handling procedures, cross-validation strategy for the XGBoost and logistic regression models, sensitivity analyses for the choice of k (including silhouette scores and comparisons for k=3 to 6), and the effect of alternative multimorbidity thresholds on cluster stability. These additions will allow readers to assess the central phenotyping results more rigorously.
  Revision: yes
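The silhouette comparison promised in the rebuttal can be sketched in a few lines. The function below computes the average silhouette width with Euclidean distance on a tiny hypothetical point set. The points, the labelings, and the function name are illustrative assumptions and say nothing about how the authors will implement their sensitivity analysis.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width over all points, using Euclidean distance."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        # Mean distance from point i to each cluster's members, excluding itself.
        mean_dist = {}
        for c in clusters:
            others = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(p, q) for q in others) / len(others)
        own = labels[i]
        if own not in mean_dist:
            scores.append(0.0)  # a singleton cluster scores 0 by convention
            continue
        a = mean_dist[own]  # cohesion: mean distance within the point's own cluster
        b = min(d for c, d in mean_dist.items() if c != own)  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
two_clusters = [0, 0, 0, 1, 1, 1]
three_clusters = [0, 0, 2, 1, 1, 1]  # needlessly splits the first group

score_k2 = mean_silhouette(points, two_clusters)
score_k3 = mean_silhouette(points, three_clusters)
```

Here the two-cluster labeling scores higher than the over-split labeling, which is the kind of comparison a k=3 to 6 sweep would report. A high silhouette only indicates geometric separation; it does not show that the clusters are clinically meaningful.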
Circularity Check
No significant circularity in the phenotyping or ML analysis
The paper conducts unsupervised PCA and k-means on NHANES features that include both reproductive-history variables and chronic-condition indicators, then reports cluster compositions with respect to a post-hoc multimorbidity definition (≥2 chronic conditions). This is a descriptive grouping exercise on external public data using standard methods; the observed co-clustering of reproductive burden with multimorbidity is an empirical pattern in the data rather than a quantity forced by construction or by any fitted parameter. No equations, self-citations, or uniqueness theorems are invoked to derive the central claim. The analysis remains self-contained against the external benchmark data without tautological reduction of outputs to inputs.
Axiom & Free-Parameter Ledger
free parameters (2)
- Number of clusters k
- Multimorbidity definition threshold
axioms (2)
- Domain assumption: the NHANES 2017-March 2020 sample is representative of U.S. women aged 20-44 for the variables analyzed
- Domain assumption: adverse reproductive factors can be meaningfully summarized into a single index for clustering and attribution
invented entities (1)
- Fragile subgroup phenotype: no independent evidence
Reference graph
Works this paper leans on
- [1] Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:q902. doi:10.1136/bmj.q902
- [2] Moons KGM, Damen JAA, Kaul T, Hooft L, Andaur Navarro C, Dhiman P, et al. PROBAST+AI: an updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods. BMJ. 2025;388:e082505. doi:10.1136/bmj-2024-082505
- [3] Collins GS, Dhiman P, Ma J, Schlussel MM, Archer L, Van Calster B, et al. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ. 2024;384:e074819. doi:10.1136/bmj-2023-074819
- [4] Efthimiou O, Seo M, Chalkou K, Debray TPA, Egger M, Salanti G. Developing clinical prediction models: a step-by-step guide. BMJ. 2024;386:e078276. doi:10.1136/bmj-2023-078276
- [5] Riley RD, Collins GS, Kirton L, Snell KIE, Ensor J, Whittle R, et al. Uncertainty of risk estimates from clinical prediction models: rationale, challenges, and approaches. BMJ. 2025;388:e080749. doi:10.1136/bmj-2024-080749
- [6] Van Calster B, McLernon DJ, Van Smeden M, Wynants L, Steyerberg EW, Bossuyt P, et al. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17(1):230. doi:10.1186/s12916-019-1466-7
- [7] Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction...
- [8] Van Calster B, Wynants L, Verbeek JFM, Verbakel JY, Christodoulou E, Vickers AJ, et al. Reporting and interpreting decision curve analysis: a guide for investigators. Eur Urol. 2018;74(6):796-804. doi:10.1016/j.eururo.2018.08.038
- [9] Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2:56-67. doi:10.1038/s42256-019-0138-9
- [10] Okoth K, Chandan JS, Marshall T, Thangaratinam S, Thomas GN, Nirantharakumar K, Adderley NJ. Association between the reproductive health of young women and cardiovascular disease in later life: umbrella review. BMJ. 2020;371:m3502. doi:10.1136/bmj.m3502
- [11] Parikh NI, Gonzalez JM, Anderson CAM, Judd SE, Rexrode KM, Hlatky MA, et al. Adverse pregnancy outcomes and cardiovascular disease risk: unique opportunities for cardiovascular disease prevention in women: a scientific statement from the American Heart Association. Circulation. 2021;143(18):e902-e916. doi:10.1161/CIR.0000000000000961
- [12] Garovic VD, White WM, Vaughan L, Saiki M, Parashuram S, Garcia-Valencia O, et al. Incidence and long-term outcomes of hypertensive disorders of pregnancy. J Am Coll Cardiol. 2020;75(18):2323-2334. doi:10.1016/j.jacc.2020.03.028
- [13] O’Kelly AC, Michos ED, Shufelt CL, Vermunt JV, Minissian MB, Quesada O, et al. Pregnancy and reproductive risk factors for cardiovascular disease in women. Circ Res. 2022;130(4):652-672. doi:10.1161/CIRCRESAHA.121.319895
- [14] McNestry C, Killeen SL, Crowley RK, McAuliffe FM. Pregnancy complications and later life women’s health. Acta Obstet Gynecol Scand. 2023;102(5):523-531. doi:10.1111/aogs.14523
- [15] Quenby S, Gallos ID, Dhillon-Smith RK, Podesek M, Stephenson MD, Fisher J, et al. Miscarriage matters: the epidemiological, physical, psychological and economic burden of early pregnancy loss. Lancet. 2021;397(10285):1658-1667. doi:10.1016/S0140-6736(21)00682-6
- [16] Stierman B, Afful J, Carroll MD, Chen TC, Davy O, Fink S, et al. National Health and Nutrition Examination Survey 2017–March 2020 prepandemic data files—development of files and prevalence estimates for selected health outcomes. Natl Health Stat Report. 2021;(158). doi:10.15620/cdc:106273
- [17] Akinbami LJ, Chen TC, Davy O, Ogden CL, Fink S, Clark J, et al. National Health and Nutrition Examination Survey, 2017–March 2020 prepandemic file: sample design, estimation, and analytic guidelines. Vital Health Stat 2. 2022;(190):1-36. doi:10.15620/cdc:115434
- [18] Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY: ACM; 2016:785-794. doi:10.1145/2939672.2939785
- [19] Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53-65. doi:10.1016/0377-0427(87)90125-7
- [20] Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Series B Stat Methodol. 2001;63(2):411-423. doi:10.1111/1467-9868.00293
- [21] Hennig C. Cluster-wise assessment of cluster stability. Comput Stat Data Anal. 2007;52(1):258-271. doi:10.1016/j.csda.2006.11.025
Refereed (Peer-Reviewed) Conference Paper: CS016 LLM and Agent Applications II | 2026 Symposium on Data Science and Statistics | American Statistical Association