Predicting Fetal Birthweight from High Dimensional Data using Advanced Machine Learning
Pith reviewed 2026-05-23 01:49 UTC · model grok-4.3
The pith
Tree-based feature selection paired with ensemble regression models improves fetal birth weight prediction by identifying key predictors and modeling complex interactions in high-dimensional clinical data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Among the methodologies explored, tree-based feature selection methods demonstrated superior capability in identifying the most relevant predictors, while ensemble-based regression models proved highly effective in capturing non-linear relationships and complex maternal-fetal interactions within the data. The study shows that integrating advanced imputation strategies with these selection and modeling steps strengthens predictive performance even when the dataset is limited, and it highlights the clinical significance of the resulting physiological determinants for maternal and fetal health.
What carries the argument
Tree-based feature selection combined with ensemble regression models, which rank predictors and capture non-linear maternal-fetal interactions after imputation of missing values.
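The chain described here (imputation, then tree-based selection, then an ensemble regressor) can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the authors' implementation. `IterativeImputer` is scikit-learn's chained-equations imputer and is used as a stand-in for MICE. The random-forest selector, the median threshold, the feature counts, and the data-generating formula are all assumptions made for the example.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to import IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for a clinical table (hypothetical): 200 rows, 30 features,
# of which only the first three influence the outcome.
X = rng.normal(size=(200, 30))
y = 3000 + 300 * X[:, 0] + 200 * X[:, 1] * X[:, 2] + rng.normal(scale=100, size=200)

# Remove about 10% of entries at random to mimic missing values.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan

pipeline = Pipeline([
    ("impute", IterativeImputer(max_iter=5, random_state=0)),
    ("select", SelectFromModel(
        RandomForestRegressor(n_estimators=100, random_state=0),
        threshold="median",  # keep features at or above the median importance
    )),
    ("model", GradientBoostingRegressor(random_state=0)),
])

# Every step is refit inside each fold, so held-out rows never inform
# the imputer or the selector.
scores = cross_val_score(pipeline, X_missing, y, cv=5,
                         scoring="neg_root_mean_squared_error")
rmse_per_fold = -scores
print("RMSE per fold:", np.round(rmse_per_fold, 1))

pipeline.fit(X_missing, y)
selected = np.flatnonzero(pipeline.named_steps["select"].get_support())
print("Selected feature indices:", selected)
```

Wrapping all three steps in one `Pipeline` is a design choice: it keeps imputation and selection inside the cross-validation loop rather than applying them once to the full table.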
If this is right
- Key maternal and fetal physiological factors become more clearly ranked for clinical attention.
- Risk assessment in perinatal care can incorporate non-linear interaction terms that linear models miss.
- Data-driven decisions in maternal and neonatal settings gain accuracy from the identified predictors.
- Preprocessing steps gain explicit priority when datasets contain missing entries or high dimensionality.
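To make the point about interaction terms concrete, the sketch below fits ordinary least squares twice on synthetic data: once with additive terms only, once with an explicit product term. The variable names, coefficients, and noise level are hypothetical and chosen only so that the interaction dominates. An ensemble of trees can approximate such a product without being told about it, whereas a plain linear model needs the term supplied by hand.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x1 = rng.normal(size=n)  # hypothetical maternal factor (standardized)
x2 = rng.normal(size=n)  # hypothetical fetal factor (standardized)
y = 3200 + 150 * x1 + 100 * x2 + 250 * x1 * x2 + rng.normal(scale=80, size=n)

train, test = slice(0, 300), slice(300, n)

def fit_and_score(design):
    """Ordinary least squares on the training rows; R^2 on the held-out rows."""
    coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
    resid = y[test] - design[test] @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
    return 1.0 - ss_res / ss_tot

ones = np.ones(n)
additive = np.column_stack([ones, x1, x2])
with_interaction = np.column_stack([ones, x1, x2, x1 * x2])

r2_additive = fit_and_score(additive)
r2_interaction = fit_and_score(with_interaction)
print(f"additive R^2: {r2_additive:.2f}, with interaction R^2: {r2_interaction:.2f}")
```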
Where Pith is reading between the lines
- The same pipeline could be tested on other perinatal outcomes such as preterm delivery risk to check transferability.
- External validation across diverse populations would be needed before routine clinical deployment.
- The identified physiological determinants could be checked against known medical literature for mechanistic plausibility.
Load-bearing premise
The assumption that imputation and supervised feature selection on this particular constrained dataset produce reliable predictions without bias from missing-data patterns or overfitting to the sampled clinical population.
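One way this premise can fail is selection leakage: if predictors are chosen using the whole dataset and only the final model is cross-validated, the reported score is optimistic. The sketch below illustrates the effect on pure noise, where no honest model should predict anything. It is an illustration under stated assumptions (correlation ranking as the selector, least squares as the model, arbitrary sizes), not a reconstruction of the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 60, 2000, 10
X = rng.normal(size=(n, p))  # pure-noise "predictors"
y = rng.normal(size=n)       # outcome unrelated to every column of X

def top_k_by_correlation(X_part, y_part, k):
    """Indices of the k columns with the largest absolute correlation with y."""
    Xc = X_part - X_part.mean(axis=0)
    yc = y_part - y_part.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:k]

def cv_r2(select_inside_fold):
    """Pooled out-of-fold R^2 from 5-fold CV with least squares on k selected columns."""
    folds = np.array_split(rng.permutation(n), 5)
    preds = np.empty(n)
    global_cols = top_k_by_correlation(X, y, k)  # leaky choice: uses held-out rows too
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        if select_inside_fold:
            cols = top_k_by_correlation(X[train], y[train], k)
        else:
            cols = global_cols
        A_train = np.column_stack([np.ones(train.size), X[np.ix_(train, cols)]])
        A_test = np.column_stack([np.ones(held_out.size), X[np.ix_(held_out, cols)]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        preds[held_out] = A_test @ coef
    return 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)

r2_leaky = cv_r2(select_inside_fold=False)
r2_honest = cv_r2(select_inside_fold=True)
print(f"selection before CV: R^2 = {r2_leaky:.2f}")
print(f"selection inside CV: R^2 = {r2_honest:.2f}")
```

The gap between the two numbers is the part of the score that comes from the selector having seen the held-out rows. The same reasoning applies to imputation models fitted on the full table.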
What would settle it
A drop in predictive performance when the same pipeline is applied to an independent birth-weight dataset collected from a different hospital or geographic region.
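The test described here amounts to comparing error on an internal hold-out with error on a cohort the model never saw. The sketch below shows the mechanics on synthetic data. The two "sites", their slopes and intercepts, and the single predictor are hypothetical, and the shift between sites is built in by construction, so the size of the gap says nothing about the paper's model.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_cohort(n, slope, intercept):
    """Hypothetical cohort: intercept column plus one standardized predictor; outcome in grams."""
    x = rng.normal(size=n)
    y = intercept + slope * x + rng.normal(scale=100, size=n)
    return np.column_stack([np.ones(n), x]), y

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Development site and an external site with a different
# predictor-outcome relationship (assumed for the example).
A_dev, y_dev = make_cohort(300, slope=250, intercept=3200)
A_ext, y_ext = make_cohort(300, slope=120, intercept=3000)

# Fit on the first 200 development rows; keep the last 100 as an internal hold-out.
coef, *_ = np.linalg.lstsq(A_dev[:200], y_dev[:200], rcond=None)

rmse_internal = rmse(y_dev[200:], A_dev[200:] @ coef)
rmse_external = rmse(y_ext, A_ext @ coef)
print(f"internal hold-out RMSE: {rmse_internal:.0f} g")
print(f"external cohort RMSE:   {rmse_external:.0f} g")
```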
Original abstract
Birth weight serves as a fundamental indicator of neonatal health, closely linked to both early medical interventions and long-term developmental risks. Traditional predictive models, often constrained by limited feature selection and incomplete datasets, struggle to achieve overlooking complex maternal and fetal interactions in diverse clinical settings. This research explores machine learning to address these limitations, utilizing a structured methodology that integrates advanced imputation strategies, supervised feature selection techniques, and predictive modeling. Given the constraints of the dataset, the research strengthens the role of data preprocessing in improving the model performance. Among the various methodologies explored, tree-based feature selection methods demonstrated superior capability in identifying the most relevant predictors, while ensemble-based regression models proved highly effective in capturing non-linear relationships and complex maternal-fetal interactions within the data. Beyond model performance, the study highlights the clinical significance of key physiological determinants, offering insights into maternal and fetal health factors that influence birth weight, offering insights that extend over statistical modeling. By bridging computational intelligence with perinatal research, this work underscores the transformative role of machine learning in enhancing predictive accuracy, refining risk assessment and informing data-driven decision-making in maternal and neonatal care. Keywords: Birth weight prediction, maternal-fetal health, MICE, BART, Gradient Boosting, neonatal outcomes, Clinipredictive.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes an empirical machine learning study for predicting fetal birthweight from high-dimensional maternal-fetal data. It integrates MICE imputation, supervised feature selection (asserting superiority of tree-based methods), and ensemble regression models (BART, Gradient Boosting) to capture non-linear interactions, while stressing preprocessing under dataset constraints and noting clinical insights into physiological determinants.
Significance. Birthweight prediction has established clinical relevance for neonatal risk assessment. If the superiority claims were backed by quantitative evidence, cross-validation, and external validation, the work could inform perinatal modeling. As presented, however, the absence of any performance numbers, baselines, or validation details leaves the significance indeterminate.
major comments (4)
- [Abstract] Abstract: the claim that 'tree-based feature selection methods demonstrated superior capability in identifying the most relevant predictors' supplies no supporting metrics, importance rankings, ablation results, or comparisons to alternative selectors such as LASSO or mutual information.
- [Abstract] Abstract: the assertion that 'ensemble-based regression models proved highly effective in capturing non-linear relationships' is unsupported by any reported error metrics (RMSE, MAE, R²), statistical significance tests, or comparisons against linear baselines or single models.
- [Abstract] Abstract: no dataset description (sample size, feature dimensionality, missingness mechanism), cross-validation scheme, or external cohort is mentioned, so the generalizability claim cannot be evaluated and the risk of population-specific overfitting remains unaddressed.
- [Abstract] Abstract: the statement that the work 'strengthens the role of data preprocessing in improving the model performance' is not accompanied by any before/after performance deltas or ablation study.
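The evidence these comments ask for could take the form of a selector comparison scored with RMSE, MAE, and R² under a fixed cross-validation split. The sketch below shows one possible layout with scikit-learn on synthetic data. The three selectors, the choice of five retained features, the Lasso penalty, and the downstream model are assumptions for illustration, and the printed numbers carry no information about the paper's results.

```python
from functools import partial

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import SelectFromModel, SelectKBest, mutual_info_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 20))  # synthetic features (hypothetical)
y = 3200 + 250 * X[:, 0] + 200 * X[:, 1] * X[:, 2] + rng.normal(scale=100, size=200)

k = 5  # number of features each selector keeps (assumed)
selectors = {
    # threshold=-np.inf with max_features=k keeps the k highest-ranked features.
    "tree": SelectFromModel(RandomForestRegressor(n_estimators=100, random_state=0),
                            max_features=k, threshold=-np.inf),
    "lasso": SelectFromModel(Lasso(alpha=1.0), max_features=k, threshold=-np.inf),
    "mutual_info": SelectKBest(partial(mutual_info_regression, random_state=0), k=k),
}
scoring = {"rmse": "neg_root_mean_squared_error",
           "mae": "neg_mean_absolute_error",
           "r2": "r2"}
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # same folds for every selector

results = {}
for name, selector in selectors.items():
    pipe = Pipeline([("select", selector),
                     ("model", GradientBoostingRegressor(random_state=0))])
    out = cross_validate(pipe, X, y, cv=cv, scoring=scoring)
    results[name] = {
        "rmse": -out["test_rmse"].mean(),  # sklearn reports negated errors
        "mae": -out["test_mae"].mean(),
        "r2": out["test_r2"].mean(),
    }
    print(name, {metric: round(value, 2) for metric, value in results[name].items()})
```

Holding the folds and the downstream model fixed means any difference between rows of the table is attributable to the selector alone, which is the comparison the first major comment requests.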
minor comments (2)
- [Abstract] Abstract, sentence 2: 'struggle to achieve overlooking complex' is grammatically unclear; rephrase for readability.
- [Keywords] Keywords: 'Clinipredictive' is listed without definition or prior mention in the text.
Simulated Author's Rebuttal
We thank the referee for the detailed feedback on the abstract. We agree that several claims require supporting quantitative details to be properly evaluated and will revise the abstract to include key metrics, dataset characteristics, and validation information drawn from the main text. We address each comment below.
Point-by-point responses
- Referee: [Abstract] Abstract: the claim that 'tree-based feature selection methods demonstrated superior capability in identifying the most relevant predictors' supplies no supporting metrics, importance rankings, ablation results, or comparisons to alternative selectors such as LASSO or mutual information.
  Authors: We acknowledge the abstract lacks these specifics. The main text reports feature importance rankings from tree-based selection, ablation comparisons showing improved downstream model performance versus LASSO and mutual information, and the selected predictor sets. The revised abstract will summarize the top-ranked predictors and note the performance lift from tree-based selection.
  Revision: yes
- Referee: [Abstract] Abstract: the assertion that 'ensemble-based regression models proved highly effective in capturing non-linear relationships' is unsupported by any reported error metrics (RMSE, MAE, R²), statistical significance tests, or comparisons against linear baselines or single models.
  Authors: The abstract will be updated to report the key error metrics (RMSE, MAE, R²) achieved by the BART and Gradient Boosting ensembles, along with comparisons to linear regression and single-tree baselines, and note the statistical tests used in the results section.
  Revision: yes
- Referee: [Abstract] Abstract: no dataset description (sample size, feature dimensionality, missingness mechanism), cross-validation scheme, or external cohort is mentioned, so the generalizability claim cannot be evaluated and the risk of population-specific overfitting remains unaddressed.
  Authors: We will add a concise dataset description (sample size, dimensionality, missingness) and the 5-fold cross-validation scheme to the abstract. External validation on an independent cohort was not performed due to data-access limitations; this will be explicitly stated as a limitation rather than claiming broad generalizability.
  Revision: yes
- Referee: [Abstract] Abstract: the statement that the work 'strengthens the role of data preprocessing in improving the model performance' is not accompanied by any before/after performance deltas or ablation study.
  Authors: The revised abstract will include before/after performance deltas from the MICE imputation and feature-selection ablations reported in the main text, quantifying the improvement attributable to preprocessing steps.
  Revision: yes
- External validation on an independent cohort was not conducted in the study and cannot be added without new data access.
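The before/after deltas promised in the rebuttal could be produced by an imputation ablation under 5-fold cross-validation. The sketch below compares mean imputation with scikit-learn's chained-equations `IterativeImputer` (a stand-in for MICE) while keeping the model and folds fixed. The synthetic data, the 20% missing-completely-at-random mechanism, and the hyperparameters are assumptions, so the sign and size of the printed delta should not be read as the paper's finding.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to import IterativeImputer)
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
n = 300

# Correlated features (hypothetical): a shared latent factor plus noise,
# so that one column carries information about another's missing values.
latent = rng.normal(size=(n, 1))
X = latent + 0.5 * rng.normal(size=(n, 6))
y = 3200 + 200 * X[:, 0] + 150 * X[:, 1] + rng.normal(scale=100, size=n)

X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.20] = np.nan  # about 20% missing completely at random

cv = KFold(n_splits=5, shuffle=True, random_state=0)

def cv_rmse(imputer):
    """Mean 5-fold RMSE with the imputer refit inside every fold."""
    pipe = make_pipeline(imputer, GradientBoostingRegressor(random_state=0))
    scores = cross_val_score(pipe, X_missing, y, cv=cv,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()

rmse_mean = cv_rmse(SimpleImputer(strategy="mean"))
rmse_chained = cv_rmse(IterativeImputer(max_iter=10, random_state=0))
delta = rmse_mean - rmse_chained

print(f"mean imputation RMSE:      {rmse_mean:.1f}")
print(f"chained-equations RMSE:    {rmse_chained:.1f}")
print(f"delta (positive favors chained equations): {delta:.1f}")
```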
Circularity Check
No circularity: purely empirical ML modeling with no derivation chain
The paper describes an applied machine-learning workflow (MICE imputation, tree-based feature selection, BART/Gradient Boosting ensembles) on a single clinical dataset. No mathematical derivation, first-principles claim, or predictive equation is presented that could reduce to its own inputs by construction. All statements concern observed performance on the given data; there are no self-definitional loops, fitted-parameter-as-prediction artifacts, or load-bearing self-citations. The work is therefore self-contained as standard empirical modeling and receives the default non-circularity finding.
Forward citations
Cited by 1 Pith paper
- Parental Imprints On Birth Weight: A Data-Driven Model For Neonatal Prediction In Low Resource Prenatal Care
  Machine learning framework predicts fetal birth weight using parental factors in low-resource settings.