Predicting Metabolic Dysfunction-Associated Steatotic Liver Disease using Machine Learning Methods: A Retrospective Cohort Study
Pith reviewed 2026-05-18 04:21 UTC · model grok-4.3
The pith
A LASSO logistic regression model using ten routine EHR features predicts MASLD with an AUROC of 0.84 before fairness adjustments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors developed the MASER model, a LASSO logistic regression trained on the top 10 features from a large retrospective EHR cohort of over 100,000 participants split into training, validation, and testing sets. Before fairness adjustment, it achieved an AUROC of 0.84, accuracy of 78%, sensitivity of 72%, and specificity of 79%. After equal opportunity postprocessing to equalize true positive rates across subgroups, performance shifted to 81% accuracy, 94% specificity, and 41% sensitivity. The model relies on routinely collected clinical features in a diverse population and is intended to support early MASLD detection in primary care.
What carries the argument
The MASER prediction model, consisting of LASSO logistic regression on the top 10 ranked EHR features combined with equal opportunity postprocessing to enforce fairness across racial and ethnic groups.
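As a rough illustration of what equal opportunity postprocessing does, the sketch below picks one score threshold per subgroup so that each subgroup's true positive rate (TPR) lands as close as possible to a shared target. The scores, labels, group names, and `target_tpr` value are invented for illustration, and the function names are hypothetical. The page does not describe the authors' exact procedure, so this is one simple way to equalize TPR, not their implementation.

```python
from collections import defaultdict

def tpr(scores, labels, threshold):
    """True positive rate: share of actual positives scored at or above threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return 0.0
    return sum(s >= threshold for s in positives) / len(positives)

def equal_opportunity_thresholds(scores, labels, groups, target_tpr):
    """Pick, per group, the observed score whose TPR is closest to target_tpr."""
    by_group = defaultdict(list)
    for s, y, g in zip(scores, labels, groups):
        by_group[g].append((s, y))
    thresholds = {}
    for g, rows in by_group.items():
        g_scores = [s for s, _ in rows]
        g_labels = [y for _, y in rows]
        candidates = sorted(set(g_scores))
        thresholds[g] = min(
            candidates,
            key=lambda t: abs(tpr(g_scores, g_labels, t) - target_tpr),
        )
    return thresholds

# Hypothetical toy data: group B's positives receive systematically lower scores,
# so a single shared threshold would give the two groups very different TPRs.
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
groups = ["A"] * 6 + ["B"] * 6

thresholds = equal_opportunity_thresholds(scores, labels, groups, target_tpr=0.5)
group_tpr = {}
for g in sorted(thresholds):
    g_scores = [s for s, gg in zip(scores, groups) if gg == g]
    g_labels = [y for y, gg in zip(labels, groups) if gg == g]
    group_tpr[g] = tpr(g_scores, g_labels, thresholds[g])
    print(g, thresholds[g], group_tpr[g])
```

On this toy data the two groups end up with different thresholds but the same TPR, which is the property equal opportunity targets. The cost is that each group's operating point on its own ROC curve moves, which is how overall sensitivity and specificity can shift after adjustment.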
If this is right
- Supports early detection of MASLD in primary care using only routinely collected data.
- Demonstrates that fairness adjustments can be applied to prediction models to improve equity, here at a substantial cost to sensitivity (72% to 41%).
- Achieves performance comparable to more complex models while remaining interpretable.
- Designed for potential integration into existing primary care workflows pending further validation.
Where Pith is reading between the lines
- Prospective validation in real-time clinical settings would be needed to confirm the model's utility in changing patient outcomes.
- The sensitivity-specificity trade-off after adjustment (94% specificity, 41% sensitivity) suggests it may work best as a rule-in tool or in combination with other screening methods.
- Similar fairness-aware modeling could be applied to prediction of other chronic conditions using EHR data.
- The top 10 features identified could inform which routine checks are most valuable for liver health monitoring.
Load-bearing premise
The electronic health record data used for training and testing correctly labels MASLD cases and includes all key predictors without significant bias or missing information that would alter the feature rankings or performance.
What would settle it
A follow-up study that applies the model to a new, independent cohort of patients and compares its predictions against gold-standard MASLD diagnoses obtained through imaging or biopsy to verify if the reported AUROC and post-adjustment metrics hold.
Original abstract
Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) affects 30-40% of US adults and is the most common chronic liver disease. Although often asymptomatic, progression can lead to cirrhosis. The objective of the study was to develop and evaluate an electronic health record (EHR) based prediction model to support early detection of MASLD in primary care settings.
Methods: We evaluated LASSO logistic regression, random forest, XGBoost, and a neural network model for MASLD prediction using clinical feature subsets from a large EHR database, including the top 10 ranked features. To reduce disparities in true positive rates across racial and ethnic subgroups, we applied an equal opportunity postprocessing method in a prediction model called MASLD EHR Static Risk Prediction (MASER).
Results: This retrospective cohort study included 59,492 participants in the training data, 24,198 in the validating data, and 25,188 in the testing data. The LASSO logistic regression model with the top 10 features was selected for its interpretability and comparable performance. Before fairness adjustment, the model achieved AUROC of 0.84, accuracy of 78%, sensitivity of 72%, specificity of 79%, and F1-score of 0.617. After equal opportunity postprocessing, accuracy modestly increased to 81% and specificity to 94%, while sensitivity decreased to 41% and F1-score to 0.515, reflecting the fairness trade-off.
Conclusions: MASER achieved competitive performance for MASLD prediction, comparable to previously reported ensemble and tree-based models, while using a limited and routinely collected feature set and a diverse study population. The model is designed to support early detection and potential integration into primary care workflows. MASER demonstrates EHR-ready MASLD prediction with fairness adjustments, supporting future primary care implementation pending prospective validation.
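The abstract does not state the MASLD prevalence in the test set, but the reported metrics constrain it. Accuracy is a prevalence-weighted average of sensitivity and specificity, so prevalence can be back-calculated from the post-adjustment figures and then used to recompute F1 as a consistency check. The sketch below does this; the function names are hypothetical, and because the inputs are rounded to whole percentages the results are only approximate.

```python
def implied_prevalence(accuracy, sensitivity, specificity):
    """Solve accuracy = sens * p + spec * (1 - p) for prevalence p."""
    return (specificity - accuracy) / (specificity - sensitivity)

def f1_from_rates(sensitivity, specificity, prevalence):
    """F1 implied by sensitivity, specificity, and prevalence (per-patient rates)."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return 2 * tp / (2 * tp + fp + fn)

# Post-adjustment figures from the abstract: accuracy 81%, sensitivity 41%, specificity 94%.
p_after = implied_prevalence(0.81, 0.41, 0.94)
f1_after = f1_from_rates(0.41, 0.94, p_after)

# Pre-adjustment sensitivity and specificity, reusing the same implied prevalence.
f1_before = f1_from_rates(0.72, 0.79, p_after)

print("implied prevalence:", round(p_after, 3))
print("implied F1 after adjustment:", round(f1_after, 3))
print("implied F1 before adjustment:", round(f1_before, 3))
```

The implied prevalence comes out at roughly one in four, and the recomputed F1-scores land close to the reported 0.515 and 0.617, so the abstract's figures appear mutually consistent up to rounding. The pre-adjustment accuracy is a poor basis for the same back-calculation because sensitivity and specificity are only seven points apart there, which makes the estimate very sensitive to rounding.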
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to develop and evaluate several ML models (LASSO logistic regression, random forest, XGBoost, neural network) for predicting MASLD from EHR data in a retrospective cohort of ~109k patients split into 59k/24k/25k train/val/test sets. The LASSO model using the top-10 features is highlighted for interpretability, reporting AUROC 0.84, accuracy 78%, sensitivity 72%, specificity 79% before fairness adjustment; after equal-opportunity postprocessing the accuracy rises to 81% and specificity to 94% while sensitivity falls to 41%. The resulting MASER model is positioned as suitable for early primary-care detection with fairness considerations across racial/ethnic groups.
Significance. If the outcome labels prove reliable, the work supplies an interpretable, limited-feature EHR model that explicitly trades off performance for equal-opportunity fairness, which is a concrete contribution for deployment-oriented MASLD screening. The large, diverse cohort and before/after fairness metrics are strengths that would support clinical translation once label validity is demonstrated.
major comments (3)
- [Methods] Methods (case definition paragraph): The binary MASLD outcome is never given an explicit algorithm (ICD-10 codes, ALT/AST thresholds, FibroScan mention, or billing diagnosis combination). Because retrospective EHR proxies typically achieve only 30-60% sensitivity versus imaging or biopsy, the reported AUROC of 0.84, the top-10 feature ranking, and the post-adjustment sensitivity drop to 41% could all be artifacts of label noise rather than true signal; a chart-review validation subset or sensitivity analysis on label thresholds is required.
- [Methods] Methods (feature selection): The description of how the top-10 features were ranked is ambiguous; if ranking used the full cohort or validation/test data rather than training data alone, leakage would inflate the held-out AUROC and undermine the claim that the model generalizes.
- [Results] Results (fairness paragraph): The equal-opportunity postprocessing produces a clinically large sensitivity reduction (72% → 41%) while specificity rises to 94%; the manuscript must quantify the net clinical utility (e.g., number of missed cases per 1000 screened) and discuss whether this trade-off still supports “early detection” use.
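The "missed cases per 1000 screened" figure requested in the last major comment follows directly from sensitivity once a prevalence is assumed. The sketch below uses 30%, the lower end of the 30-40% US adult range quoted in the abstract, purely as an illustrative assumption; the cohort's actual prevalence may differ. Function and variable names are hypothetical.

```python
def missed_per_1000(sensitivity, prevalence):
    """Expected false negatives per 1,000 patients screened."""
    return 1000 * prevalence * (1 - sensitivity)

def false_positives_per_1000(specificity, prevalence):
    """Expected false positives per 1,000 patients screened."""
    return 1000 * (1 - prevalence) * (1 - specificity)

ASSUMED_PREVALENCE = 0.30  # assumption: lower end of the abstract's 30-40% range

results = {}
for name, sens, spec in [("before", 0.72, 0.79), ("after", 0.41, 0.94)]:
    results[name] = (
        round(missed_per_1000(sens, ASSUMED_PREVALENCE)),
        round(false_positives_per_1000(spec, ASSUMED_PREVALENCE)),
    )
    print(name, "missed:", results[name][0], "false positives:", results[name][1])
```

Under the 30% assumption this gives about 84 missed cases per 1,000 screened before adjustment and about 177 after, against roughly 147 and 42 false positives respectively. A different prevalence scales the missed-case counts proportionally.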
minor comments (2)
- [Abstract] Abstract: the phrase “top 10 ranked features” should specify the ranking criterion (LASSO coefficient magnitude, permutation importance, etc.).
- [Results] Table/figure captions: ensure all performance metrics are accompanied by 95% CIs or standard errors on the test set.
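One common way to attach 95% CIs to test-set metrics is a percentile bootstrap over test-set patients. The sketch below does this for sensitivity on synthetic labels and predictions. The data, seeds, and number of resamples are arbitrary illustrative choices, the function names are hypothetical, and the paper may use a different interval method.

```python
import random

def sensitivity(y_true, y_pred):
    """TP / (TP + FN); returns None when the sample contains no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else None

def bootstrap_ci(y_true, y_pred, metric, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap: resample patients with replacement, recompute the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value is not None:  # skip degenerate resamples with no positives
            stats.append(value)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi

# Synthetic test set (illustrative only): about 25% prevalence, about 72% sensitivity.
rng = random.Random(42)
y_true = [1 if rng.random() < 0.25 else 0 for _ in range(400)]
y_pred = [int(rng.random() < (0.72 if t == 1 else 0.21)) for t in y_true]

point = sensitivity(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, sensitivity)
print(f"sensitivity = {point:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

The same `bootstrap_ci` helper accepts any metric function with the `(y_true, y_pred)` signature, so specificity, accuracy, or F1 could be passed in the same way.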
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We address each major point below and have revised the manuscript to improve clarity and address concerns where feasible.
- Referee: [Methods] Methods (case definition paragraph): The binary MASLD outcome is never given an explicit algorithm (ICD-10 codes, ALT/AST thresholds, FibroScan mention, or billing diagnosis combination). Because retrospective EHR proxies typically achieve only 30-60% sensitivity versus imaging or biopsy, the reported AUROC of 0.84, the top-10 feature ranking, and the post-adjustment sensitivity drop to 41% could all be artifacts of label noise rather than true signal; a chart-review validation subset or sensitivity analysis on label thresholds is required.
  Authors: We agree that the case definition paragraph requires greater explicitness. In the revised manuscript we have added the precise combination of ICD-10 codes and laboratory value thresholds used to define the binary MASLD outcome. We have also expanded the limitations section to discuss the known performance characteristics of EHR-based proxies. A full chart-review validation subset was not available within the scope of this retrospective study; however, we have added a sensitivity analysis that varies the laboratory thresholds to assess the stability of model performance and feature rankings.
  Revision: partial
- Referee: [Methods] Methods (feature selection): The description of how the top-10 features were ranked is ambiguous; if ranking used the full cohort or validation/test data rather than training data alone, leakage would inflate the held-out AUROC and undermine the claim that the model generalizes.
  Authors: We appreciate the referee highlighting this potential ambiguity. Feature ranking for the top-10 features was performed exclusively on the training set (n=59,492) using LASSO coefficients prior to any evaluation on validation or test data. We have revised the methods section to state this explicitly and to confirm that no information from the held-out sets influenced feature selection or ranking.
  Revision: yes
- Referee: [Results] Results (fairness paragraph): The equal-opportunity postprocessing produces a clinically large sensitivity reduction (72% → 41%) while specificity rises to 94%; the manuscript must quantify the net clinical utility (e.g., number of missed cases per 1000 screened) and discuss whether this trade-off still supports “early detection” use.
  Authors: We agree that the clinical consequences of the observed sensitivity reduction merit explicit quantification. In the revised results and discussion we have added calculations of net clinical utility, including the approximate number of additional missed cases per 1,000 patients screened under the equal-opportunity postprocessed model. We also discuss the implications for primary-care early detection, noting that the substantial gain in specificity may reduce over-referral while the fairness adjustment improves equity, although the lower sensitivity remains a limitation for comprehensive case finding.
  Revision: yes
- Chart-review validation of the MASLD outcome labels
Circularity Check
No significant circularity in the empirical ML pipeline
The paper describes a standard supervised learning workflow: LASSO logistic regression (and comparators) are fit on a 59k training split, top-10 features are ranked from that fit, and all reported metrics (AUROC 0.84, accuracy, sensitivity, etc.) are computed on a fully held-out 25k test split. Equal-opportunity post-processing is then applied to the test predictions to produce the adjusted metrics. None of these steps reduces the final performance numbers to a fitted parameter by construction, nor does the text invoke self-citations, uniqueness theorems, or ansatzes that would make the claimed results tautological. The derivation chain is therefore self-contained empirical evaluation rather than circular.
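The leakage-free workflow described above (split first, rank features on the training split only, refit, then score the held-out split once) can be sketched end to end. Everything here is an illustrative assumption: the data are synthetic, three features are kept rather than ten, and a small pure-Python proximal-gradient solver stands in for whatever LASSO implementation the authors used.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, X):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=200):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        errs = [p - yi for p, yi in zip(predict(w, b, X), y)]
        b -= lr * sum(errs) / n
        for j in range(d):
            grad_j = sum(e * xi[j] for e, xi in zip(errs, X)) / n
            step = w[j] - lr * grad_j
            # Soft-thresholding: the proximal step for the L1 penalty.
            w[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
    return w, b

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def select(X, cols):
    return [[row[j] for j in cols] for row in X]

# Synthetic cohort (illustrative only): 8 features, of which 0, 1, and 2 carry signal.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
y = [int(rng.random() < sigmoid(1.5 * r[0] - 1.0 * r[1] + 0.8 * r[2] - 1.0)) for r in X]

# 1. Split first, so held-out rows never influence the ranking.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# 2. Rank features by |coefficient| from an L1 fit on the training split only.
w_full, _ = fit_l1_logistic(X_train, y_train, lam=0.05)
top_k = sorted(range(8), key=lambda j: abs(w_full[j]), reverse=True)[:3]

# 3. Refit on the selected features (training split) and score the test split once.
w, b = fit_l1_logistic(select(X_train, top_k), y_train, lam=0.01)
test_auc = auroc(predict(w, b, select(X_test, top_k)), y_test)
print("selected features:", sorted(top_k))
print("held-out AUROC:", round(test_auc, 3))
```

The point of the ordering is that `X_test` appears only in the final scoring line. If step 2 were run on the full `X` instead of `X_train`, the held-out AUROC would no longer be an unbiased estimate, which is the leakage concern raised in the referee report.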
Axiom & Free-Parameter Ledger
free parameters (2)
- Top-10 feature ranking threshold
- Equal-opportunity fairness threshold
axioms (2)
- domain assumption: EHR diagnostic codes and lab values provide reliable ground-truth labels for MASLD without substantial misclassification or missing data.
- domain assumption: The study population and feature distributions are representative of primary-care patients who would receive the model in practice.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: Inclusion criteria... ICD-10-CM K76.0 or K75.81... propensity score matching on sex and age
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.