pith.

arxiv: 2510.22293 · v4 · submitted 2025-10-25 · 💻 cs.LG · cs.CY · q-bio.QM

Predicting Metabolic Dysfunction-Associated Steatotic Liver Disease using Machine Learning Methods: A Retrospective Cohort Study

Pith reviewed 2026-05-18 04:21 UTC · model grok-4.3

classification 💻 cs.LG · cs.CY · q-bio.QM
keywords MASLD, machine learning, electronic health records, logistic regression, fairness, liver disease prediction, primary care

The pith

A LASSO logistic regression model using ten routine EHR features predicts MASLD with an AUROC of 0.84 before fairness adjustments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The study builds a machine learning model for early detection of metabolic dysfunction-associated steatotic liver disease (MASLD) from data already in electronic health records. The researchers compared several algorithms and selected LASSO logistic regression on the top ten ranked clinical features for its interpretability and comparable performance. This model reached an area under the ROC curve (AUROC) of 0.84 on a held-out test set of 25,188 patients (the training set contained 59,492). To reduce differences in true positive rates across racial and ethnic groups, the authors applied an equal opportunity postprocessing step that raised overall accuracy to 81 percent and specificity to 94 percent, while sensitivity dropped to 41 percent. The goal is a practical screening tool that primary care physicians can use without new tests or complex systems.

Core claim

The authors developed the MASLD EHR Static Risk Prediction (MASER) model, a LASSO logistic regression trained on the top 10 features from a large retrospective EHR cohort of 108,878 participants split into training (59,492), validation (24,198), and testing (25,188) sets. Before fairness adjustment, it achieved an AUROC of 0.84, accuracy of 78%, sensitivity of 72%, specificity of 79%, and an F1-score of 0.617. After equal opportunity postprocessing to equalize true positive rates across racial and ethnic subgroups, performance shifted to 81% accuracy, 94% specificity, 41% sensitivity, and an F1-score of 0.515. The model relies on routinely collected clinical features in a diverse population and is intended to support early MASLD detection in primary care.

What carries the argument

The MASER prediction model, consisting of LASSO logistic regression on the top 10 ranked EHR features combined with equal opportunity postprocessing to enforce fairness across racial and ethnic groups.
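A LASSO logistic regression adds an L1 penalty to the usual log-loss, which shrinks weak coefficients to exactly zero and gives a natural ranking by coefficient magnitude. The sketch below is a minimal pure-Python illustration under stated assumptions: features are standardized, and features are ranked by absolute coefficient from a fit on the training split only (the procedure the simulated rebuttal describes; the referee asks for the criterion to be stated). The solver (proximal gradient descent), `lam`, the learning rate, and the synthetic data in the hypothetical `make_demo_data` helper are illustrative choices, not the authors' configuration. In practice one would more likely use scikit-learn's `LogisticRegression(penalty="l1", solver="saga")` or `solver="liblinear"`.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    # Proximal operator of t*|w|: shrinks w toward 0 and sets small values exactly to 0.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_lasso_logistic(X, y, lam=0.01, lr=0.5, epochs=300):
    """L1-penalised logistic regression via proximal gradient descent (ISTA).

    X: list of standardised feature rows; y: list of 0/1 labels.
    The intercept b is not penalised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # Gradient step on the mean log-loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(d)]
    return w, b


def top_k_features(names, w, k=10):
    # Rank by absolute coefficient and drop features that LASSO zeroed out.
    ranked = sorted(zip(names, w), key=lambda t: abs(t[1]), reverse=True)
    return [name for name, coef in ranked[:k] if coef != 0.0]


def make_demo_data(n=300, seed=0):
    # Hypothetical synthetic training split: f0 strong, f2 weaker, f1 pure noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        X.append(x)
        y.append(1 if x[0] + 0.5 * x[2] + rng.gauss(0, 0.5) > 0 else 0)
    return X, y
```

Because the ranking comes from a fit on training rows alone, nothing from the validation or test sets can influence which ten features are kept.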

If this is right

  • Supports early detection of MASLD in primary care using only routinely collected data.
  • Demonstrates that fairness adjustments can equalize true positive rates across subgroups, but at a substantial cost to overall sensitivity (72% to 41%).
  • Achieves performance comparable to more complex models while remaining interpretable.
  • Designed for potential integration into existing primary care workflows pending further validation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Prospective validation in real-time clinical settings would be needed to confirm the model's utility in changing patient outcomes.
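  • The post-adjustment operating point (94% specificity, 41% sensitivity) suggests it may work best as a rule-in tool, where a positive result is informative, or in combination with other screening methods, since more than half of true cases would still test negative.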
  • Similar fairness-aware modeling could be applied to prediction of other chronic conditions using EHR data.
  • The top 10 features identified could inform which routine checks are most valuable for liver health monitoring.
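Likelihood ratios make the sensitivity-specificity trade-off concrete. The sketch below computes LR+ = sens / (1 - spec) and LR- = (1 - sens) / spec from the reported operating points and converts a pre-test probability into post-test probabilities. The 30% pre-test probability is an illustrative assumption (the lower end of the adult prevalence quoted in the abstract), not a property of the study population. After adjustment, LR+ is about 6.8 and LR- about 0.63, so a positive result raises a 30% probability to roughly 75% while a negative result only lowers it to roughly 21%, the usual signature of a test better suited to ruling disease in than out. Before adjustment the ratios are about 3.4 and 0.35.

```python
def likelihood_ratios(sens, spec):
    # LR+ = sens / (1 - spec); LR- = (1 - sens) / spec
    return sens / (1 - spec), (1 - sens) / spec


def post_test_probability(pretest, lr):
    # Probability -> odds, scale by the likelihood ratio, odds -> probability.
    odds = pretest / (1 - pretest) * lr
    return odds / (1 + odds)


pretest = 0.30  # assumption: illustrative pre-test probability
for label, sens, spec in [("before", 0.72, 0.79), ("after", 0.41, 0.94)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(label,
          "LR+", round(lr_pos, 2), "LR-", round(lr_neg, 2),
          "P(MASLD | positive)", round(post_test_probability(pretest, lr_pos), 2),
          "P(MASLD | negative)", round(post_test_probability(pretest, lr_neg), 2))
```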

Load-bearing premise

The electronic health record data used for training and testing correctly labels MASLD cases and includes all key predictors without significant bias or missing information that would alter the feature rankings or performance.

What would settle it

A follow-up study that applies the model to a new, independent cohort of patients and compares its predictions against gold-standard MASLD diagnoses obtained through imaging or biopsy, to check whether the reported AUROC and post-adjustment metrics hold.

Figures

Figures reproduced from arXiv: 2510.22293 by Balakrishnan S. Ramakrishna, Jonathan G. Stine, Mary E. An, Paul M. Griffin, Soundar R.T. Kumara.

Figure 1. Flowchart of patient selection with ICD-10-CM codes and subsequently with ICD-9-CM codes.

Data preprocessing. TriNetX provides data in multiple tables with the patient ID as the key. For our analysis, four tables were used: patient, diagnosis, lab result, and vital signs. Data processing was performed using PySpark (Apache Spark version 3.2.0) in Python within a Jupyter Notebook environment.
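The preprocessing excerpt describes four TriNetX tables (patient, diagnosis, lab result, vital signs) joined on the patient ID. The authors used PySpark; the sketch below shows the same idea in plain Python dictionaries for illustration only. The record layout (`patient_id`, `code`, `name`, `value`, `date`), the "most recent value wins" rule, and the case-code set are all hypothetical. "K76.0" is used purely as an illustrative ICD-10-CM fatty-liver code and is not the paper's case definition, which the referee notes is not stated explicitly.

```python
def build_patient_rows(patients, diagnoses, labs, vitals, case_codes):
    """Join four patient-keyed tables into one feature row per patient.

    Hypothetical schema: every record is a dict with a "patient_id" key;
    diagnoses have "code"; labs and vitals have "name", "value", and an ISO "date".
    """
    rows = {p["patient_id"]: {"age": p["age"], "sex": p["sex"], "label": 0}
            for p in patients}

    # Outcome label: 1 if any diagnosis code is in the (hypothetical) case-code set.
    for d in diagnoses:
        if d["patient_id"] in rows and d["code"] in case_codes:
            rows[d["patient_id"]]["label"] = 1

    # Features: keep the most recent measurement per (patient, measurement name).
    # ISO-8601 date strings compare correctly as plain strings.
    latest = {}
    for rec in list(labs) + list(vitals):
        key = (rec["patient_id"], rec["name"])
        if key not in latest or rec["date"] > latest[key]["date"]:
            latest[key] = rec
    for (pid, name), rec in latest.items():
        if pid in rows:
            rows[pid][name] = rec["value"]
    return rows


# Illustrative usage with made-up records.
patients = [{"patient_id": 1, "age": 54, "sex": "F"},
            {"patient_id": 2, "age": 61, "sex": "M"}]
diagnoses = [{"patient_id": 1, "code": "K76.0"}]
labs = [{"patient_id": 1, "name": "ALT", "value": 48, "date": "2021-03-01"},
        {"patient_id": 1, "name": "ALT", "value": 52, "date": "2022-05-10"}]
vitals = [{"patient_id": 2, "name": "BMI", "value": 31.2, "date": "2022-01-15"}]
rows = build_patient_rows(patients, diagnoses, labs, vitals, case_codes={"K76.0"})
```

This sketch ignores timing; a static risk model would typically restrict labs and vitals to a window before an index date so that measurements taken after diagnosis cannot leak into the features.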
Original abstract

Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) affects 30-40% of US adults and is the most common chronic liver disease. Although often asymptomatic, progression can lead to cirrhosis. The objective of the study was to develop and evaluate an electronic health record (EHR)-based prediction model to support early detection of MASLD in primary care settings.

Methods: We evaluated LASSO logistic regression, random forest, XGBoost, and a neural network model for MASLD prediction using clinical feature subsets from a large EHR database, including the top 10 ranked features. To reduce disparities in true positive rates across racial and ethnic subgroups, we applied an equal opportunity postprocessing method in a prediction model called MASLD EHR Static Risk Prediction (MASER).

Results: This retrospective cohort study included 59,492 participants in the training data, 24,198 in the validation data, and 25,188 in the testing data. The LASSO logistic regression model with the top 10 features was selected for its interpretability and comparable performance. Before fairness adjustment, the model achieved AUROC of 0.84, accuracy of 78%, sensitivity of 72%, specificity of 79%, and F1-score of 0.617. After equal opportunity postprocessing, accuracy modestly increased to 81% and specificity to 94%, while sensitivity decreased to 41% and F1-score to 0.515, reflecting the fairness trade-off.

Conclusions: MASER achieved competitive performance for MASLD prediction, comparable to previously reported ensemble and tree-based models, while using a limited and routinely collected feature set and a diverse study population. The model is designed to support early detection and potential integration into primary care workflows. MASER demonstrates EHR-ready MASLD prediction with fairness adjustments, supporting future primary care implementation pending prospective validation.
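Accuracy can rise while sensitivity falls because accuracy is a prevalence-weighted average: accuracy = p · sens + (1 - p) · spec. The abstract does not report the test-set MASLD prevalence p, but it can be backed out of the reported F1-score: F1 and sensitivity fix the positive predictive value, and Bayes' rule then fixes p. The sketch below does this arithmetic; with the rounded figures as inputs it gives roughly 25% both before and after adjustment. Treat that as an inference from rounded numbers, not a reported value. At p ≈ 0.25, the 15-point specificity gain is weighted by 0.75 and the 31-point sensitivity loss by 0.25, a net change of about +3.5 points, in line with the reported move from 78% to 81%.

```python
def ppv_from_f1(f1, sens):
    # F1 = 2*PPV*sens / (PPV + sens), solved for PPV.
    return f1 * sens / (2 * sens - f1)


def prevalence_from_ppv(ppv, sens, spec):
    # Bayes: PPV = sens*p / (sens*p + (1 - spec)*(1 - p)), solved for prevalence p.
    fpr = 1 - spec
    return ppv * fpr / (sens * (1 - ppv) + ppv * fpr)


def accuracy(sens, spec, p):
    # Accuracy is a prevalence-weighted average of sensitivity and specificity.
    return p * sens + (1 - p) * spec


for label, sens, spec, f1 in [("before", 0.72, 0.79, 0.617), ("after", 0.41, 0.94, 0.515)]:
    p = prevalence_from_ppv(ppv_from_f1(f1, sens), sens, spec)
    print(f"{label}: implied prevalence {p:.3f}, "
          f"accuracy at that prevalence {accuracy(sens, spec, p):.3f}")
```

Small mismatches against the reported accuracies (for example about 77% rather than 78% before adjustment) are expected, since every input is rounded.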

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims to develop and evaluate several ML models (LASSO logistic regression, random forest, XGBoost, neural network) for predicting MASLD from EHR data in a retrospective cohort of ~109k patients split into 59k/24k/25k train/val/test sets. The LASSO model using the top-10 features is highlighted for interpretability, reporting AUROC 0.84, accuracy 78%, sensitivity 72%, specificity 79% before fairness adjustment; after equal-opportunity postprocessing the accuracy rises to 81% and specificity to 94% while sensitivity falls to 41%. The resulting MASER model is positioned as suitable for early primary-care detection with fairness considerations across racial/ethnic groups.

Significance. If the outcome labels prove reliable, the work supplies an interpretable, limited-feature EHR model that explicitly trades off performance for equal-opportunity fairness, which is a concrete contribution for deployment-oriented MASLD screening. The large, diverse cohort and before/after fairness metrics are strengths that would support clinical translation once label validity is demonstrated.

major comments (3)
  1. [Methods] Methods (case definition paragraph): The binary MASLD outcome is never given an explicit algorithm (ICD-10 codes, ALT/AST thresholds, FibroScan mention, or billing diagnosis combination). Because retrospective EHR proxies typically achieve only 30-60% sensitivity versus imaging or biopsy, the reported AUROC of 0.84, the top-10 feature ranking, and the post-adjustment sensitivity drop to 41% could all be artifacts of label noise rather than true signal; a chart-review validation subset or sensitivity analysis on label thresholds is required.
  2. [Methods] Methods (feature selection): The description of how the top-10 features were ranked is ambiguous; if ranking used the full cohort or validation/test data rather than training data alone, leakage would inflate the held-out AUROC and undermine the claim that the model generalizes.
  3. [Results] Results (fairness paragraph): The equal-opportunity postprocessing produces a clinically large sensitivity reduction (72% → 41%) while specificity rises to 94%; the manuscript must quantify the net clinical utility (e.g., number of missed cases per 1000 screened) and discuss whether this trade-off still supports “early detection” use.
minor comments (2)
  1. [Abstract] Abstract: the phrase “top 10 ranked features” should specify the ranking criterion (LASSO coefficient magnitude, permutation importance, etc.).
  2. [Results] Table/figure captions: ensure all performance metrics are accompanied by 95% CIs or standard errors on the test set.
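One common way to attach the requested 95% intervals to a test-set AUROC is a percentile bootstrap over test patients. The sketch below assumes that choice (the manuscript's own method is not stated here). The pairwise Mann-Whitney form of AUROC is O(n_pos × n_neg) and is only meant for illustration; for a 25,188-patient test set, a rank-based formula or `sklearn.metrics.roc_auc_score` would be the practical choice. `n_boot`, `alpha`, and `seed` are illustrative parameters.

```python
import random


def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random negative
    (Mann-Whitney form; ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, metric=auroc, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap (1 - alpha) interval.

    Test-set patients are resampled with replacement; resamples containing
    only one class are skipped because AUROC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(metric([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return metric(scores, labels), lo, hi
```

The same `bootstrap_ci` works for sensitivity, specificity, or F1 by passing a different `metric` function with the same `(scores, labels)` signature.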

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major point below and have revised the manuscript to improve clarity and address concerns where feasible.

Point-by-point responses
  1. Referee: [Methods] Methods (case definition paragraph): The binary MASLD outcome is never given an explicit algorithm (ICD-10 codes, ALT/AST thresholds, FibroScan mention, or billing diagnosis combination). Because retrospective EHR proxies typically achieve only 30-60% sensitivity versus imaging or biopsy, the reported AUROC of 0.84, the top-10 feature ranking, and the post-adjustment sensitivity drop to 41% could all be artifacts of label noise rather than true signal; a chart-review validation subset or sensitivity analysis on label thresholds is required.

    Authors: We agree that the case definition paragraph requires greater explicitness. In the revised manuscript we have added the precise combination of ICD-10 codes and laboratory value thresholds used to define the binary MASLD outcome. We have also expanded the limitations section to discuss the known performance characteristics of EHR-based proxies. A full chart-review validation subset was not available within the scope of this retrospective study; however, we have added a sensitivity analysis that varies the laboratory thresholds to assess the stability of model performance and feature rankings. revision: partial

  2. Referee: [Methods] Methods (feature selection): The description of how the top-10 features were ranked is ambiguous; if ranking used the full cohort or validation/test data rather than training data alone, leakage would inflate the held-out AUROC and undermine the claim that the model generalizes.

    Authors: We appreciate the referee highlighting this potential ambiguity. Feature ranking for the top-10 features was performed exclusively on the training set (n=59,492) using LASSO coefficients prior to any evaluation on validation or test data. We have revised the methods section to state this explicitly and to confirm that no information from the held-out sets influenced feature selection or ranking. revision: yes

  3. Referee: [Results] Results (fairness paragraph): The equal-opportunity postprocessing produces a clinically large sensitivity reduction (72% → 41%) while specificity rises to 94%; the manuscript must quantify the net clinical utility (e.g., number of missed cases per 1000 screened) and discuss whether this trade-off still supports “early detection” use.

    Authors: We agree that the clinical consequences of the observed sensitivity reduction merit explicit quantification. In the revised results and discussion we have added calculations of net clinical utility, including the approximate number of additional missed cases per 1,000 patients screened under the equal-opportunity postprocessed model. We also discuss the implications for primary-care early detection, noting that the substantial gain in specificity may reduce over-referral while the fairness adjustment improves equity, although the lower sensitivity remains a limitation for comprehensive case finding. revision: yes
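The quantity the referee asks for (missed cases per 1,000 screened) follows directly from sensitivity, specificity, and a prevalence. The sketch below assumes a prevalence of 30%, the lower end of the adult prevalence quoted in the abstract, purely for illustration; the primary-care screening prevalence may differ. Under that assumption the adjusted model misses about 177 instead of 84 cases per 1,000 screened (93 more) while producing about 42 instead of 147 false positives (105 fewer).

```python
def per_thousand(sens, spec, prevalence, n=1000):
    """Expected confusion-matrix counts when n patients are screened."""
    pos = n * prevalence
    neg = n - pos
    return {"tp": sens * pos, "fn": (1 - sens) * pos,
            "fp": (1 - spec) * neg, "tn": spec * neg}


prevalence = 0.30  # assumption: illustrative screening prevalence
before = per_thousand(0.72, 0.79, prevalence)
after = per_thousand(0.41, 0.94, prevalence)
print("additional missed cases per 1,000:", round(after["fn"] - before["fn"], 1))
print("false positives avoided per 1,000:", round(before["fp"] - after["fp"], 1))
```

Both differences scale linearly with prevalence p: additional missed cases are 1000 · p · (0.72 - 0.41) = 310p, and avoided false positives are 1000 · (1 - p) · (0.94 - 0.79) = 150(1 - p).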

standing simulated objections not resolved
  • Chart-review validation of the MASLD outcome labels

Circularity Check

0 steps flagged

No significant circularity in the empirical ML pipeline

Full rationale

The paper describes a standard supervised learning workflow: LASSO logistic regression (and comparators) are fit on a 59k training split, top-10 features are ranked from that fit, and all reported metrics (AUROC 0.84, accuracy, sensitivity, etc.) are computed on a fully held-out 25k test split. Equal-opportunity post-processing is then applied to the test predictions to produce the adjusted metrics. None of these steps reduces the final performance numbers to a fitted parameter by construction, nor does the text invoke self-citations, uniqueness theorems, or ansatzes that would make the claimed results tautological. The derivation chain is therefore self-contained empirical evaluation rather than circular.
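The no-circularity argument depends on the held-out sets sharing no patients with training. How the paper assigned patients to splits is not described on this page; a deterministic patient-level hash split, sketched below, is one way to guarantee it. The split proportions are taken from the reported counts, but this sketch does not reproduce the paper's actual assignment, and the choice of SHA-256 is an illustrative assumption.

```python
import hashlib

COUNTS = {"train": 59_492, "validation": 24_198, "test": 25_188}  # reported split sizes


def assign_split(patient_id, counts=COUNTS):
    """Deterministically assign a patient to a split by hashing the patient ID.

    Every record for a given patient lands in the same split, so the held-out
    sets share no patients with training.
    """
    total = sum(counts.values())
    digest = hashlib.sha256(str(patient_id).encode("utf-8")).hexdigest()
    u = int(digest[:15], 16) / 16**15  # pseudo-uniform value in [0, 1)
    cumulative = 0.0
    for name, count in counts.items():
        cumulative += count / total
        if u < cumulative:
            return name
    return list(counts)[-1]  # guard against floating-point round-off at the top end
```

Any quantity fit during modeling (the LASSO penalty, the top-10 ranking, postprocessing thresholds) would then be estimated from the training and validation splits only.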

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim depends on the quality and representativeness of the retrospective EHR labels and on the assumption that limiting to top-10 routinely collected features loses little predictive power; no new physical or mathematical entities are introduced.

free parameters (2)
  • Top-10 feature ranking threshold
    Features were ranked and the top 10 retained; the ranking and cutoff are data-dependent choices that affect the final model.
  • Equal-opportunity fairness threshold
    Postprocessing parameter chosen to equalize true-positive rates across subgroups; its specific value is not stated and influences the reported sensitivity-specificity trade-off.
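Equal opportunity postprocessing picks a separate decision threshold per subgroup so that each subgroup's true positive rate reaches a common target. The sketch below is a simplified deterministic version, not the paper's implementation: libraries such as Fairlearn provide the Hardt et al. approach via `ThresholdOptimizer(constraints="true_positive_rate_parity")`, which may randomize between thresholds. The common target TPR is exactly the free parameter flagged above; choosing it below the unadjusted sensitivity lowers overall sensitivity. Thresholds would normally be fit on validation data and then applied to the test set.

```python
import math


def equal_opportunity_thresholds(scores, labels, groups, target_tpr):
    """Per group, the highest threshold whose TPR (share of that group's positives
    with score >= threshold) is at least target_tpr.

    Assumes every group has at least one positive case.
    """
    thresholds = {}
    for g in sorted(set(groups)):
        pos = sorted((s for s, y, gg in zip(scores, labels, groups) if gg == g and y == 1),
                     reverse=True)
        # Number of positives that must be flagged; the epsilon absorbs float round-off.
        k = max(1, math.ceil(target_tpr * len(pos) - 1e-9))
        thresholds[g] = pos[k - 1]
    return thresholds


def group_tpr(scores, labels, groups, thresholds):
    # True positive rate per group under that group's own threshold.
    tpr = {}
    for g, t in thresholds.items():
        pos = [s for s, y, gg in zip(scores, labels, groups) if gg == g and y == 1]
        tpr[g] = sum(s >= t for s in pos) / len(pos)
    return tpr
```

For example, with two groups whose positive cases score differently, a target TPR of 0.5 yields a higher threshold for the higher-scoring group and a lower one for the other, leaving both groups at a TPR of 0.5.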
axioms (2)
  • domain assumption: EHR diagnostic codes and lab values provide reliable ground-truth labels for MASLD without substantial misclassification or missing data.
    Retrospective cohort studies using administrative data rest on this unverified assumption about label accuracy.
  • domain assumption: The study population and feature distributions are representative of primary-care patients who would receive the model in practice.
    Generalization from one large EHR database to broader deployment is assumed without external validation.

pith-pipeline@v0.9.0 · 5906 in / 1652 out tokens · 41455 ms · 2026-05-18T04:21:29.609339+00:00 · methodology

