
arXiv: 2605.20523 · v1 · pith:3AVGJW4Nnew · submitted 2026-05-19 · cs.LG · cs.AI · q-bio.QM

Machine-Learning-Enhanced Non-Invasive Testing for MASLD Fibrosis: Shallow-Deep Neural Networks Versus FIB-4, Tabular Foundation Models, and Large Language Models

Pith reviewed 2026-05-21 06:57 UTC · model grok-4.3

classification: cs.LG, cs.AI, q-bio.QM
keywords: MASLD, advanced fibrosis, FIB-4, machine learning, neural network, non-invasive test, external validation

The pith

A compact neural network improves advanced fibrosis detection over FIB-4 in MASLD using only FIB-4's four routine inputs plus the FIB-4 score itself.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether machine learning can extract more diagnostic value from the inputs already used in the FIB-4 score to identify advanced fibrosis in metabolic dysfunction-associated steatotic liver disease. It trains a small shallow-deep neural network and compares it against the fixed FIB-4 formula, a tabular foundation model, and a fine-tuned large language model on biopsy-confirmed cases. Performance is measured on two held-out external cohorts from Malaysia and India after training on a Chinese cohort. If the reported gains hold, clinicians could obtain modestly higher accuracy in fibrosis staging without ordering additional tests or expanding the data collected in routine care.
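For orientation, the fixed formula that the paper benchmarks against is the standard FIB-4 index: age times AST, divided by platelet count times the square root of ALT. The sketch below computes it in Python. The function name, argument names, and the example patient values are illustrative assumptions, not taken from the paper; units are assumed to be years, U/L, and 10^9/L.

```python
import math


def fib4(age_years: float, ast_u_l: float, alt_u_l: float, platelets_1e9_l: float) -> float:
    """Standard FIB-4 index: (age x AST) / (platelets x sqrt(ALT)).

    Assumed units: age in years, AST and ALT in U/L, platelets in 10^9/L.
    """
    if alt_u_l <= 0 or platelets_1e9_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


# Made-up patient values, for illustration only.
example = fib4(age_years=50, ast_u_l=40, alt_u_l=40, platelets_1e9_l=200)
print(round(example, 2))  # prints 1.58
```

Because the formula has no free parameters, any gain from a learned model on the same inputs has to come from combining them differently, which is the question the paper asks.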

Core claim

A shallow-deep neural network with 354 trainable parameters that takes age, FIB-4, aspartate aminotransferase (AST), alanine aminotransferase (ALT), and platelet count as inputs achieves external ROC-AUCs of 0.77 in Malaysia and 0.67 in India, compared with FIB-4 ROC-AUCs of 0.75 and 0.60 on the same cohorts. Its external Brier scores are 0.18 and 0.22, and permutation importance identifies AST and FIB-4 as the dominant variables.
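The two headline metrics can be made concrete with a small sketch. ROC-AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score (ties count one half), and the Brier score as the mean squared difference between predicted probability and the 0/1 label. The labels and probabilities are invented toy values, not data from the paper, and the function names are illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared error between predicted probabilities and binary labels."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Toy data: 1 = advanced fibrosis, 0 = no advanced fibrosis (invented values).
toy_labels = [0, 0, 1, 1, 0, 1]
toy_probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

toy_auc = roc_auc(toy_labels, toy_probs)
toy_brier = brier_score(toy_labels, toy_probs)
print(round(toy_auc, 3), round(toy_brier, 3))
```

AUC depends only on how cases are ranked, while the Brier score also penalizes probabilities that are badly scaled, so the two numbers answer different questions about the same model.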

What carries the argument

The shallow-deep neural network (s-DNN), a compact non-linear model with a few hundred parameters that learns flexible combinations of the five FIB-4 variables to output advanced fibrosis probability.
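To give a sense of scale, the sketch below counts parameters and runs a forward pass for a small fully connected network on five inputs. The layer widths, ReLU hidden activations, sigmoid output, and random initialization are all hypothetical choices made for illustration; the abstract does not specify the s-DNN architecture, and this example yields 241 parameters rather than the paper's 354.

```python
import math
import random


def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given widths."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def init_network(layer_sizes, seed=0):
    """Random Gaussian weights and zero biases (illustrative, untrained)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, features):
    """ReLU on hidden layers, sigmoid on the single output unit."""
    activations = list(features)
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            activations = [max(0.0, z) for z in pre]
        else:
            activations = [1.0 / (1.0 + math.exp(-z)) for z in pre]
    return activations[0]


# Hypothetical widths: 5 inputs -> 16 -> 8 -> 1 output. Not the paper's architecture.
HYPOTHETICAL_LAYER_SIZES = [5, 16, 8, 1]
network = init_network(HYPOTHETICAL_LAYER_SIZES)
n_params = count_parameters(HYPOTHETICAL_LAYER_SIZES)  # 6*16 + 17*8 + 9*1 = 241

# Five standardized inputs (invented values) standing in for age, FIB-4, AST, ALT, platelets.
probability = forward(network, [0.2, -0.1, 0.5, 0.0, 1.3])
print(n_params, round(probability, 3))
```

A network of this size stores a few hundred floating-point numbers, which is why it is straightforward to embed in a clinical calculator.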

If this is right

  • Routine blood-test panels already contain enough information for modestly better fibrosis staging if combined non-linearly.
  • Very small models can match or exceed larger foundation models for this narrow clinical task while remaining easy to deploy.
  • AST and the FIB-4 score itself carry most of the predictive signal, so data collection can stay focused on existing labs.
  • External validation on two separate cohorts provides evidence that the gain is not limited to the training population.
  • Similar compact ML replacements could be tested for other fixed non-invasive scores in liver disease.
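The permutation-importance idea behind the AST and FIB-4 finding can be sketched as follows: shuffle one input column at a time, re-score the model, and record how much the ROC-AUC drops. The toy model, the synthetic data, and all names below are assumptions for illustration; the paper's exact procedure (number of repeats, metric, data split) is not given in the abstract.

```python
import random


def roc_auc(labels, scores):
    """Pairwise-ranking AUC; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in AUC when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = roc_auc(labels, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(baseline - roc_auc(labels, [predict(r) for r in permuted]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Synthetic data: column 0 carries signal, column 1 is pure noise.
data_rng = random.Random(1)
labels = [i % 2 for i in range(60)]
rows = [[y + data_rng.gauss(0.0, 0.5), data_rng.gauss(0.0, 1.0)] for y in labels]


def toy_predict(row):
    """Hypothetical model that uses only the first feature."""
    return row[0]


baseline_auc, importances = permutation_importance(toy_predict, rows, labels)
print(round(baseline_auc, 3), [round(v, 3) for v in importances])
```

A feature the model ignores scores exactly zero here, while shuffling the informative feature pushes the AUC toward 0.5. Note that FIB-4 is itself a function of the other four inputs, so importances for correlated features should be read with care.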

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Integration of the s-DNN into electronic health record systems could enable automatic, real-time fibrosis risk alerts during standard visits.
  • Further testing on cohorts that include more Western patients or varied comorbidities would clarify whether the performance edge persists across populations.
  • Pairing the model with transient elastography or other imaging NITs might produce combined scores that further reduce the need for biopsy.
  • The finding that a tiny network outperforms much larger models suggests that task-specific simplicity can be preferable to general-purpose foundation models in narrow medical applications.

Load-bearing premise

The Malaysian and Indian external cohorts are representative of the broader MASLD population and free of unmeasured selection or label biases that would change the observed performance differences.

What would settle it

A prospective study on an independent biopsy-confirmed MASLD cohort from a different geographic or demographic setting that finds the s-DNN ROC-AUC no higher than FIB-4 would falsify the generalization of the reported improvement.

Figures

Figures reproduced from arXiv: 2605.20523 by Athanasios Angelakis, Eleni-Myrto Trifylli, Filomena Ferrucci, Gabriele De Vito.

Figure 1: System prompt used in the zero-shot setting.
Figure 2: User prompt template instantiated per patient. Placeholders in braces are filled with …
Figure 3: External calibration and explainability analysis of the s-DNN MLE-NIT. Panels A–B …
Figure 4: Exploratory decision-curve analysis on the Malaysian and Indian external validation …

Original abstract

Advanced fibrosis is a major determinant of liver-related morbidity in metabolic dysfunction-associated steatotic liver disease (MASLD). FIB-4 is widely used as a first-line non-invasive test, but its fixed formula may underuse diagnostic information contained in age, aspartate aminotransferase, alanine aminotransferase, and platelet count. We evaluated whether machine-learning-enhanced non-invasive testing (MLE-NIT) can improve advanced fibrosis detection while preserving this FIB-4 variable space. We used three biopsy-confirmed MASLD cohorts from China, Malaysia, and India (n=784). The Chinese cohort was split into 486 training and 54 internal validation/tuning patients; final performance was reported only on the Malaysian and Indian external cohorts. Models used five variables: age, FIB-4, aspartate aminotransferase, platelet count, and alanine aminotransferase. We compared FIB-4 with a shallow-deep neural network (s-DNN), TabPFN, and gpt-4o-2024-08-06. FIB-4 achieved external ROC-AUCs of 0.75 and 0.60 in Malaysia and India, respectively. TabPFN achieved 0.69 and 0.66, fine-tuned GPT-4o achieved 0.75 and 0.63, and the s-DNN achieved 0.77 and 0.67, respectively. The s-DNN contained only 354 trainable parameters, compared with 7,244,554 for TabPFN, yet provided a more balanced external operating profile. Calibration showed s-DNN Brier scores of 0.18 and 0.22, and permutation importance identified AST and FIB-4 as dominant variables. Compact non-linear MLE-NITs may enhance FIB-4-based fibrosis assessment without increasing clinical data requirements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript evaluates whether a compact shallow-deep neural network (s-DNN) can improve detection of advanced fibrosis in MASLD over the standard FIB-4 index by using the same clinical variables (age, AST, ALT, platelets) plus FIB-4 itself. Training occurs on a Chinese biopsy-confirmed cohort (n=486), with performance reported on two external cohorts from Malaysia and India. The s-DNN (354 parameters) achieves external ROC-AUCs of 0.77 and 0.67 versus FIB-4's 0.75 and 0.60; comparisons are also made to TabPFN and fine-tuned GPT-4o.

Significance. If the modest AUC gains prove robust, the work could support a low-complexity, non-linear enhancement to FIB-4 that requires no additional clinical data collection. External validation on two independent cohorts is a methodological strength, and the emphasis on model compactness (354 trainable parameters) addresses practical deployment constraints in clinical settings.

major comments (3)
  1. [Abstract/Results] Abstract and Results: The reported AUC improvements are small (Δ=0.02 in Malaysia, Δ=0.07 in India) and the manuscript provides no confidence intervals, DeLong tests, or other statistical comparisons to establish whether these differences exceed sampling variability.
  2. [Methods/Results] Methods and Results: No cohort-matching statistics, demographic tables, or harmonization details (e.g., NASH CRN vs. other staging systems) are supplied for the Malaysian and Indian external cohorts relative to the Chinese training set. This information is load-bearing for the generalizability claim that the s-DNN's performance reflects architecture rather than unmeasured site or selection effects.
  3. [Results] Results: While Brier scores (0.18/0.22) and permutation importance (AST and FIB-4 dominant) are reported, the manuscript does not include calibration plots, decision-curve analysis, or sensitivity checks for label noise in the biopsy ground truth across sites.
minor comments (2)
  1. [Abstract] Abstract: Explicitly state the number of patients in each external cohort (currently only total n=784 is given).
  2. [Methods] Notation: Clarify whether the s-DNN input includes the pre-computed FIB-4 value as a single feature or its four constituent variables separately.
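The uncertainty analysis requested in major comment 1 could be approached with a paired percentile bootstrap: resample patients with replacement, compute both models' AUCs on the same resample, and take percentiles of the difference. The sketch below does this on synthetic scores. It is not the DeLong test, and the data, function names, and defaults (500 resamples, 95% interval) are illustrative assumptions rather than the authors' procedure.

```python
import random


def roc_auc(labels, scores):
    """Pairwise-ranking AUC; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_difference(labels, scores_a, scores_b,
                                    n_boot=500, alpha=0.05, seed=0):
    """Percentile interval for AUC(a) - AUC(b), resampling patients jointly."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined without both classes; draw again
        auc_a = roc_auc(ys, [scores_a[i] for i in idx])
        auc_b = roc_auc(ys, [scores_b[i] for i in idx])
        diffs.append(auc_a - auc_b)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic cohort: model A is less noisy than model B (invented data).
data_rng = random.Random(42)
labels = [i % 2 for i in range(60)]
scores_a = [y + data_rng.gauss(0.0, 0.8) for y in labels]
scores_b = [y + data_rng.gauss(0.0, 1.2) for y in labels]

ci_low, ci_high = paired_bootstrap_auc_difference(labels, scores_a, scores_b)
print(round(ci_low, 3), round(ci_high, 3))
```

An interval that includes zero would indicate that the observed gap is compatible with sampling variability. Resampling both models on the same patients keeps the comparison paired, which matters when the two scores are highly correlated.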

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We value the detailed feedback provided, which has helped us identify areas to improve the clarity and rigor of our work. Below, we address each major comment in turn, indicating the revisions we plan to make.

  1. Referee: [Abstract/Results] Abstract and Results: The reported AUC improvements are small (Δ=0.02 in Malaysia, Δ=0.07 in India) and the manuscript provides no confidence intervals, DeLong tests, or other statistical comparisons to establish whether these differences exceed sampling variability.

    Authors: We agree that providing measures of uncertainty and formal statistical comparisons is essential to interpret the modest AUC gains. In the revised manuscript, we will compute and report 95% bootstrap confidence intervals for all AUC values. Additionally, we will perform DeLong tests to assess whether the differences between the s-DNN and FIB-4 (as well as other models) are statistically significant. These results will be added to the Results section and summarized in the Abstract. We note that even small improvements in this clinical context can be meaningful given the low complexity of the model, but we will let the statistical tests speak to their robustness. revision: yes

  2. Referee: [Methods/Results] Methods and Results: No cohort-matching statistics, demographic tables, or harmonization details (e.g., NASH CRN vs. other staging systems) are supplied for the Malaysian and Indian external cohorts relative to the Chinese training set. This information is load-bearing for the generalizability claim that the s-DNN's performance reflects architecture rather than unmeasured site or selection effects.

    Authors: We acknowledge that detailed cohort comparison is important for assessing generalizability. We will add a new table presenting demographic and clinical characteristics (age, sex, BMI, AST, ALT, platelets, FIB-4, fibrosis stage distribution) for all three cohorts. Regarding staging harmonization, all cohorts were biopsy-confirmed MASLD with fibrosis staged using the NASH CRN system or equivalent histological criteria by expert pathologists; we will explicitly state this and any minor differences in the Methods section. This will help clarify that performance differences are more likely attributable to model architecture than site-specific effects. revision: yes

  3. Referee: [Results] Results: While Brier scores (0.18/0.22) and permutation importance (AST and FIB-4 dominant) are reported, the manuscript does not include calibration plots, decision-curve analysis, or sensitivity checks for label noise in the biopsy ground truth across sites.

    Authors: We will enhance the Results by including calibration plots (reliability curves) for the s-DNN and FIB-4 in the supplementary materials to visually assess calibration beyond Brier scores. We will also add decision curve analysis to evaluate clinical utility across different threshold probabilities. For label noise in biopsy ground truth, we will add a discussion noting that while all biopsies were reviewed by experienced hepatopathologists, inter-observer variability is a known limitation in fibrosis staging; however, performing a formal sensitivity analysis would require re-reading of slides or additional annotations not available in the current datasets. We will include this as a limitation. revision: partial
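The decision-curve analysis mentioned in the third response rests on net benefit, commonly defined at a threshold probability pt as TP/n minus FP/n weighted by pt/(1 - pt). The sketch below computes it for a model and for the "treat everyone" reference strategy. The toy labels, probabilities, and function names are assumptions for illustration and do not come from the paper.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on every case with predicted probability >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data: 1 = advanced fibrosis (invented values).
toy_labels = [0, 0, 1, 1, 0, 1]
toy_probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

nb_model = net_benefit(toy_labels, toy_probs, threshold=0.3)
nb_all = net_benefit_treat_all(toy_labels, threshold=0.3)
print(round(nb_model, 4), round(nb_all, 4))  # prints 0.4286 0.2857
```

A decision curve repeats this calculation across a range of thresholds; a model adds clinical value where its curve lies above both the treat-all line and the treat-none line, which is zero.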

Circularity Check

0 steps flagged

No significant circularity; standard external validation on held-out cohorts.


The paper trains shallow-deep NN, TabPFN, and GPT-4o variants on the Chinese cohort (486 train + 54 internal val) and reports performance exclusively on the independent Malaysian and Indian external cohorts. No equations, fitted parameters, or self-citations reduce the reported AUCs, Brier scores, or permutation importances to quantities computed on the test data itself. The derivation chain consists of ordinary supervised learning followed by external evaluation; the central claim that the 354-parameter s-DNN modestly improves on FIB-4 therefore rests on empirical generalization rather than definitional or self-referential reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard supervised-learning assumptions plus the representativeness of the external validation cohorts; no new entities are postulated.

free parameters (1)
  • s-DNN trainable parameters
    The architecture is constrained to 354 parameters; exact layer widths and activation choices are not detailed in the abstract.
axioms (1)
  • domain assumption: External cohorts are representative and biopsy labels are reliable
    Invoked when claiming generalization from Chinese training data to Malaysian and Indian test sets.

