
arxiv: 2606.09860 · v1 · pith:JLPRTFUMnew · submitted 2026-05-31 · 💻 cs.LG · cs.AI · stat.AP · stat.ML

Conformal Risk Prediction for Non-Alcoholic Fatty Liver Disease Using Gradient Boosting with Distribution-Free Coverages

Pith reviewed 2026-06-28 17:34 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.AP · stat.ML
keywords NAFLD · conformal prediction · gradient boosting · risk prediction · distribution-free coverage · risk stratification · metabolic biomarkers

The pith

Gradient boosting with conformal prediction yields NAFLD risk sets that meet distribution-free coverage guarantees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that couples gradient-boosted decision trees with conformal prediction to generate individual risk estimates for non-alcoholic fatty liver disease accompanied by calibrated coverage bounds that hold without distributional assumptions. On a primary multicenter cohort of 2187 patients and an external set of 412, the model attains AUROCs of 0.912 and 0.891 respectively while outperforming deep neural networks, TabNet, SVMs, and logistic regression. A three-tier stratification derived from the conformal scores separates patients such that the high-risk group exhibits a 4.7-fold higher 12-month progression rate than the low-risk group, and the selected metabolic features carry biological plausibility.

Core claim

Coupling gradient-boosted trees with mutual-information stability selection and conformal prediction produces prediction sets for NAFLD risk whose marginal coverage is at least the user-specified level. The underlying scores achieve AUROC 0.912 internally and 0.891 externally and support a risk stratification in which the high-risk tier shows 4.7 times the 12-month progression rate of the low-risk tier.

What carries the argument

Conformal prediction sets constructed around gradient-boosting outputs, which enforce distribution-free marginal coverage on the individual risk estimates.
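
A minimal sketch of how such sets can be built with split conformal prediction, assuming a held-out calibration set and class probabilities from any fitted model (a gradient-boosted classifier, for instance). The nonconformity score used here, one minus the probability of the true class, is a common textbook choice and an assumption; the summary does not state which score the paper uses. The function names are illustrative.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold from a held-out calibration set.

    cal_probs:  (n, K) predicted class probabilities from any fitted model.
    cal_labels: (n,) integer labels in [0, K).
    """
    n = len(cal_labels)
    # Assumed nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: every label is included
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, qhat):
    """Boolean (m, K) mask: label k is in the set when 1 - p_k <= qhat."""
    return (1.0 - test_probs) <= qhat
```

With `alpha=0.1` the sets target 90% marginal coverage. The guarantee comes from the rank correction and exchangeability of calibration and test points, not from the quality of the underlying model, which only affects how small the sets are.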

If this is right

  • The conformal sets achieve 91.3% empirical coverage at the 90% nominal level.
  • The high-risk subgroup shows a 12-month progression rate 4.7 times that of the low-risk tier.
  • The retained features (waist circumference, ALT, GGT, triglycerides, fasting glucose, BMI) match established metabolic risk factors.
  • The framework outperforms deep neural networks, TabNet, support vector machines, and logistic regression on both internal and external validation.
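
One way a three-tier stratification could be read off binary conformal sets is sketched below. The mapping is hypothetical: the summary says only that the tiers are derived from the conformal scores, not how. Here a set containing only the negative class maps to low risk, only the positive class to high risk, and anything else (both labels or neither) to an intermediate tier. The fold-difference in progression is then a ratio of tier-wise event rates.

```python
import numpy as np

LOW, MID, HIGH = 0, 1, 2

def tiers_from_sets(set_mask):
    """Map binary prediction sets to tiers: {0} -> LOW, {1} -> HIGH, otherwise MID.

    set_mask: (m, 2) boolean array; column k is True when label k is in the set.
    This mapping is an illustrative assumption, not the paper's stated rule.
    """
    set_mask = np.asarray(set_mask, dtype=bool)
    tiers = np.full(len(set_mask), MID)
    tiers[set_mask[:, 0] & ~set_mask[:, 1]] = LOW
    tiers[set_mask[:, 1] & ~set_mask[:, 0]] = HIGH
    return tiers

def progression_ratio(progressed, tiers):
    """Event rate in the HIGH tier divided by the event rate in the LOW tier."""
    progressed = np.asarray(progressed, dtype=float)
    return progressed[tiers == HIGH].mean() / progressed[tiers == LOW].mean()
```

A ratio of 4.7 from `progression_ratio` would correspond to the fold-difference reported above; the ratio is undefined when the low-risk tier has no events.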

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pipeline could be applied to other metabolic or hepatic conditions without altering the coverage guarantees, provided exchangeability holds.
  • The compact feature subset may enable low-cost screening programs that avoid full laboratory panels.
  • External validation on populations with different demographics would test whether the coverage property transfers beyond the Chinese cohorts studied.

Load-bearing premise

The observations satisfy the exchangeability condition required for the conformal procedure to deliver valid marginal coverage guarantees.
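
For reference, the standard split-conformal guarantee that this premise supports can be written as follows. The notation (calibration pairs, score function $s$, miscoverage level $\alpha$) is generic and assumed here, not taken from the paper.

```latex
% Calibration pairs (X_i, Y_i), i = 1, ..., n; test pair (X_{n+1}, Y_{n+1}).
% Nonconformity scores s_i = s(X_i, Y_i); miscoverage level \alpha \in (0, 1).
\[
  \hat{q} = \text{the } \lceil (n+1)(1-\alpha) \rceil\text{-th smallest of } s_1, \dots, s_n,
  \qquad
  C_\alpha(x) = \{\, y : s(x, y) \le \hat{q} \,\}.
\]
% If (X_1, Y_1), ..., (X_{n+1}, Y_{n+1}) are exchangeable, then
\[
  \mathbb{P}\bigl( Y_{n+1} \in C_\alpha(X_{n+1}) \bigr) \;\ge\; 1 - \alpha,
\]
% and if the scores are almost surely distinct, also
\[
  \mathbb{P}\bigl( Y_{n+1} \in C_\alpha(X_{n+1}) \bigr) \;\le\; 1 - \alpha + \frac{1}{n+1}.
\]
```

The probability is taken jointly over the calibration and test points, so the statement is marginal: it does not promise the same coverage within each site, subgroup, or risk tier.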

What would settle it

An independent cohort in which the fraction of true outcomes contained inside the conformal prediction sets falls below the nominal coverage level.
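
A rough sketch of how that check might be run on a new cohort: compute the fraction of true labels that fall inside their prediction sets, then ask how surprising the shortfall is if the true coverage were at the nominal level. The binomial tail treats per-patient coverage events as independent with a fixed success probability, which is a simplifying assumption (coverage conditional on a given calibration set fluctuates around the nominal level). Function names are illustrative.

```python
import math
import numpy as np

def empirical_coverage(set_mask, labels):
    """Fraction of rows whose true label is inside the prediction set.

    set_mask: (m, K) boolean array; labels: (m,) integer labels in [0, K).
    """
    labels = np.asarray(labels)
    return float(set_mask[np.arange(len(labels)), labels].mean())

def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)
```

With `k` covered patients out of `n` and nominal level `p = 0.90`, a small value of `binom_lower_tail(k, n, 0.90)` would indicate coverage below nominal beyond what sampling noise explains.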

Figures

Figures reproduced from arXiv: 2606.09860 by Xinze Zhang.

Figure 1: Schematic illustration of the clinical gap in NAFLD screening. Existing composite scores and single biomarkers …
Figure 2: Overview of the LIVERRISK pipeline. Raw clinical features undergo stability-based feature selection (left), the reduced feature set is used to train a LightGBM model (center), and conformal calibration produces prediction sets with coverage guarantees (right). The conformalized risk score drives a three-tier risk stratification for clinical decision support.
Figure 3: AUROC on the (a) internal test set and (b) external …
Figure 4: Feature group ablation: AUROC decrease when …
Figure 6: Internal-test AUROC for different combinations …
Original abstract

Non-alcoholic fatty liver disease (NAFLD) affects roughly 25% of global adults, posing substantial hepatic and cardiovascular risks. Yet, population-level screening tools remain inadequate. We present Method, a machine-learning framework for NAFLD risk prediction coupling gradient-boosted decision trees with conformal prediction to yield calibrated, distribution-free coverage guarantees on individual risk estimates. It integrates a mutual-information-based stability selection procedure to identify a compact, clinically interpretable feature subset via bootstrap resampling, constructing prediction sets whose marginal coverage provably exceeds a user-specified confidence level. We evaluated Method on a multicenter cohort from Guangzhou, China (primary n=2,187; external validation n=412) using 78 candidate features across demographics, metabolic biomarkers, and lifestyle factors. Method achieves an AUROC of 0.912 internally and 0.891 externally, outperforming deep neural networks, TabNet, support vector machines, and logistic regression. Conformal prediction sets achieve 91.3% empirical coverage at the 90% nominal level. A three-tier risk stratification derived from these scores separates the population into distinct groups, with the high-risk subgroup showing a 12-month progression rate 4.7 times that of the low-risk tier. The selected features -- notably waist circumference, ALT, GGT, triglycerides, fasting glucose, and BMI -- align with established metabolic risk factors, providing biological plausibility.
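
The stability selection step described in the abstract can be sketched roughly as follows: resample the cohort with replacement, rank features by mutual information with the label on each resample, and keep the features that land in the top ranks often enough. Everything specific here is an assumption made for illustration: the quantile-binned plug-in estimator of mutual information, the `top_k`, `n_boot`, and `threshold` values, and the function names. The abstract does not give these details.

```python
import numpy as np

def binned_mi(x, y, bins=8):
    """Plug-in mutual information (nats) between a continuous feature and a binary label."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])  # interior quantile edges
    xb = np.digitize(x, edges)                                  # bin index in [0, bins)
    joint = np.zeros((bins, 2))
    np.add.at(joint, (xb, y), 1.0)                              # contingency counts
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def stability_selection(X, y, top_k=6, n_boot=50, threshold=0.6, seed=0):
    """Return indices of features ranked in the top_k on at least `threshold` of resamples."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # bootstrap resample with replacement
        mi = np.array([binned_mi(X[idx, j], y[idx]) for j in range(d)])
        counts[np.argsort(mi)[-top_k:]] += 1   # features in the top_k this round
    freq = counts / n_boot
    return np.flatnonzero(freq >= threshold), freq
```

Here `y` is expected to be an integer array of 0/1 labels. Selecting on frequency across resamples, rather than on a single ranking, is what makes the retained subset less sensitive to sampling noise.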

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces a framework combining gradient-boosted decision trees with conformal prediction and mutual-information stability selection to produce NAFLD risk predictions with distribution-free marginal coverage guarantees. On a multicenter Guangzhou cohort (n=2187 primary, n=412 external) using 78 features, it reports AUROC 0.912 (internal) and 0.891 (external), 91.3% empirical coverage at the 90% nominal level, and a three-tier stratification in which the high-risk group exhibits 4.7-fold higher 12-month progression than the low-risk group, outperforming DNNs, TabNet, SVMs, and logistic regression.

Significance. If the coverage guarantees are valid, the work supplies a practically useful, interpretable risk-stratification tool for a high-prevalence condition, with external validation and explicit comparison to strong baselines. The conformal component supplies a form of calibrated uncertainty that is uncommon in clinical ML and could support more reliable screening decisions.

major comments (1)
  1. [Methods (conformal prediction procedure) and Results (coverage evaluation)] The central claim of distribution-free marginal coverage (provably exceeding the nominal level) rests on the exchangeability assumption required by conformal prediction. The multicenter design and 12-month progression labels introduce plausible site-specific and temporal dependence; no exchangeability diagnostics, randomization tests, or sensitivity analysis appear in the reported methods or results. This assumption is load-bearing for the guarantee, while the reported empirical coverage alone does not verify it.
minor comments (1)
  1. [Abstract] The abstract refers to the method as 'Method'; a concrete name would aid readability and citation.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback, which highlights an important assumption underlying our conformal prediction approach. We address the major comment below and commit to revisions that strengthen the manuscript.

  1. Referee: [Methods (conformal prediction procedure) and Results (coverage evaluation)] The central claim of distribution-free marginal coverage (provably exceeding the nominal level) rests on the exchangeability assumption required by conformal prediction. The multicenter design and 12-month progression labels introduce plausible site-specific and temporal dependence; no exchangeability diagnostics, randomization tests, or sensitivity analysis appear in the reported methods or results. This assumption is load-bearing for the guarantee, while the reported empirical coverage alone does not verify it.

    Authors: We agree that the exchangeability assumption is fundamental to the distribution-free marginal coverage guarantee of conformal prediction, and that the multicenter structure together with the 12-month follow-up labels could introduce site-specific or temporal dependencies that merit explicit examination. The external validation cohort (n=412) drawn from a separate center already provides some empirical evidence of transportability, yet this does not substitute for direct diagnostics. In the revised manuscript we will add: (i) a methods subsection describing exchangeability checks (e.g., Kolmogorov-Smirnov tests on covariate distributions across the two centers and a permutation test that randomly reassigns site labels while recomputing coverage); (ii) a sensitivity analysis that reports coverage under leave-one-center-out and time-stratified splits; and (iii) explicit language in both the methods and discussion stating that the theoretical guarantee holds conditionally on exchangeability and discussing the practical implications of potential mild violations. These additions will be placed in the Methods and Results sections without altering the reported empirical coverage or performance metrics.

    revision: yes
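
The site-label permutation check proposed in the rebuttal might look roughly like this. It is a sketch under assumptions: the test statistic (the largest gap in empirical coverage between any two sites) and the function names are illustrative choices, not something the rebuttal specifies. The input `covered` marks, for each patient, whether the true outcome fell inside the conformal set.

```python
import numpy as np

def coverage_gap(covered, site):
    """Largest difference in empirical coverage between any two sites."""
    rates = [covered[site == s].mean() for s in np.unique(site)]
    return max(rates) - min(rates)

def site_permutation_pvalue(covered, site, n_perm=2000, seed=0):
    """Permutation p-value for the hypothesis that coverage does not depend on site."""
    rng = np.random.default_rng(seed)
    covered = np.asarray(covered, dtype=float)
    site = np.asarray(site)
    observed = coverage_gap(covered, site)
    hits = 0
    for _ in range(n_perm):
        # Reassign site labels at random and recompute the coverage gap.
        if coverage_gap(covered, rng.permutation(site)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps the p-value above zero
```

A small p-value suggests coverage differs across centers more than chance would allow, which would be evidence against treating the pooled data as exchangeable. A large p-value does not prove exchangeability; it only fails to detect this particular violation.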

Circularity Check

0 steps flagged

No significant circularity detected


The paper applies off-the-shelf conformal prediction to gradient-boosted outputs and reports empirical coverage (91.3% at 90% nominal) as a post-hoc check rather than a derived quantity. The marginal-coverage guarantee is imported from standard conformal theory; the paper does not re-derive or redefine it via its own equations. Stability selection, feature ranking, and three-tier stratification are presented as downstream applications, none of which reduce by construction to fitted parameters or self-citations. No load-bearing self-citation chain or ansatz smuggling appears in the provided text. The derivation chain is therefore self-contained against external conformal results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard exchangeability assumption of conformal prediction and on the representativeness of the reported Chinese cohort; no free parameters or invented entities are introduced beyond the user-specified coverage level.

axioms (1)
  • Domain assumption: data points are exchangeable, so that conformal prediction supplies valid marginal coverage.
    This is the load-bearing premise that converts the conformal procedure into distribution-free guarantees.

pith-pipeline@v0.9.1-grok · 5784 in / 1284 out tokens · 31725 ms · 2026-06-28T17:34:48.406598+00:00 · methodology

