Conformal Risk Prediction for Non-Alcoholic Fatty Liver Disease Using Gradient Boosting with Distribution-Free Coverages
Pith reviewed 2026-06-28 17:34 UTC · model grok-4.3
The pith
Gradient boosting with conformal prediction yields NAFLD risk sets that meet distribution-free coverage guarantees.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Coupling gradient-boosted trees with mutual-information stability selection and conformal prediction produces prediction sets for NAFLD risk whose marginal coverage is at least a user-specified level, provided the data are exchangeable. The underlying scores achieve AUROC 0.912 internally and 0.891 externally and support a risk stratification in which the high-risk tier shows 4.7 times the 12-month progression rate of the low-risk tier.
What carries the argument
Conformal prediction sets constructed around gradient-boosting outputs, which enforce distribution-free marginal coverage on the individual risk estimates.
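A minimal sketch of how such sets can be built, assuming the split (inductive) variant of conformal prediction and a binary outcome. The summary does not say which variant or nonconformity score the paper uses, so the score (one minus the probability assigned to the true label), the function names, and the synthetic calibration data below are all illustrative assumptions rather than the authors' implementation.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # Rank ceil((n + 1) * (1 - alpha)); if it exceeds n, the set must be trivial.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(scores)[k - 1]

def prediction_set(p_pos, qhat):
    """Labels whose nonconformity score 1 - p(label) is at most qhat."""
    probs = {0: 1.0 - p_pos, 1: p_pos}
    return {label for label, p in probs.items() if 1.0 - p <= qhat}

# Hypothetical calibration data: (predicted P(NAFLD), true label).
# In practice p would come from the fitted gradient-boosting model.
rng = random.Random(0)
calib = []
for _ in range(500):
    p = rng.random()
    y = 1 if rng.random() < p else 0  # labels drawn consistently with p
    calib.append((p, y))

# Nonconformity score: one minus the probability assigned to the true label.
scores = [1.0 - (p if y == 1 else 1.0 - p) for p, y in calib]
qhat = conformal_quantile(scores, alpha=0.10)

print(prediction_set(0.95, qhat))  # confident prediction
print(prediction_set(0.50, qhat))  # ambiguous prediction
```

Ambiguous patients tend to receive the two-label set, which is how the procedure trades set size for coverage.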
If this is right
- The conformal sets achieve 91.3% empirical coverage at the 90% nominal level.
- The high-risk subgroup shows a 12-month progression rate 4.7 times that of the low-risk tier.
- The retained features (waist circumference, ALT, GGT, triglycerides, fasting glucose, BMI) match established metabolic risk factors.
- The framework outperforms deep neural networks, TabNet, support vector machines, and logistic regression on both internal and external validation.
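The tier comparison behind the 4.7-fold figure can be sketched as follows. The cut-offs (0.3 and 0.7), the function names, and the toy data are hypothetical; the summary does not report how the paper defines its three tiers.

```python
def assign_tier(score, low_cut=0.3, high_cut=0.7):
    """Map a risk score in [0, 1] to a tier; the cut-offs are hypothetical."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "intermediate"
    return "high"

def progression_rates(scores, progressed):
    """Per-tier fraction of patients who progressed within follow-up."""
    totals, events = {}, {}
    for s, e in zip(scores, progressed):
        tier = assign_tier(s)
        totals[tier] = totals.get(tier, 0) + 1
        events[tier] = events.get(tier, 0) + e
    return {t: events[t] / totals[t] for t in totals}

# Toy data, not from the paper: risk scores and 12-month progression flags.
scores = [0.1, 0.2, 0.15, 0.25, 0.5, 0.6, 0.8, 0.9, 0.85, 0.95]
progressed = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1]

rates = progression_rates(scores, progressed)
ratio = rates["high"] / rates["low"]  # analogue of the reported 4.7x figure
print(rates, ratio)
```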
Where Pith is reading between the lines
- The same pipeline could be applied to other metabolic or hepatic conditions without altering the coverage guarantees, provided exchangeability holds.
- The compact feature subset may enable low-cost screening programs that avoid full laboratory panels.
- External validation on populations with different demographics would test whether the coverage property transfers beyond the Chinese cohorts studied.
Load-bearing premise
The observations satisfy the exchangeability condition required for the conformal procedure to deliver valid marginal coverage guarantees.
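For reference, the standard split-conformal guarantee that this premise supports can be written as below. The notation is an assumption of this note, not taken from the paper: s_1, ..., s_n are the calibration nonconformity scores, s_(k) is the k-th smallest of them, and alpha is the target miscoverage (0.10 for the 90% level). The upper bound additionally requires the scores to be almost surely distinct, and the quantile is taken as infinite when the rank exceeds n.

```latex
\[
\hat{q} = s_{(\lceil (n+1)(1-\alpha) \rceil)}, \qquad
C(x) = \{\, y : s(x, y) \le \hat{q} \,\}
\]
\[
1 - \alpha \;\le\; \mathbb{P}\bigl( Y_{n+1} \in C(X_{n+1}) \bigr) \;\le\; 1 - \alpha + \frac{1}{n+1}
\]
```

The probability is taken jointly over the calibration data and the test point, which is why the guarantee is marginal rather than conditional on a particular patient or site.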
What would settle it
An independent cohort in which the fraction of true outcomes contained inside the conformal prediction sets falls below the nominal coverage level.
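One rough way to run that check, sketched under stated assumptions: compute empirical coverage on the new cohort and compare the number of covered cases against a binomial distribution at the nominal level. The cohort size and hit count below are hypothetical, and the exact binomial tail is only a screening device, since coverage conditional on a fixed calibration set fluctuates around the nominal level.

```python
from math import comb

def empirical_coverage(sets, labels):
    """Fraction of true labels contained in their prediction sets."""
    hits = sum(1 for s, y in zip(sets, labels) if y in s)
    return hits / len(labels)

def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Toy prediction sets and outcomes, not from the paper.
sets = [{0}, {1}, {0, 1}, set()]
labels = [0, 0, 1, 1]
print(empirical_coverage(sets, labels))

# Hypothetical cohort: 359 of 412 true outcomes covered at the 90% level.
# A small tail probability would suggest coverage below nominal.
p_value = binom_lower_tail(359, 412, 0.90)
print(p_value)
```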
Original abstract
Non-alcoholic fatty liver disease (NAFLD) affects roughly 25% of global adults, posing substantial hepatic and cardiovascular risks. Yet, population-level screening tools remain inadequate. We present Method, a machine-learning framework for NAFLD risk prediction coupling gradient-boosted decision trees with conformal prediction to yield calibrated, distribution-free coverage guarantees on individual risk estimates. It integrates a mutual-information-based stability selection procedure to identify a compact, clinically interpretable feature subset via bootstrap resampling, constructing prediction sets whose marginal coverage provably exceeds a user-specified confidence level. We evaluated Method on a multicenter cohort from Guangzhou, China (primary n=2,187; external validation n=412) using 78 candidate features across demographics, metabolic biomarkers, and lifestyle factors. Method achieves an AUROC of 0.912 internally and 0.891 externally, outperforming deep neural networks, TabNet, support vector machines, and logistic regression. Conformal prediction sets achieve 91.3% empirical coverage at the 90% nominal level. A three-tier risk stratification derived from these scores separates the population into distinct groups, with the high-risk subgroup showing a 12-month progression rate 4.7 times that of the low-risk tier. The selected features -- notably waist circumference, ALT, GGT, triglycerides, fasting glucose, and BMI -- align with established metabolic risk factors, providing biological plausibility.
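The stability selection step described in the abstract can be illustrated with a small sketch. It assumes features have already been discretized, uses a plug-in mutual-information estimate, and keeps features that rank in the top k in a chosen fraction of bootstrap resamples. The estimator, `top_k`, `threshold`, and the synthetic data are assumptions made for illustration; the paper's actual estimator and thresholds are not given in this summary.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information (nats) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def stability_selection(X, y, top_k, n_boot=100, threshold=0.8, seed=0):
    """Return indices of features ranked in the MI top-k in at least
    `threshold` of bootstrap resamples, plus all selection frequencies."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    counts = [0] * d
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        yb = [y[i] for i in idx]
        mi = [mutual_information([X[i][j] for i in idx], yb) for j in range(d)]
        for j in sorted(range(d), key=lambda j: mi[j], reverse=True)[:top_k]:
            counts[j] += 1
    freqs = [c / n_boot for c in counts]
    return [j for j, f in enumerate(freqs) if f >= threshold], freqs

# Synthetic data: feature 0 tracks the label, features 1 and 2 are noise.
rng = random.Random(1)
y = [rng.randint(0, 1) for _ in range(300)]
X = [[yi if rng.random() < 0.9 else 1 - yi, rng.randrange(3), rng.randrange(3)]
     for yi in y]

selected, freqs = stability_selection(X, y, top_k=1, n_boot=50)
print(selected, freqs)
```

Selection frequency, rather than a single ranking, is what makes the retained subset less sensitive to any one resample.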
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a framework combining gradient-boosted decision trees with conformal prediction and mutual-information stability selection to produce NAFLD risk predictions with distribution-free marginal coverage guarantees. On a multicenter Guangzhou cohort (primary n=2,187; external n=412) using 78 candidate features, it reports AUROC 0.912 (internal) and 0.891 (external), outperforming DNNs, TabNet, SVMs, and logistic regression. It also reports 91.3% empirical coverage at the 90% nominal level and a three-tier stratification in which the high-risk group exhibits a 4.7-fold higher 12-month progression rate than the low-risk group.
Significance. If the coverage guarantees are valid, the work supplies a practically useful, interpretable risk-stratification tool for a high-prevalence condition, with external validation and explicit comparison to strong baselines. The conformal component supplies a form of calibrated uncertainty that is uncommon in clinical ML and could support more reliable screening decisions.
major comments (1)
- [Methods (conformal prediction procedure) and Results (coverage evaluation)] The central claim of distribution-free marginal coverage (provably exceeding the nominal level) rests on the exchangeability assumption required by conformal prediction. The multicenter design and 12-month progression labels introduce plausible site-specific and temporal dependence; no exchangeability diagnostics, randomization tests, or sensitivity analysis appear in the reported methods or results. This assumption is load-bearing for the guarantee, while the reported empirical coverage alone does not verify it.
minor comments (1)
- [Abstract] The abstract refers to the method as 'Method'; a concrete name would aid readability and citation.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights an important assumption underlying our conformal prediction approach. We address the major comment below and commit to revisions that strengthen the manuscript.
Referee: [Methods (conformal prediction procedure) and Results (coverage evaluation)] The central claim of distribution-free marginal coverage (provably exceeding the nominal level) rests on the exchangeability assumption required by conformal prediction. The multicenter design and 12-month progression labels introduce plausible site-specific and temporal dependence; no exchangeability diagnostics, randomization tests, or sensitivity analysis appear in the reported methods or results. This assumption is load-bearing for the guarantee, while the reported empirical coverage alone does not verify it.
Authors: We agree that the exchangeability assumption is fundamental to the distribution-free marginal coverage guarantee of conformal prediction, and that the multicenter structure together with the 12-month follow-up labels could introduce site-specific or temporal dependencies that merit explicit examination. The external validation cohort (n=412) drawn from a separate center already provides some empirical evidence of transportability, yet this does not substitute for direct diagnostics. In the revised manuscript we will add: (i) a methods subsection describing exchangeability checks (e.g., Kolmogorov-Smirnov tests on covariate distributions across the two centers and a permutation test that randomly reassigns site labels while recomputing coverage); (ii) a sensitivity analysis that reports coverage under leave-one-center-out and time-stratified splits; and (iii) explicit language in both the methods and discussion stating that the theoretical guarantee holds conditionally on exchangeability and discussing the practical implications of potential mild violations. These additions will be placed in the Methods and Results sections without altering the reported empirical coverage or performance metrics.
Revision: yes
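A simplified sketch of the site-label permutation test mentioned in item (i), assuming exactly two sites. It holds the prediction sets fixed and shuffles only the site labels attached to each covered/not-covered indicator, which is a lighter check than recalibrating under each permutation. All names and the toy data are hypothetical.

```python
import random

def coverage(hits):
    """Mean of 0/1 indicators marking whether each set covered the truth."""
    return sum(hits) / len(hits)

def site_permutation_test(hits, sites, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the coverage gap between two sites."""
    rng = random.Random(seed)
    labels = sorted(set(sites))
    assert len(labels) == 2, "sketch handles exactly two sites"

    def gap(site_labels):
        a = [h for h, s in zip(hits, site_labels) if s == labels[0]]
        b = [h for h, s in zip(hits, site_labels) if s == labels[1]]
        return abs(coverage(a) - coverage(b))

    observed = gap(sites)
    shuffled = list(sites)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign site labels at random
        if gap(shuffled) >= observed:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)

# Toy data, not from the paper: site A fully covered, site B never covered.
hits = [1] * 10 + [0] * 10
sites = ["A"] * 10 + ["B"] * 10
observed, p_value = site_permutation_test(hits, sites)
print(observed, p_value)
```

A small p-value would indicate that coverage differs between centers by more than random label assignment explains, which is evidence against exchangeability across sites.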
Circularity Check
No significant circularity detected
The paper applies off-the-shelf conformal prediction to gradient-boosted outputs and reports empirical coverage (91.3% at the 90% nominal level) as a post-hoc check rather than a derived quantity. The marginal-coverage guarantee is imported from standard conformal theory; the paper does not re-derive or redefine it via its own equations. Stability selection, feature ranking, and three-tier stratification are presented as downstream applications, none of which reduce by construction to fitted parameters or self-citations. No load-bearing self-citation chain or ansatz smuggling appears in the provided text. The derivation chain is therefore self-contained against external conformal results.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: data points are exchangeable, so conformal prediction supplies valid marginal coverage.