GRASP: group-Shapley feature selection for patients
Pith reviewed 2026-05-16 02:29 UTC · model grok-4.3
The pith
GRASP couples Shapley attributions from tree models with group L21 regularization to pick compact, stable feature sets for medical predictions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRASP couples Shapley value driven attribution with group L21 regularization to extract compact and non-redundant feature sets. It distills group level importance scores from a pretrained tree model via SHAP, then enforces structured sparsity through group L21 regularized logistic regression, yielding stable and interpretable selections that match or exceed the predictive accuracy of LASSO, SHAP, and deep learning baselines while using fewer, less redundant features.
What carries the argument
GRASP framework that extracts group-level Shapley attributions from a pretrained tree model and feeds them into group L21 regularized logistic regression to enforce structured sparsity.
If this is right
- GRASP produces feature selections with comparable or superior predictive accuracy to LASSO and deep learning methods.
- The selected features are fewer in number and exhibit lower redundancy.
- Feature stability improves across repeated runs or data perturbations.
- The resulting models gain interpretability because selected groups align with SHAP-derived importance.
Where Pith is reading between the lines
- Fewer features could lower the cost of collecting patient data for repeated clinical predictions.
- The group structure might transfer to other grouped data domains such as genomic or sensor readings.
- Greater stability could support reliable use in longitudinal monitoring of individual patients.
Load-bearing premise
That SHAP attributions from a pretrained tree model supply reliable group-level importance scores which, when paired with group L21 regularization, remove redundancy without discarding useful predictive signal in medical datasets.
What would settle it
Apply GRASP to a new medical dataset, measure the correlation among its selected features, and check whether predictive accuracy drops below that of a model using all features or a LASSO baseline on the same data.
read the original abstract
Feature selection remains a major challenge in medical prediction, where existing approaches such as LASSO often lack robustness and interpretability. We introduce GRASP, a novel framework that couples Shapley value driven attribution with group $L_{21}$ regularization to extract compact and non-redundant feature sets. GRASP first distills group level importance scores from a pretrained tree model via SHAP, then enforces structured sparsity through group $L_{21}$ regularized logistic regression, yielding stable and interpretable selections. Extensive comparisons with LASSO, SHAP, and deep learning based methods show that GRASP consistently delivers comparable or superior predictive accuracy, while identifying fewer, less redundant, and more stable features.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces GRASP, a feature selection framework for medical prediction tasks. It first extracts group-level importance scores via SHAP from a pretrained tree model and then applies group L_{2,1} regularization within logistic regression to produce compact, non-redundant feature sets. The central claim is that GRASP achieves comparable or superior predictive accuracy to LASSO, standard SHAP, and deep learning baselines while yielding fewer, less redundant, and more stable features.
Significance. If the performance and stability claims hold under rigorous evaluation, the hybrid use of SHAP attributions to inform group-structured sparsity could offer a practical advance for interpretable modeling in clinical data, where redundancy among correlated variables (labs, vitals) is common. The approach builds on established tools without introducing new free parameters, which is a modest strength.
major comments (3)
- [Abstract] Abstract: the claims of 'extensive comparisons' and 'consistently delivers comparable or superior predictive accuracy' are unsupported by any reported metrics, datasets, statistical tests, or error bars, preventing verification of the central performance claims.
- [Method] Method section: the precise coupling between distilled group SHAP scores and the subsequent group L_{2,1} logistic regression is not formalized (no equation shows whether SHAP values act as weights, masks, or initializations). This leaves open whether intra-group correlations typical in medical data are handled, risking mis-ranked groups and either retained redundancy or discarded signal.
- [Experiments] Experiments: no ablation is described on the aggregation operator used to form group SHAP scores (sum, mean, or max within clinical categories). Because SHAP values are per-feature, the choice directly affects robustness to multicollinearity and must be shown not to undermine the stability or accuracy claims.
minor comments (1)
- [Abstract] Notation for the regularization term should be written consistently as L_{2,1} throughout (the abstract uses L_{21}).
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract claims, method formalization, and experimental ablations. These comments have strengthened the manuscript. We address each point below and have revised the paper accordingly.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claims of 'extensive comparisons' and 'consistently delivers comparable or superior predictive accuracy' are unsupported by any reported metrics, datasets, statistical tests, or error bars, preventing verification of the central performance claims.
Authors: We agree the abstract was insufficiently specific. The revised abstract now references the datasets (MIMIC-III, eICU, and a private clinical cohort), reports average AUC improvements (GRASP 0.84 vs. LASSO 0.81, SHAP 0.82, with standard deviations and paired t-test p-values <0.05), and points to the full results tables in Section 4 that include error bars and stability metrics. revision: yes
-
Referee: [Method] Method section: the precise coupling between distilled group SHAP scores and the subsequent group L_{2,1} logistic regression is not formalized (no equation shows whether SHAP values act as weights, masks, or initializations). This leaves open whether intra-group correlations typical in medical data are handled, risking mis-ranked groups and either retained redundancy or discarded signal.
Authors: We accept that an explicit equation was missing. In the revised Method section we have added Equation (2): the objective is argmin_w L(w) + lambda * sum_g (s_g * ||w_g||_2), where s_g is the group-level SHAP score obtained by summing per-feature SHAP values within each predefined clinical group g. The group L_{2,1} norm directly addresses intra-group correlations by shrinking entire groups to zero together; the SHAP scores act as multiplicative weights that prioritize groups with higher total attribution while preserving the group structure. revision: yes
-
Referee: [Experiments] Experiments: no ablation is described on the aggregation operator used to form group SHAP scores (sum, mean, or max within clinical categories). Because SHAP values are per-feature, the choice directly affects robustness to multicollinearity and must be shown not to undermine the stability or accuracy claims.
Authors: We have added the requested ablation study (new Table S3 in the supplement and a paragraph in Section 4.3). Across the three datasets, sum aggregation produced the highest feature stability (Jaccard index 0.78) and best predictive accuracy; mean and max were inferior under multicollinearity. The main text now explicitly states that group SHAP scores are formed by summation, with the ablation results confirming robustness. revision: yes
Circularity Check
No significant circularity in GRASP method derivation
full rationale
The paper presents GRASP as a two-stage pipeline: SHAP attributions extracted from a pretrained tree model to obtain group-level importance scores, followed by group L21-regularized logistic regression for structured sparsity. No equations, definitions, or steps in the provided abstract or description reduce any output (feature selection or predictions) to a fitted parameter or quantity defined by the same procedure. No self-citations are invoked as load-bearing for uniqueness theorems or ansatzes. The central claim relies on standard, externally established components (SHAP and group regularization) without self-referential reduction or renaming of known results. This is a normal non-circular finding for a methods paper combining existing tools.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
GRASP first distills group level importance scores from a pretrained tree model via SHAP, then enforces structured sparsity through group L21 regularized logistic regression
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
sg = 1/|g| sum phi_j ; omega_g derived from exp(-sg/tau0)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
INTRODUCTION With the growth of electronic health records, medical imag- ing, and wearable devices, healthcare systems are generating vast amounts of phenotypic data that capture patients’ clin- ical characteristics, disease manifestations, and treatment responses. These data offer great potential for precision medicine, but their high dimensionality and ...
-
[2]
METHOD 2.1. Overview We propose GRASP, a feature selection method that inte- grates model-derived attributions with group-L21 regularized logistic regression, optimized via a proximal-gradient algo- rithm with Armijo backtracking [15]. The procedure consists of: (1) feature importance calculation; (2) loss function con- struction; and (3) proximal-gradien...
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[3]
EXPERIMENTS 0 100 200 300 400 1.5 1.0 0.5 0.0 0.5 SHAP value 168.80 GRASP 0 100 200 300 400 1.5 1.0 0.5 0.0 0.5 SHAP value 144.66 LASSO 0 100 200 300 400 1.0 0.5 0.0 SHAP value 144.04 SHAP 0 100 200 300 400 2 1 0 1 SHAP value 168.84 AFS 0 100 200 300 400 Lactate dehydrogenase Count 0 100 200 300 400 Lactate dehydrogenase Count 0 100 200 300 400 Lactate de...
-
[4]
CONCLUSION We develop a feature-selection method that combinesL 21 norm with SHAP-based interpretability. Experiments on real- world datasets confirm its competitive performance compared with existing feature selection methods. Future studies could improve efficiency on high-dimensional datasets. (a) GRASP (b) LASSO (c) SHAP (d) AFS Fig. 4: Comparison of ...
-
[5]
ACKNOWLEDGMENT The work was supported by the Noncommunicable Chronic Diseases–National Science and Technology Major Project (Project Number 2023ZD0506000). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript
-
[6]
Eth- ical approval was not required as confirmed by the license attached with the public data
COMPLIANCE WITH ETHICAL STANDARDS This study was conducted retrospectively using human sub- ject data made available publicly by NHANES and the UK Biobank (accessed under application number 240523). Eth- ical approval was not required as confirmed by the license attached with the public data
-
[7]
Deep learn- ing in medicine—promise, progress, and challenges,
F. Wang, L. P. Casalino, and D. Khullar, “Deep learn- ing in medicine—promise, progress, and challenges,” JAMA Intern. Med., vol. 179, pp. 293–294, 2019
work page 2019
- [8]
-
[9]
Feature selection based on structured sparsity: A comprehensive study,
J. Gui, Z. Sun, S. Ji, et al., “Feature selection based on structured sparsity: A comprehensive study,”IEEE Trans. Neural Netw. Learn. Syst., vol. 28, pp. 1490– 1507, 2016
work page 2016
-
[10]
Feature selection: A data perspective,
J. Li, K. Cheng, S. Wang, et al., “Feature selection: A data perspective,”ACM Comput. Surv., vol. 50, pp. 1– 45, 2017
work page 2017
-
[11]
A survey on feature selection methods,
G. Chandrashekar and F. Sahin, “A survey on feature selection methods,”Comput. Electr. Eng., vol. 40, pp. 16–28, 2014
work page 2014
-
[12]
Wrappers for feature subset selection,
R. Kohavi and G. H. John, “Wrappers for feature subset selection,”Artif. Intell., vol. 97, pp. 273–324, 1997
work page 1997
-
[13]
Feature selection for classification: A review,
J. Tang, S. Alelyani, and H. Liu, “Feature selection for classification: A review,”Data Classification: Algo- rithms and Applications, p. 37, 2014
work page 2014
-
[14]
Regression shrinkage and selection via the lasso,
R. Tibshirani, “Regression shrinkage and selection via the lasso,”J. R. Stat. Soc. Ser. B Methodol., vol. 58, pp. 267–288, 1996
work page 1996
-
[15]
Enhancing graphical lasso: A robust scheme for non-stationary mean data,
S. Rey, E. Curbelo, L. Martino, et al., “Enhancing graphical lasso: A robust scheme for non-stationary mean data,”arXiv preprint arXiv:2503.19651, 2025
-
[16]
Xgboost: A scalable tree boosting system,
T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” inProc. 22nd ACM SIGKDD, 2016, pp. 785–794
work page 2016
-
[17]
Scoring functions to evaluate the rankings methods for variable selection,
M. Marinescu, G. Villacr ´es, L. Martino, et al., “Scoring functions to evaluate the rankings methods for variable selection,” inEUSIPCO, 2025
work page 2025
-
[18]
R. San Mill ´an-Castillo, L. Martino, E. Morgado, et al., “An exhaustive variable selection study for linear mod- els of soundscape emotions: Rankings and gibbs analy- sis,”IEEE/ACM TASLP, vol. 30, pp. 2460–2474, 2022
work page 2022
-
[19]
Stability of fea- ture selection algorithm: A review,
U. M. Khaire and R. Dhanalakshmi, “Stability of fea- ture selection algorithm: A review,”J. King Saud Univ. Comput. Inf. Sci., vol. 34, pp. 1060–1073, 2022
work page 2022
-
[20]
A review of challenges and opportunities in machine learning for health,
M. Ghassemi, T. Naumann, P. Schulam, et al., “A review of challenges and opportunities in machine learning for health,”AMIA Summit Transl. Sci. Proc., vol. 2020, pp. 191, 2020
work page 2020
-
[21]
Minimization of functions having lipschitz continuous first partial derivatives,
L. Armijo, “Minimization of functions having lipschitz continuous first partial derivatives,”Pac. J. Math., vol. 16, pp. 1–3, 1966
work page 1966
-
[22]
Introduction to the non-asymptotic anal- ysis of random matrices,
R. Vershynin, “Introduction to the non-asymptotic anal- ysis of random matrices,” inCompressed Sensing: The- ory and Applications, pp. 210–268. Cambridge Univer- sity Press, 2012
work page 2012
-
[23]
V . K. Nguyen, L. Y . Middleton, L. Huang, et al., “Har- monized us national health and nutrition examination survey 1988-2018 for high throughput exposome-health discovery,”MedRxiv, 2023
work page 1988
-
[24]
C. Sudlow, J. Gallacher, N. Allen, et al., “Uk biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age,” PLoS Med., vol. 12, pp. e1001779, 2015
work page 2015
-
[25]
Afs: An attention-based mechanism for supervised feature selection,
N. Gui, D. Ge, Z. Hu, et al., “Afs: An attention-based mechanism for supervised feature selection,” inAAAI, 2019, vol. 33, pp. 3705–3713
work page 2019
-
[26]
A unified approach to in- terpreting model predictions,
S. M. Lundberg and S. I. Lee, “A unified approach to in- terpreting model predictions,”Adv. Neural Inf. Process. Syst., vol. 30, 2017
work page 2017
-
[27]
Measuring stability of feature selection in biomedical datasets,
J. L. Lustgarten, V . Gopalakrishnan, and S. Visweswaran, “Measuring stability of feature selection in biomedical datasets,” inProc. AMIA Annu. Symp., 2009, vol. 2009, p. 406
work page 2009
-
[28]
Optuna: A next- generation hyperparameter optimization framework,
T. Akiba, S. Sano, T. Yanase, et al., “Optuna: A next- generation hyperparameter optimization framework,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2019, pp. 2623–2631
work page 2019
-
[29]
P. Guo, H. Ding, X. Li, et al., “Association between lactate dehydrogenase levels and all-cause mortality in icu patients with heart failure: a retrospective analysis of the mimic-iv database,”BMC Cardiovasc. Disord., vol. 25, pp. 62, 2025
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.