Learning Interpretable Point-Based Clinical Risk Scores via Direct Optimization

Albert M Li; Lu Tian; Tina Hernandez-Boussard; Vivek Charu; Yeon-Mi Hwang; Ying Cui

arXiv: 2605.19113 · v1 · pith:FEFTLNGDnew · submitted 2026-05-18 · stat.ME · cs.LG · stat.ML

Pith reviewed 2026-05-20 07:18 UTC · model grok-4.3

classification: stat.ME, cs.LG, stat.ML

keywords: clinical risk scores, greedy optimization, integer weights, additive scoring, electronic health records, comorbidity index, interpretable models, post-discharge mortality

The pith

New algorithms use greedy optimization to learn integer point weights for clinical risk scores by directly maximizing chosen objectives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces machine learning algorithms that learn additive clinical risk scores with nonnegative integer points for binary features by directly optimizing explicit value functions through a greedy strategy. This contrasts with fitting regression models then rounding coefficients or solving full integer programs that become expensive for nonconcave or discontinuous objectives. The method is demonstrated by building a comorbidity score to predict post-discharge mortality in a large Epic Cosmos electronic health record cohort, accompanied by simulation studies of its finite-sample behavior. A sympathetic reader would care because the resulting scores remain sparse, integer-based, and easy to deploy while potentially achieving better alignment with the chosen optimality criterion than rounding approaches.
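
One way to write the problem down is sketched here. The notation is an assumption made for illustration and is not taken from the paper: $p$ binary features $x_j$, integer points $w_j$ capped at a hypothetical maximum $K$, and a user-chosen value function $V_n$ computed on $n$ patients. The empirical AUC is used only as one example of a discontinuous $V_n$; $n_1$ and $n_0$ denote the numbers of patients with and without the event.

```latex
\[
S_w(x) = \sum_{j=1}^{p} w_j x_j, \qquad x_j \in \{0,1\}, \qquad w_j \in \{0,1,\dots,K\},
\]
\[
\hat{w} \in \arg\max_{w \in \{0,\dots,K\}^{p}} V_n(w),
\]
\[
V_n^{\mathrm{AUC}}(w) = \frac{1}{n_1 n_0} \sum_{i:\, y_i = 1} \; \sum_{k:\, y_k = 0}
\left[ \mathbf{1}\{S_w(x_i) > S_w(x_k)\} + \tfrac{1}{2}\,\mathbf{1}\{S_w(x_i) = S_w(x_k)\} \right].
\]
```

Because $V_n^{\mathrm{AUC}}$ depends on $w$ only through indicator functions, it is piecewise constant in $w$. Gradient-based fitting therefore does not apply directly, and the search space has $(K+1)^p$ candidate weight vectors.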

Core claim

We develop new machine learning algorithms that employ a flexible greedy optimization strategy to learn such additive scoring directly under explicit and sensible optimality objectives, applying the approach to construct an integer-weighted comorbidity score for post-discharge mortality risk in a large EHR cohort and examining performance through simulation.

What carries the argument

A flexible greedy optimization strategy that iteratively selects integer weight assignments to maximize a user-specified value function without requiring full integer programming.
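
The page does not reproduce the paper's update rule, so the following is only a minimal sketch of what a coordinate-wise greedy search over integer points could look like. It assumes the empirical AUC as the value function, a hypothetical cap `max_points`, and moves of plus or minus one point on a single feature. The function names and the toy data are invented for illustration, and the look-ahead variant named in the figure captions is not attempted.

```python
def auc(scores, labels):
    """Empirical AUC: P(event score > non-event score) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def total_points(weights, row):
    """Additive score: sum of the points of the features that are present."""
    return sum(w * x for w, x in zip(weights, row))


def greedy_points(X, y, value=auc, max_points=5, max_steps=100):
    """Start from all-zero points; at each step apply the single +/-1 change
    to one feature that most improves the value function, and stop when no
    single change improves it (a local optimum, not a guaranteed global one)."""
    p = len(X[0])
    w = [0] * p
    best = value([total_points(w, r) for r in X], y)
    for _ in range(max_steps):
        best_move = None
        for j in range(p):
            for cand in (w[j] + 1, w[j] - 1):
                if not 0 <= cand <= max_points:
                    continue  # keep points nonnegative and bounded
                trial = w[:j] + [cand] + w[j + 1:]
                v = value([total_points(trial, r) for r in X], y)
                if v > best + 1e-12:  # require a strict improvement
                    best, best_move = v, (j, cand)
        if best_move is None:
            break
        w[best_move[0]] = best_move[1]
    return w, best


# Hypothetical toy cohort: 3 binary comorbidity flags, 10 patients.
X = [[1, 1, 0], [1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1],
     [0, 0, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0], [0, 1, 0]]
y = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

weights, achieved = greedy_points(X, y, max_points=3)
print(weights, achieved)
```

Any function of the scores and labels can be passed as `value`, which is the sense in which a search of this kind does not need the objective to be concave or continuous. The price is that it can stop at a local optimum.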

If this is right

The learned scores use only nonnegative integers and remain sparse, supporting direct clinical use without further rounding.
Direct optimization under the chosen objective avoids the suboptimality that arises from post-hoc rounding of regression coefficients.
The approach scales to large EHR datasets for constructing comorbidity scores predicting mortality.
Simulation studies characterize finite-sample behavior under controlled conditions.
The method accommodates value functions that are nonconcave or discontinuous without prohibitive computation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the greedy method remains effective on problems with hundreds of features, it could replace rounding pipelines for many existing risk-score development tasks.
The framework might transfer to other domains that need sparse integer-coefficient models, such as credit scoring or diagnostic checklists.
One could test whether the same greedy procedure improves upon rounding when the underlying regression is itself regularized for sparsity.

Load-bearing premise

The greedy strategy efficiently locates good or optimal integer weights even when the value function is nonconcave or discontinuous.

What would settle it

On a small instance where exhaustive enumeration of integer weights is feasible, compare the value achieved by the greedy score against the true maximum value; a large gap would falsify the claim of finding good solutions.
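
The check described above can be sketched on a toy instance. Everything here is an assumption made for illustration: the AUC objective, the cap of 3 points, the deterministic synthetic cohort, and the simple plus-or-minus-one greedy rule, which stands in for the paper's algorithm and is not a reproduction of it.

```python
from itertools import product


def auc(scores, labels):
    """Empirical AUC with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def value(w, X, y):
    """Value of an integer weight vector: AUC of the additive score."""
    return auc([sum(wj * xj for wj, xj in zip(w, row)) for row in X], y)


def greedy(X, y, max_points):
    """Stand-in greedy search: best single +/-1 move until none improves."""
    w = [0] * len(X[0])
    best = value(w, X, y)
    while True:
        move = None
        for j in range(len(w)):
            for cand in (w[j] + 1, w[j] - 1):
                if 0 <= cand <= max_points:
                    trial = w[:j] + [cand] + w[j + 1:]
                    v = value(trial, X, y)
                    if v > best + 1e-12:
                        best, move = v, (j, cand)
        if move is None:
            return tuple(w), best
        w[move[0]] = move[1]


def exhaustive(X, y, max_points):
    """Enumerate all (max_points + 1) ** p weight vectors; small p only."""
    grid = product(range(max_points + 1), repeat=len(X[0]))
    best_value, best_w = max((value(w, X, y), w) for w in grid)
    return best_w, best_value


# Deterministic synthetic cohort: 6 patients per feature pattern, and the
# number of events in each pattern grows with a latent score 3*x0 + 2*x1 + x2.
# The fourth feature is pure noise.
X, y = [], []
for pattern in product([0, 1], repeat=4):
    events = 3 * pattern[0] + 2 * pattern[1] + pattern[2]  # between 0 and 6
    for i in range(6):
        X.append(list(pattern))
        y.append(1 if i < events else 0)

w_greedy, v_greedy = greedy(X, y, max_points=3)
w_best, v_best = exhaustive(X, y, max_points=3)
gap = v_best - v_greedy  # zero means greedy reached the global optimum
print("greedy:    ", w_greedy, round(v_greedy, 4))
print("exhaustive:", w_best, round(v_best, 4))
print("gap:       ", round(gap, 4))
```

A single instance like this proves little either way. The informative experiment repeats the comparison across many instances and objectives and reports the distribution of `gap`.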

Figures

Figures reproduced from arXiv: 2605.19113 by Albert M Li, Lu Tian, Tina Hernandez-Boussard, Vivek Charu, Yeon-Mi Hwang, Ying Cui.

**Figure 2.** Average running time across 50 replicates for different cases.

**Figure 3.** Learned scores for basic greedy score (left) and look-ahead score …

**Figure 4.** Observed event rate by score for basic greedy score (left) and look …

Original abstract

Many clinical risk scores are deployed as additive rules with nonnegative integer points assigned to relevant binary predictive features. These integer weights not only make the score easier to use in practice but also promote sparsity in the resulting prediction model. Such risk scores are often derived by first fitting a regression model and then rounding the estimated coefficients to the nearest integer after appropriate scaling. This approach is computationally fast but does not guarantee optimality of the resulting score. Alternatively, one may search over all possible integer weights to directly optimize a value function by posing the problem as an integer programming task. However, the associated computational burden can be substantial, especially when the value function is nonconcave or even discontinuous. In this paper, we develop new machine learning algorithms that employ a flexible greedy optimization strategy to learn such additive scoring directly under explicit and sensible optimality objectives. We apply the proposed method to a large electronic health record (EHR) cohort in Epic Cosmos to construct an integer-weighted comorbidity score for measuring the risk of post-discharge mortality. We also conduct a simulation study to examine the finite-sample operating characteristics.
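
The fit-then-round baseline described in the abstract can be sketched as follows. The abstract says only that coefficients are rounded "after appropriate scaling", so the scaling rule used here (map the largest coefficient to a hypothetical `max_points`) is an assumption, as are the coefficient values, which are made up rather than taken from any fitted model.

```python
def round_to_points(coefs, max_points=5):
    """Scale so the largest coefficient maps to max_points, round to the
    nearest integer, and clip negative results to zero points."""
    top = max(coefs)
    if top <= 0:
        return [0] * len(coefs)  # no positive effect to scale against
    scale = max_points / top
    # Python's round() breaks exact .5 ties toward the even integer.
    return [max(0, round(c * scale)) for c in coefs]


# Hypothetical log-odds ratios for four binary comorbidity flags.
coefs = [1.20, 0.45, 0.10, -0.30]
print(round_to_points(coefs))  # -> [5, 2, 0, 0]
```

The small coefficient 0.10 rounds to zero points, which shows how integer weights induce sparsity. Nothing in this procedure looks at the value function, so the rounded score need not be the best integer score under any particular objective.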

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a greedy optimization route to directly learn integer clinical risk scores, a practical alternative to rounding but one whose reliability on hard objectives still needs more proof.

The main contribution is a flexible greedy strategy that directly optimizes nonnegative integer weights for additive clinical scores under explicit objectives, instead of fitting a regression and rounding. The authors apply it to build a comorbidity score from a large Epic Cosmos EHR cohort for post-discharge mortality risk and include a simulation study of finite-sample behavior. The method sits between the fast but potentially suboptimal rounding approach and the exact but expensive integer programming formulation, a gap that matters most when the value function is nonconcave or discontinuous.

The real-data example is concrete and relevant to how scores are actually used in practice, and the simulation provides some operating characteristics that ground the claims. The framing of the problem and the motivation for direct optimization are clear and honest.

The soft spot is the optimization performance itself. Without approximation guarantees or targeted benchmarks against exact IP solvers on instances where the objective is badly behaved, it is hard to know how often the greedy updates land in good local solutions versus missing better ones. The tested regimes may not isolate the hardest cases, where the computational-advantage claim would face its sternest test.

This work is aimed at biostatisticians and clinical ML researchers who build or evaluate interpretable risk scores and care about both predictive performance and usability. A reader working on healthcare prediction tools would get concrete value from the method and the EHR application. The paper shows enough substance and engagement with the literature to deserve peer review, though referees would likely ask for a stronger analysis of when the greedy approach succeeds or fails.

Referee Report

2 major / 2 minor

Summary. The paper develops a flexible greedy optimization algorithm to directly learn nonnegative integer weights for additive clinical risk scores by optimizing explicit value functions (rather than rounding regression coefficients or solving full integer programs). The approach is applied to construct a comorbidity score for post-discharge mortality risk in a large Epic Cosmos EHR cohort and is evaluated via a simulation study examining finite-sample behavior.

Significance. If the greedy strategy reliably recovers good or optimal weights for nonconcave or discontinuous objectives, the work would offer a practical, computationally lighter alternative to exact integer programming for producing sparse, interpretable clinical scores. The explicit-optimization framing and real-data application are strengths; however, the absence of approximation guarantees or targeted benchmarks against branch-and-bound IP on hard instances limits the strength of the computational-advantage claim.

major comments (2)

§3 (algorithm description): the flexible greedy strategy is presented as iterative coordinate-wise updates that avoid full integer programming, yet no curvature bound, submodularity assumption, or worst-case approximation guarantee is supplied for nonconcave or discontinuous value functions; this directly underpins the central claim that the method efficiently finds good solutions without the burden of exact IP.
Simulation study section: the reported operating characteristics do not isolate regimes with highly nonconcave or discontinuous objectives where exact IP remains tractable; without such targeted comparisons, the extrapolation that the greedy approach reliably matches or exceeds exact solutions in challenging cases remains unverified.

minor comments (2)

The abstract states that the method is applied to an EHR cohort but supplies no numerical performance metrics (e.g., AUC, calibration slope, or sparsity level) for the resulting score; adding these would improve immediate readability.
[§3] Notation for the value function and the greedy update rule could be clarified with a small worked numerical example early in §3 to make the coordinate-wise steps explicit.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major comment below and describe the revisions we will make to strengthen the manuscript.

Referee: §3 (algorithm description): the flexible greedy strategy is presented as iterative coordinate-wise updates that avoid full integer programming, yet no curvature bound, submodularity assumption, or worst-case approximation guarantee is supplied for nonconcave or discontinuous value functions; this directly underpins the central claim that the method efficiently finds good solutions without the burden of exact IP.

Authors: We agree that the manuscript does not supply theoretical curvature bounds, submodularity assumptions, or worst-case approximation guarantees for arbitrary nonconcave or discontinuous value functions. Such guarantees are difficult to obtain in general because the underlying combinatorial optimization problem is NP-hard for many clinically relevant objectives. In the revised manuscript we will expand Section 3 to include an explicit discussion of this limitation, clarify that the greedy procedure is presented as a practical heuristic rather than a theoretically guaranteed algorithm, and note the empirical evidence from the simulation study and real-data application that supports its utility for the targeted clinical-risk-score setting.

Revision: partial

Referee: Simulation study section: the reported operating characteristics do not isolate regimes with highly nonconcave or discontinuous objectives where exact IP remains tractable; without such targeted comparisons, the extrapolation that the greedy approach reliably matches or exceeds exact solutions in challenging cases remains unverified.

Authors: We accept this criticism and will strengthen the simulation study. In the revision we will add a new set of experiments that explicitly consider small-to-moderate problem sizes with highly nonconcave and discontinuous objective functions for which exact branch-and-bound integer programming remains computationally feasible. These additional results will report the gap between the greedy solutions and the true IP optima, thereby providing direct evidence on performance in the challenging regimes highlighted by the referee.

Revision: yes

Circularity Check

0 steps flagged

No circularity: direct optimization of explicit objectives is self-contained

The paper frames its contribution as a new greedy algorithm that directly optimizes explicit value functions for nonnegative integer weights, avoiding both post-hoc rounding of regression coefficients and full integer programming. No derivation step reduces a claimed prediction or optimality result to a fitted parameter by construction, nor does any load-bearing claim rest on self-citation chains or imported uniqueness theorems. The method is presented as an independent algorithmic procedure whose correctness is evaluated via simulation and EHR application rather than by re-labeling inputs. This matches the default expectation of a non-circular paper whose central claims remain independent of its own fitted outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the central claim rests on standard assumptions that the chosen optimality objectives are clinically sensible and that greedy search can handle nonconcave value functions without excessive computation.

axioms (1)

Domain assumption: The optimality objectives used are sensible for clinical risk scoring tasks.
Basis: The abstract states that the method optimizes under explicit and sensible optimality objectives.
