
arxiv: 2603.26184 · v2 · pith:LVLF2ZSOnew · submitted 2026-03-27 · 📊 stat.AP

Why decision curves go above or below treat-all and treat-none: a PPV- and calibration-based guide for clinical prediction models

Pith reviewed 2026-05-21 09:39 UTC · model grok-4.3

classification 📊 stat.AP
keywords: decision curves, net benefit, positive predictive value, model calibration, clinical prediction models, treat-all, treat-none, risk threshold

The pith

Net benefit comparisons to treat-all and treat-none reduce to threshold-specific observed risks, connecting decision curves to subgroup calibration and positive predictive value.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops two practical interpretations of net benefit to help clinicians understand decision curves. It shows that a curve sits above the treat-none line exactly when the observed risk among patients at or above the chosen threshold exceeds the threshold, and above the treat-all line exactly when the observed risk among patients below the threshold is lower than the threshold. This ties curve performance directly to how well the model is calibrated inside the treated and untreated subgroups. The authors also rewrite net benefit in terms of positive predictive value, which clarifies when acting on a prediction improves decisions over simpler strategies. They conclude by recommending positive predictive value curves as a direct companion plot to standard decision curves.

Core claim

Comparisons with treat-none and treat-all can be expressed through threshold-specific observed risk in patients above and below the decision threshold, linking decision-curve performance to calibration in clinically relevant subgroups. Net benefit also relates to positive predictive value, offering a more intuitive explanation of when acting on model predictions is justified. The derivations are illustrated and positive predictive value curves are proposed as a practical complement to decision curves.
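The two comparisons can be made concrete with a small calculation. The sketch below is a minimal illustration, assuming the standard net-benefit formula and the convention that patients with predicted risk at or above the threshold are treated; the function name `decision_summary` and the toy data are invented for illustration and do not come from the paper.

```python
def decision_summary(y, p, t):
    """Net benefit and threshold-specific observed risks at threshold t.

    y: list of 0/1 outcomes; p: list of predicted risks; 0 < t < 1.
    Assumed convention: patients with p >= t are treated.
    """
    n = len(y)
    above = [yi for yi, pi in zip(y, p) if pi >= t]
    below = [yi for yi, pi in zip(y, p) if pi < t]
    w = t / (1 - t)                      # odds of the threshold
    tp = sum(above)                      # treated patients with the event
    fp = len(above) - tp                 # treated patients without the event
    incidence = sum(y) / n
    return {
        "nb_model": tp / n - w * fp / n,
        "nb_all": incidence - w * (1 - incidence),
        "nb_none": 0.0,
        "risk_above": tp / len(above) if above else None,
        "risk_below": sum(below) / len(below) if below else None,
    }


# Toy data, invented for illustration (not from the paper).
y = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
summary = decision_summary(y, p, t=0.5)

# Above treat-none exactly when the observed risk above t exceeds t.
beats_none = summary["nb_model"] > summary["nb_none"]
# Above treat-all exactly when the observed risk below t is less than t.
beats_all = summary["nb_model"] > summary["nb_all"]
print(summary, beats_none, beats_all)
```

In this toy example the observed risk among treated patients is 0.75 and among untreated patients about 0.17, so at a threshold of 0.5 the model beats both default strategies.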

What carries the argument

Threshold-specific observed risk above and below the decision threshold, together with its algebraic link to positive predictive value.

If this is right

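  • A model whose observed risk among patients above the threshold exceeds the threshold, as it will when the model is well calibrated in that subgroup, produces a decision curve above the treat-none line.
  • Net benefit is positive when positive predictive value at the threshold exceeds the threshold itself, or equivalently when the odds of the positive predictive value exceed the harm-to-benefit ratio t/(1 − t).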
  • Positive predictive value curves supply an alternative visual check on the same information that decision curves display.
  • Poor calibration in the high-risk subgroup directly lowers or eliminates the apparent advantage of the model over treat-none.
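The points above can be sketched numerically. The example assumes the identity NB(t) = s_t (PPV(t) − t) / (1 − t), where s_t is the fraction of patients treated, which follows from the standard net-benefit formula. The function name `ppv_and_net_benefit` and both sets of predictions are invented for illustration; the second set is a deliberately overestimating version of the first.

```python
def ppv_and_net_benefit(y, p, thresholds):
    """Return one row (t, fraction_treated, ppv, net_benefit) per threshold."""
    n = len(y)
    rows = []
    for t in thresholds:
        flagged = [yi for yi, pi in zip(y, p) if pi >= t]
        if not flagged:
            # Nobody is treated: PPV is undefined, net benefit equals treat-none.
            rows.append((t, 0.0, None, 0.0))
            continue
        s_t = len(flagged) / n
        ppv = sum(flagged) / len(flagged)
        nb = s_t * (ppv - t) / (1 - t)   # assumed identity, see lead-in
        rows.append((t, s_t, ppv, nb))
    return rows


# Toy data, invented for illustration (not from the paper).
y = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
p_ok = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
p_over = [1.0, 1.0, 1.0, 1.0, 0.8, 0.6, 0.4, 0.4, 0.2, 0.2]  # inflated risks

row_ok = ppv_and_net_benefit(y, p_ok, [0.6])[0]
row_over = ppv_and_net_benefit(y, p_over, [0.6])[0]
print(row_ok)    # PPV above the threshold, positive net benefit
print(row_over)  # PPV below the threshold, negative net benefit
```

Plotting the `ppv` column against the threshold, together with the diagonal PPV = t, gives one possible form of a PPV curve: the model beats treat-none wherever the curve lies above the diagonal.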

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This framing could guide targeted recalibration efforts focused only on the risk range where decisions are actually made.
  • Clinicians might choose thresholds by inspecting positive predictive value rather than net benefit alone.
  • The same observed-risk decomposition might apply to other threshold-based decision metrics beyond net benefit.

Load-bearing premise

That threshold-specific observed risks and positive predictive values directly represent clinical utility without further conditions on data quality or population traits.

What would settle it

A dataset in which the net benefit value computed at a threshold fails to match the value obtained from the observed risk among patients whose predicted risk exceeds that threshold.
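Such a check is easy to automate. The sketch below is one possible version, assuming the standard net-benefit formula and the observed-risk form NB(t) = s_t (Ȳ≥t − t) / (1 − t). It uses exact rational arithmetic so that any mismatch would be a real disagreement rather than rounding error. All function names are hypothetical.

```python
import random
from fractions import Fraction


def nb_from_counts(y, p, t):
    """Net benefit from true- and false-positive counts."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= t and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= t and yi == 0)
    return Fraction(tp, n) - t / (1 - t) * Fraction(fp, n)


def nb_from_observed_risk(y, p, t):
    """Net benefit from the observed risk among patients with p >= t."""
    n = len(y)
    above = [yi for yi, pi in zip(y, p) if pi >= t]
    if not above:
        return Fraction(0)               # nobody treated
    s_t = Fraction(len(above), n)
    risk_above = Fraction(sum(above), len(above))
    return s_t / (1 - t) * (risk_above - t)


def count_mismatches(trials=200, n=50, seed=0):
    """Count random datasets on which the two computations disagree."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        p = [Fraction(rng.randint(0, 100), 100) for _ in range(n)]
        y = [rng.randint(0, 1) for _ in range(n)]   # deliberately unrelated to p
        t = Fraction(rng.randint(1, 99), 100)
        if nb_from_counts(y, p, t) != nb_from_observed_risk(y, p, t):
            mismatches += 1
    return mismatches


print(count_mismatches())
```

The outcomes are generated independently of the predictions on purpose: the identity is algebraic, so it should hold even for a useless model.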

Figures

Figures reproduced from arXiv: 2603.26184 by Linard Hoessly.

Figure 3: two logistic regression models fit to the complete-case GUSTO-I
Figure 4: a comparatively rich and a very simple logistic regression model
Figure 1: Net benefit curve with corresponding PPV curve for GUSTO-I.
Figure 2: Net benefit curve with corresponding PPV curve for SUPPORT.
Accompanying text (Section 2.4, Analytical arguments on calibration and net benefit): We review the observations that miscalibration can substantially reduce clinical utility and may even lead to clinical harm, i.e. net benefit below the treat-all or treat-none strategies [4]. Two failure modes were highlighted: systematic overestimation can yield $NB(t) < 0$ for thresholds $t > I$ (worse than treat-none), whereas systematic underestimation can yield $NB(t) < NB_{all}(t)$ for thresholds $t < I$ (worse than treat-all). Both effects can be explained by the observations in Section 2.2. For convenience we briefly go through the arguments below. Overestimation. If risks are systematically overestimated, s…
Figure 4: Net benefit curve with corresponding PPV curve for SUPPORT.
Accompanying text (Appendix D, Mathematical derivations for calibration; D.1, Better than treat-none): Let $s_t > 0$; then $\bar{Y}_{\ge t} := TP(t)/(TP(t) + FP(t))$ and $TP(t)/n = s_t \bar{Y}_{\ge t}$, $FP(t)/n = s_t (1 - \bar{Y}_{\ge t})$. Substituting into $NB(t) = \frac{TP(t)}{n} - \frac{t}{1-t} \frac{FP(t)}{n}$ yields $NB(t) = s_t \bar{Y}_{\ge t} - \frac{t}{1-t} s_t (1 - \bar{Y}_{\ge t}) = \frac{s_t}{1-t}$ …
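
The Appendix D.1 excerpt breaks off in the middle of its last equality. The algebra can be carried through from the definitions already given. The block below is a reconstruction from those definitions, not a quotation of the appendix. The symbol $\bar{Y}_{<t} := FN(t)/(FN(t) + TN(t))$ for the observed risk below the threshold and the treat-all expression are assumptions based on the standard net-benefit formula.

```latex
% Better than treat-none (requires s_t > 0)
\[
NB(t) = s_t \bar{Y}_{\ge t} - \frac{t}{1-t}\, s_t \,(1 - \bar{Y}_{\ge t})
      = \frac{s_t}{1-t}\,\bigl[(1-t)\,\bar{Y}_{\ge t} - t\,(1 - \bar{Y}_{\ge t})\bigr]
      = \frac{s_t}{1-t}\,\bigl(\bar{Y}_{\ge t} - t\bigr)
\]

% Better than treat-all (requires s_t < 1)
\[
NB(t) - NB_{all}(t)
      = -\frac{FN(t)}{n} + \frac{t}{1-t}\,\frac{TN(t)}{n}
      = (1 - s_t)\Bigl[\frac{t}{1-t}\,(1 - \bar{Y}_{<t}) - \bar{Y}_{<t}\Bigr]
      = \frac{1 - s_t}{1-t}\,\bigl(t - \bar{Y}_{<t}\bigr)
\]
```

Because $0 < t < 1$, the sign of $NB(t)$ is the sign of $\bar{Y}_{\ge t} - t$, and the sign of $NB(t) - NB_{all}(t)$ is the sign of $t - \bar{Y}_{<t}$.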

Original abstract

Net benefit is widely used and reported to evaluate the clinical utility of prediction models, yet its interpretation often remains difficult in practice. In this didactical note, we develop two complementary interpretations that make net benefit easier to understand for clinical audiences. We show that comparisons with treat-none and treat-all can be expressed through threshold-specific observed risk in patients above and below the decision threshold, linking decision-curve performance to calibration in clinically relevant subgroups. We also show how net benefit relates to positive predictive value, offering a more intuitive explanation of when acting on model predictions is justified. We derive and illustrate these results and propose positive predictive value curves as a practical complement to decision curves.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper is a didactical note deriving two complementary interpretations of net benefit in decision-curve analysis for clinical prediction models. It shows that net-benefit comparisons against treat-all and treat-none can be algebraically re-expressed using the empirical event rate (observed risk) among patients whose predicted probability exceeds the decision threshold (and symmetrically below it), thereby linking decision-curve performance to calibration within clinically relevant subgroups. It further relates net benefit at a given threshold to the positive predictive value of the model at that threshold. The derivations are illustrated and the authors propose PPV curves as a practical complement to decision curves.

Significance. If the algebraic identities hold, the manuscript offers a useful pedagogical contribution by grounding the interpretation of decision curves in observable quantities (subgroup calibration and PPV) rather than the abstract net-benefit formula alone. The derivations are parameter-free and follow directly from substitution into the standard net-benefit expression, so they hold for any fixed threshold and any joint distribution of predictions and outcomes. This strengthens the practical teaching and application of decision-curve analysis without introducing new empirical claims or assumptions about data quality.

minor comments (3)
  1. The manuscript would benefit from an explicit statement early in the derivations section confirming that the re-expressions are identities that hold by definition once the standard net-benefit formula is substituted, to avoid any impression of additional modeling assumptions.
  2. Figure legends and axis labels for the proposed PPV curves should more clearly distinguish them from the conventional decision curves and indicate whether the PPV axis is plotted on the same probability scale.
  3. A brief note on the handling of ties or continuous versus discrete thresholds would clarify the practical computation of the threshold-specific observed risks.
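The third comment can be illustrated with a small example. When several patients share a predicted risk equal to the threshold, the observed risk "above the threshold" depends on whether the comparison is inclusive. The sketch below assumes nothing about how the paper handles ties; the function name `observed_risk_above` and the data are invented for illustration.

```python
def observed_risk_above(y, p, t, inclusive=True):
    """Observed event rate among patients with p >= t (inclusive) or p > t."""
    if inclusive:
        selected = [yi for yi, pi in zip(y, p) if pi >= t]
    else:
        selected = [yi for yi, pi in zip(y, p) if pi > t]
    return sum(selected) / len(selected) if selected else None


# A model with only a few distinct predicted risks, so ties are common.
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.2, 0.2, 0.5, 0.5]

risk_inclusive = observed_risk_above(y, p, t=0.2, inclusive=True)
risk_strict = observed_risk_above(y, p, t=0.2, inclusive=False)
print(risk_inclusive, risk_strict)
```

Here three patients sit exactly at the threshold of 0.2, and moving them from the treated to the untreated group changes the observed risk among treated patients from 0.6 to 1.0. Stating the convention once avoids this ambiguity.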

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive and constructive assessment of our didactical note. We appreciate the recognition that the algebraic links between net benefit, subgroup calibration, and positive predictive value offer a useful pedagogical contribution to decision-curve analysis. No specific major comments were raised in the report, so we have no points to address individually. We will incorporate any minor editorial suggestions during revision.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's central derivations consist of algebraic re-expressions that substitute the standard net-benefit formula into expressions involving threshold-specific observed risks and positive predictive value. These identities hold by definition for any fixed threshold and any joint distribution of predictions and outcomes, without fitted parameters, self-referential equations, or load-bearing self-citations. The derivations are self-contained, drawing only on prior standard definitions of net benefit, PPV, and calibration rather than reducing to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the paper relies on standard statistical definitions of net benefit, PPV, and calibration without introducing new free parameters, axioms beyond basic probability, or invented entities.

axioms (1)
  • domain assumption: Standard definitions of positive predictive value and calibration as functions of predicted and observed risks hold in the relevant patient subgroups.
    Invoked implicitly when linking decision curve position to threshold-specific observed risk and PPV.



Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

  1. [1] Anthony K Akobeng. Understanding diagnostic tests 1: sensitivity, specificity and predictive values. Acta Paediatrica, 96(3):338–341, February 2007

  2. [2] Douglas G Altman and J Martin Bland. Statistics notes: Diagnostic tests 2: predictive values. BMJ, 309(6947):102, 1994

  3. [3] Ben Van Calster, Gary S. Collins, Andrew J. Vickers, Laure Wynants, Kathleen F. Kerr, Lasai Barrenada, Gael Varoquaux, Karandeep Singh, Karel G. M. Moons, Tina Hernandez-Boussard, Dirk Timmerman, David J. Mclernon, Maarten Van Smeden, and Ewout W. Steyerberg. Performance evaluation of predictive ai models to support medical decisions: Overview and guid...

  4. [4] Ben Van Calster and Andrew J. Vickers. Calibration of risk prediction models: Impact on decision-analytic performance. Medical Decision Making, 35(2):162–169, 2015. PMID: 25155798

  5. [5] G. S. Collins and D. G. Altman. Predicting the 10 year risk of cardiovascular disease in the united kingdom: independent and external validation of an updated version of qrisk2. BMJ, 344(jun21 1):e4181–e4181, June 2012

  6. [6] Gary S Collins, Karel G M Moons, Paula Dhiman, Richard D Riley, Andrew L Beam, Ben Van Calster, Marzyeh Ghassemi, Xiaoxuan Liu, Johannes B Reitsma, Maarten van Smeden, Anne-Laure Boulesteix, Jennifer Catherine Camaradou, Leo Anthony Celi, Spiros Denaxas, Alastair K Denniston, Ben Glocker, Robert M Golub, Hugh Harvey, Georg Heinze, Michael M Hoffman, Andre... Tripod+ai statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ, page e078378, April 2024

  7. [7] Gary S. Collins, Johannes B. Reitsma, Douglas G. Altman, and Karel G.M. Moons. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (tripod): The tripod statement. Annals of Internal Medicine, 162(1):55–63, January 2015

  8. [8] Thomas P A Debray, Gary S Collins, Richard D Riley, Kym I E Snell, Ben Van Calster, Johannes B Reitsma, and Karel G M Moons. Transparent reporting of multivariable prediction models developed or validated using clustered data: Tripod-cluster checklist. BMJ, 380:e071018, February 2023

  9. [9] H.O. Georgii. Stochastics: Introduction to Probability and Statistics. De Gruyter textbook. Walter De Gruyter, 2008

  10. [10] Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007

  11. [11] David J. Hand. Assessing the performance of classification methods. International Statistical Review, 80(3):400–414, 2012

  12. [12] F.E. Harrell. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Springer Series in Statistics. Springer International Publishing, 2015

  13. [13] Frank E Harrell Jr. Hmisc: Harrell Miscellaneous, 2025. R package version 5.2-3

  14. [14] Linard Hoessly. On misconceptions about the brier score in binary prediction models. Global Epidemiology, 11:100242, June 2026

  15. [15] Linard Hoessly and Matthew Parry. How to evaluate probabilistic prediction models: Key metrics. Journal of Clinical Epidemiology, page 112247, March 2026

  16. [16] Kathleen F. Kerr, Marshall D. Brown, Kehao Zhu, and Holly Janes. Assessing the clinical impact of risk prediction models with decision curves: Guidance for correct interpretation and appropriate use. Journal of Clinical Oncology, 34(21):2534–2540, July 2016

  17. [17] William A. Knaus, Frank E. Harrell, Joanne Lynn, Lee Goldman, Russell S. Phillips, Alfred F. Connors, Neal V. Dawson, William J. Fulkerson, Robert M. Califf, Norman Desbiens, Peter Layde, Robert K. Oye, Paul E. Bellamy, Rosemarie B. Hakim, and Douglas P. Wagner. The support prognostic model: Objective estimates of survival for seriously ill hospitalized a...

  18. [18] Michael A. Kohn and Thomas B. Newman. Visualizing the value of diagnostic tests and prediction models, part ii. net benefit graphs: net benefit as a function of the exchange rate. Journal of Clinical Epidemiology, 181:111690, May 2025

  19. [19] Kerry L. Lee, Lynn H. Woodlief, Eric J. Topol, W. Douglas Weaver, Amadeo Betriu, Jacques Col, Maarten Simoons, Phil Aylward, Frans Van de Werf, and Robert M. Califf. Predictors of 30-day mortality in the era of reperfusion for acute myocardial infarction: Results from an international trial of 41 021 patients. Circulation, 91(6):1659–1668, March 1995

  20. [20] Hendrik-Jan Mijderwijk and Daan Nieboer. Is My Clinical Prediction Model Clinically Useful? A Primer on Decision Curve Analysis, pages 115–118. Springer International Publishing, December 2021

  21. [21] Stephen G. Pauker and Jerome P. Kassirer. Therapeutic decision making: A cost-benefit analysis. New England Journal of Medicine, 293(5):229–234, July 1975

  22. [22] Margaret S. Pepe, Jing Fan, Ziding Feng, Thomas Gerds, and Jorgen Hilden. The net reclassification index (nri): A misleading measure of prediction improvement even with independent test data sets. Statistics in Biosciences, 7(2):282–295, August 2014

  23. [23] Brendan M. Reilly and Arthur T. Evans. Translating clinical research into clinical practice: Impact of using prediction rules to make decisions. Annals of Internal Medicine, 144(3):201–209, February 2006

  24. [24] Martin Rezáč and František Rezáč. How to measure the quality of credit scoring models. Finance a Uver: Czech Journal of Economics & Finance, 61(5), 2011

  25. [25] Valentin Rousson and Thomas Zumbrunn. Decision curve analysis revisited: overall net benefit, relationships to roc curve analysis, and application to case-control studies. BMC Medical Informatics and Decision Making, 11(1), June 2011

  26. [26] Kaspar Rufibach. Use of brier score to assess binary predictions. Journal of Clinical Epidemiology, 63(8):938–939, August 2010

  27. [27] Daniel D. Sjoberg. dcurves: Decision Curve Analysis for Model Evaluation, 2024. R package version 0.5.0

  28. [28] Kym I E Snell, Brooke Levis, Johanna A A Damen, Paula Dhiman, Thomas P A Debray, Lotty Hooft, Johannes B Reitsma, Karel G M Moons, Gary S Collins, and Richard D Riley. Transparent reporting of multivariable prediction models for individual prognosis or diagnosis: checklist for systematic reviews and meta-analyses (tripod-srma). BMJ, 381:e073538, May 2023

  29. [29] T Sorahan and M S Gilthorpe. Non-differential misclassification of exposure always leads to an underestimate of risk: an incorrect conclusion. Occupational and Environmental Medicine, 51(12):839–840, December 1994

  30. [30] E.W. Steyerberg. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Statistics for Biology and Health. Springer International Publishing, 2019

  31. [31] Ewout W Steyerberg, Andrew J Vickers, Nancy R Cook, Thomas Gerds, Mithat Gonen, Nancy Obuchowski, Michael J Pencina, and Michael W Kattan. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology, 21(1):128–138, January 2010

  32. [32] Rajesh Talluri and Sanjay Shete. Using the weighted area under the net benefit curve for decision curve analysis. BMC Medical Informatics and Decision Making, 16(1), July 2016

  33. [33] Ben Van Calster, Laure Wynants, Jan F.M. Verbeek, Jan Y. Verbakel, Evangelia Christodoulou, Andrew J. Vickers, Monique J. Roobol, and Ewout W. Steyerberg. Reporting and interpreting decision curve analysis: A guide for investigators. European Urology, 74(6):796–804, December 2018

  34. [34] Jan Y. Verbakel, Ewout W. Steyerberg, Hajime Uno, Bavo De Cock, Laure Wynants, Gary S. Collins, and Ben Van Calster. Roc curves for clinical prediction models part 1. roc plots showed no added value above the auc when evaluating the performance of clinical prediction models. Journal of Clinical Epidemiology, 126:207–216, October 2020

  35. [35] Andrew J Vickers. Decision analysis for the evaluation of diagnostic tests, prediction models, and molecular markers. The American Statistician, 62(4):314–320, November 2008

  36. [36] Andrew J. Vickers and Elena B. Elkin. Decision curve analysis: A novel method for evaluating prediction models. Medical Decision Making, 26(6):565–574, November 2006

  37. [37] Andrew J Vickers, Ben Van Calster, and Ewout W Steyerberg. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ, page i6, January 2016

  38. [38] Andrew J. Vickers, Ben van Calster, and Ewout W. Steyerberg. A simple, step-by-step guide to interpreting decision curve analysis. Diagnostic and Prognostic Research, 3(1), October 2019

  39. [39] Emma Wallace, Susan M Smith, Rafael Perera-Salazar, Paul Vaucher, Colin McCowan, Gary Collins, Jan Verbakel, Monica Lakhanpaul, and Tom Fahey. Framework for the impact analysis and implementation of clinical prediction rules (cprs). BMC Medical Informatics and Decision Making, 11(1), October 2011

  40. [40] Qian M. Zhou, Lu Zhe, Russell J. Brooke, Melissa M. Hudson, and Yan Yuan. A relationship between the incremental values of area under the ROC curve and of area under the precision-recall curve. Diagnostic and Prognostic Research, 5(1):13, July 2021

Accompanying text (Appendix A, Mathematical derivations; A.1, Bounds on PPV implied by net benefit): For a fixed incidence $I$, the fra...

    … $\left[\max\left(\frac{NB(t)}{NB(t) + \frac{1}{t}(I - NB(t))}(1-t) + t,\ \frac{NB(t)}{NB(t) + \frac{1}{1-t}(1 - I)}(1-t) + t\right),\ 1\right]$ if $NB(t) > 0$; $\{0, t\}$ if $NB(t) = 0$; …
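
The NB(t) > 0 branch of the Appendix A.1 bound can be checked numerically. The sketch below assumes that the interval describes the range of PPV(t) compatible with a given net benefit NB(t) and incidence I, with lower end max( NB/(NB + (I − NB)/t) · (1 − t) + t, NB/(NB + (1 − I)/(1 − t)) · (1 − t) + t ) and upper end 1. That reading is inferred from the appendix heading, since the excerpt is cut off on both sides. The function names and the simulated data are hypothetical.

```python
import random
from fractions import Fraction


def ppv_lower_bound(nb, incidence, t):
    """Lower end of the assumed PPV range for the NB(t) > 0 case."""
    # Bound from TP/n <= incidence.
    via_events = nb / (nb + (incidence - nb) / t) * (1 - t) + t
    # Bound from FP/n <= 1 - incidence.
    via_nonevents = nb / (nb + (1 - incidence) / (1 - t)) * (1 - t) + t
    return max(via_events, via_nonevents)


def count_bound_violations(trials=300, n=40, seed=1):
    """Simulate datasets; count cases where PPV leaves [lower bound, 1]."""
    rng = random.Random(seed)
    checked = violations = 0
    for _ in range(trials):
        p = [Fraction(rng.randint(0, 100), 100) for _ in range(n)]
        y = [1 if rng.random() < float(pi) else 0 for pi in p]
        t = Fraction(rng.randint(1, 99), 100)
        flagged = [yi for yi, pi in zip(y, p) if pi >= t]
        tp = sum(flagged)
        fp = len(flagged) - tp
        nb = Fraction(tp, n) - t / (1 - t) * Fraction(fp, n)
        if nb <= 0:
            continue  # this sketch only covers the NB(t) > 0 branch
        checked += 1
        ppv = Fraction(tp, tp + fp)
        bound = ppv_lower_bound(nb, Fraction(sum(y), n), t)
        if not (bound <= ppv <= 1):
            violations += 1
    return checked, violations


checked, violations = count_bound_violations()
print(checked, violations)
```

Exact fractions are used so that a violation, if one occurred, could not be blamed on floating-point rounding.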