
arxiv: 2606.02671 · v1 · pith:E52AJ42C · submitted 2026-06-01 · 💻 cs.LG · cs.AI

Aligning Data-Driven Predictors with Allocation: A Decision-Focused Approach to Survival Analysis

Pith reviewed 2026-06-28 15:16 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords survival analysis · decision-focused learning · organ allocation · NDCG · right censorship · concordance index · machine learning predictors · allocation policy

The pith

Survival predictors optimized for C-index accuracy can produce allocation outcomes no better than random selection, but optimizing them for NDCG instead provides performance guarantees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard statistical metrics for survival models, such as the C-index, do not align with downstream allocation decisions like organ transplants and can lead to arbitrarily poor utility. It introduces a decision-focused approach that trains models by directly optimizing normalized discounted cumulative gain to bridge this gap. The authors prove that NDCG optimization yields allocation guarantees superior to uniform random selection. They also develop a bootstrapping method to optimize NDCG while handling right-censorship in ranking evaluation. On US heart transplant data, this yields 50-100% NDCG gains that translate to substantial increases in life years saved annually.

Core claim

Any algorithm that relies on survival predictors optimized for standard metrics such as the C-index can yield arbitrarily poor outcomes when used for allocation, failing to guarantee utility better than uniform random selection. A decision-focused learning approach based on optimizing NDCG translates to guarantees on allocation performance, and a bootstrapping method allows existing survival models to be optimized for this metric while addressing right censorship.
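To make the gap between C-index and allocation utility concrete, here is a minimal toy sketch. It is not the paper's construction: the ten utilities and the predictor scores are hypothetical values chosen for illustration, with no censoring and no ties. The predictor orders 36 of 45 pairs correctly, yet allocating one organ to its top-ranked candidate does worse than picking a candidate uniformly at random.

```python
from itertools import combinations


def c_index(times, scores):
    """Fraction of comparable pairs ordered the same way by times and scores.

    Simplified version for illustration: assumes no censoring and skips tied times.
    """
    concordant = 0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        comparable += 1
        if (times[i] < times[j]) == (scores[i] < scores[j]):
            concordant += 1
    return concordant / comparable


# Hypothetical utilities (e.g. life years gained) for ten candidates.
true_utility = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
# Hypothetical predictor: correct on every pair except those involving
# the highest-utility candidate, which it ranks last.
pred_score = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]

cidx = c_index(true_utility, pred_score)

# Allocate a single organ to the candidate with the highest predicted score.
chosen = max(range(len(pred_score)), key=pred_score.__getitem__)
top1_utility = true_utility[chosen]

# Expected utility of choosing one candidate uniformly at random.
random_utility = sum(true_utility) / len(true_utility)

print(cidx, top1_utility, random_utility)  # 0.8 9 104.5
```

The C-index weights every pair equally, so one misplaced high-utility candidate costs only 9 of 45 pairs, while a top-k allocation loses almost all of its value.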

What carries the argument

NDCG optimization of survival models via bootstrapping, which directly ties ranking quality to allocation utility and handles censored data in evaluation.
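For readers unfamiliar with the metric, a minimal sketch of NDCG@k on fully observed outcomes follows. The linear gain and the 1/log2(position + 1) discount are the textbook information-retrieval convention and are an assumption here; the paper's exact gain function and its treatment of censored outcomes may differ. The function names are illustrative.

```python
import math


def dcg_at_k(gains_in_rank_order, k):
    """Discounted cumulative gain of the first k items, using a log2 position discount."""
    return sum(
        gain / math.log2(position + 2)  # position 0 gets discount 1 / log2(2) = 1
        for position, gain in enumerate(gains_in_rank_order[:k])
    )


def ndcg_at_k(true_gain, scores, k):
    """NDCG@k of the ranking induced by `scores` (higher score = ranked earlier)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked_gains = [true_gain[i] for i in order]
    ideal_gains = sorted(true_gain, reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(ranked_gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: three candidates with known gains.
true_gain = [3.0, 1.0, 2.0]
perfect = ndcg_at_k(true_gain, [0.9, 0.1, 0.5], k=3)   # same order as true_gain
reversed_ = ndcg_at_k(true_gain, [0.1, 0.9, 0.5], k=3)  # exactly reversed order
print(perfect, reversed_)
```

Because the discount concentrates weight on the top positions, errors among the highest-ranked candidates are penalized more than errors further down the list, which is the property that ties the metric to top-k allocation.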

If this is right

  • Allocation decisions based on NDCG-optimized predictors achieve utility strictly better than uniform random selection.
  • On historical US heart transplant data the method produces 50-100% higher NDCG scores than baselines.
  • The NDCG gains correspond to tens of thousands of additional life years gained annually in transplant allocation.
  • The framework extends to other decision-making settings that use survival or ranking predictions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same NDCG-based alignment could be tested on non-medical allocation problems such as resource scheduling.
  • Combining NDCG with other ranking-aware losses might improve robustness to different forms of censorship.
  • Deployment on live allocation systems would require checking whether the bootstrapping step scales to larger datasets.

Load-bearing premise

That NDCG optimization of survival models via the proposed bootstrapping method provides allocation performance guarantees under right-censorship.

What would settle it

A test on held-out transplant data where an NDCG-optimized model produces allocation utility no higher than random selection would falsify the translation from NDCG to allocation guarantees.

Figures

Figures reproduced from arXiv: 2606.02671 by Ioannis Anagnostides, Itai Zilberstein, Tuomas Sandholm.

Figure 1: Illustration of heart transplant allocation with predicted outcomes. The leftmost value …
Figure 2: Average % gain in NDCG@k of bootstrapped model over baseline predictor.
  … lowest average error as k increases. In general, we expect some error, particularly due to the bias detailed in Section 4.2. Despite the moderate absolute error, the estimators demonstrate strong correlation when compared to the ground-truth NDCG (Figure A2). The EY estimator, aggregated across all nuisance models, exhibits a high deg…
Figure 3: Average % gain in NDCG@k estimations of bootstrapped model over baseline predictor.
  6.2 Evaluating on the full dataset. Having validated our estimators and our bootstrapping approach under artificial censoring, we now apply our methods to the complete UNOS registry. Unlike the artificial setup, the survival times of censored patients in this dataset are truly unknown. We rely on the NDCG estimators to asses…
Original abstract

Machine learning predictors have become essential tools for guiding automated decision making. However, a major misalignment persists: predictive models are typically optimized in terms of standard statistical metrics in isolation from the algorithmic tasks they inform. We highlight this incongruity in the high-stakes domain of organ allocation by demonstrating that any algorithm relying on (even highly accurate) survival predictors optimized for standard metrics -- such as the Concordance index (C-index) -- can yield arbitrarily poor outcomes when used for allocation, failing to guarantee utility better than a uniform random selection. To bridge the gap between survival analysis and policy optimization, we introduce a decision-focused learning approach based on optimizing normalized discounted cumulative gain (NDCG), a mainstay metric in information retrieval. We establish the utility of NDCG in survival analysis by proving that it translates to guarantees on the performance of allocation. Empirically, we propose a bootstrapping approach to optimize the NDCG of existing survival models. Unlike prior work, we also address the challenge of right censorship when evaluating ranking. On historical heart transplant data from the US, our method dramatically boosts the NDCG of baseline models by 50-100%, which translates to tens of thousands of additional life years gained annually when deployed for transplant allocation. We anticipate that our framework will find broader applications in decision making with predictions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that survival predictors optimized for standard metrics such as the C-index can produce allocation policies (e.g., organ transplant prioritization) with arbitrarily poor outcomes, with no guarantee of utility better than uniform random selection. It introduces a decision-focused framework that instead optimizes normalized discounted cumulative gain (NDCG), proves that NDCG optimization yields allocation guarantees strictly better than random, proposes a bootstrapping procedure to optimize existing survival models for NDCG while addressing right-censorship in ranking evaluation, and reports 50-100% NDCG gains on US heart-transplant data that translate to tens of thousands of additional life-years annually.

Significance. If the NDCG-to-allocation guarantee holds under realistic right-censorship and the empirical translation is robust, the work supplies a concrete mechanism for aligning predictive models with downstream policy utility in high-stakes allocation domains. The explicit proof relating NDCG to allocation performance and the explicit treatment of censorship in ranking evaluation are strengths that distinguish the contribution from purely empirical decision-focused learning papers.

major comments (2)
  1. [Proof section / Theorem on NDCG-allocation equivalence] Proof of NDCG utility (likely §3 or Theorem 1): the argument that NDCG optimization guarantees allocation performance better than random must be shown to survive right-censorship. The current statement appears to rely on fully observed event times or on censoring that does not alter top-k ordering; if the proof only covers the uncensored case or assumes independent non-informative censoring, the guarantee does not transfer to the organ-allocation setting where censoring is common and potentially informative.
  2. [Empirical evaluation / bootstrapping description] Bootstrapping procedure and censorship handling (empirical section): the method for optimizing NDCG on existing models must specify exactly how censored observations are treated when computing the ranking metric used for gradient or surrogate optimization. Without this detail it is impossible to verify that the reported 50-100% NDCG lift is not an artifact of the particular imputation or weighting scheme chosen for the censored cases.
minor comments (2)
  1. [Method] Notation for the NDCG surrogate loss should be introduced once and used consistently; the current text mixes the ideal NDCG definition with the differentiable approximation without a clear mapping.
  2. [Experiments / discussion] The abstract states 'tens of thousands of additional life years gained annually'; the corresponding calculation (population size, life-year conversion factor, confidence interval) should appear in the main text or appendix so readers can assess sensitivity to the assumed allocation policy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments on the proof's robustness under censoring and the need for explicit detail on the bootstrapping procedure. We address both points below and will revise the manuscript to strengthen clarity and reproducibility.

Point-by-point responses
  1. Referee: [Proof section / Theorem on NDCG-allocation equivalence] Proof of NDCG utility (likely §3 or Theorem 1): the argument that NDCG optimization guarantees allocation performance better than random must be shown to survive right-censorship. The current statement appears to rely on fully observed event times or on censoring that does not alter top-k ordering; if the proof only covers the uncensored case or assumes independent non-informative censoring, the guarantee does not transfer to the organ-allocation setting where censoring is common and potentially informative.

    Authors: Theorem 1 establishes the NDCG-to-allocation guarantee under the standard survival model with non-informative right-censoring (the maintained assumption throughout the paper and in the organ-allocation literature). The proof operates on the observed data distribution and shows that any ranking with higher NDCG yields strictly higher expected allocation utility than random selection; the NDCG itself is computed on the censored data via the ranking metric defined in Section 4. We will add an explicit remark after the theorem stating the non-informative censoring assumption and a short paragraph discussing the sensitivity of the guarantee to informative censoring. revision: partial

  2. Referee: [Empirical evaluation / bootstrapping description] Bootstrapping procedure and censorship handling (empirical section): the method for optimizing NDCG on existing models must specify exactly how censored observations are treated when computing the ranking metric used for gradient or surrogate optimization. Without this detail it is impossible to verify that the reported 50-100% NDCG lift is not an artifact of the particular imputation or weighting scheme chosen for the censored cases.

    Authors: Section 4 and the supplement describe the use of inverse-probability-of-censoring weighting (IPCW) when evaluating NDCG on right-censored data: each observation's contribution to the discounted cumulative gain is reweighted by the inverse of the estimated censoring survival function at the observed time. The bootstrapping procedure then optimizes this IPCW-NDCG surrogate. We will move the precise IPCW formula and the pseudocode for the weighted NDCG computation into the main text (currently only referenced) so that the optimization target is fully specified. revision: yes
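The reweighting described in the response above can be sketched as follows. This is an illustrative reading of that description, not the paper's formula or pseudocode: the Kaplan-Meier estimator for the censoring distribution, the use of the left limit G(t-) rather than G(t), the linear gain, the normalization by the ideal ordering of weighted gains, and all function names are assumptions.

```python
import numpy as np


def ipcw_weights(times, event):
    """Inverse-probability-of-censoring weights.

    G is a Kaplan-Meier estimate of the censoring survival function, obtained by
    treating censoring (event == 0) as the "event". Uncensored observations get
    weight 1 / G(t-); censored observations get weight 0.
    """
    times = np.asarray(times, dtype=float)
    event = np.asarray(event, dtype=int)
    uniq = np.unique(times)  # sorted distinct observed times
    surv = np.empty(len(uniq))
    s = 1.0
    for idx, t in enumerate(uniq):
        at_risk = np.sum(times >= t)
        n_censored = np.sum((times == t) & (event == 0))
        s *= 1.0 - n_censored / at_risk
        surv[idx] = s

    def g_minus(t):
        # Value of the step function just before t (assumed convention).
        idx = np.searchsorted(uniq, t, side="left") - 1
        return 1.0 if idx < 0 else surv[idx]

    return np.array([e / g_minus(t) for t, e in zip(times, event)])


def ipcw_ndcg_at_k(times, event, scores, k):
    """NDCG@k where each observation's gain is its observed time times its IPCW weight."""
    times = np.asarray(times, dtype=float)
    gains = ipcw_weights(times, event) * times  # censored rows contribute zero gain
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    order = np.argsort(-np.asarray(scores, dtype=float))[:k]
    ideal = np.sort(gains)[::-1][:k]
    dcg = np.sum(gains[order] * discounts[: len(order)])
    ideal_dcg = np.sum(ideal * discounts[: len(ideal)])
    return float(dcg / ideal_dcg) if ideal_dcg > 0 else 0.0


# Hypothetical data: observed times and event indicators (1 = death observed, 0 = censored).
times = [5, 3, 8, 2, 6]
event = [1, 0, 1, 1, 0]
weights = ipcw_weights(times, event)
score_value = ipcw_ndcg_at_k(times, event, scores=[2, 0, 3, 1, 0], k=3)
print(weights, score_value)
```

In this toy data the weights of the three uncensored patients sum to the sample size, which reflects how IPCW redistributes the mass of censored observations onto later uncensored ones. The weights are only valid under non-informative censoring, the assumption the first response says the theorem maintains.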

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper's theoretical claim rests on an independent proof that NDCG optimization yields allocation performance guarantees, separate from any fitted parameters or prior self-citations. The empirical component applies bootstrapping to optimize NDCG on pre-existing survival models and directly addresses right-censorship in ranking evaluation, without reducing predictions to inputs by construction or relying on load-bearing self-citations. No self-definitional, fitted-input, or ansatz-smuggling patterns appear in the described derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the validity of NDCG as a proxy for allocation utility and the bootstrapping method's ability to handle right-censorship without introducing bias; no specific free parameters or invented entities are detailed in the abstract.

axioms (1)
  • domain assumption: Optimizing NDCG leads to allocation performance guarantees.
    Established by proof in the paper, as stated in the abstract.


