Explaining an increase in predicted risk for clinical alerts
Pith reviewed 2026-05-24 23:30 UTC · model grok-4.3
The pith
Methods lift static attribution techniques to explain risk increases in dynamical models by attributing them to specific past inputs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop methods to lift static attribution techniques to the dynamical setting, where we identify and address challenges specific to dynamics. When the estimated risk increases, the goal of the explanation is to attribute the increase to a few relevant inputs from the past, enabling concise triage of clinical alerts.
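One way to make the explanation target precise, in our own notation rather than the paper's: a stateful model updates a hidden state h with each input x and emits a risk r. Suppose the clinician last reviewed the patient at time s and an alert fires at time t. The quantity to explain is then the increase Δ, and an attribution assigns a score φ to each input received in between. The explanation reports the k highest-scoring inputs. The occlusion score below is one common attribution choice, assumed here for illustration only.

```latex
\begin{aligned}
h_\tau &= f(h_{\tau-1}, x_\tau), \qquad r_\tau = g(h_\tau), \\
\Delta &= r_t - r_s, \\
\phi_\tau &= r_t - r_t^{(\tau \leftarrow b)}, \qquad \tau = s+1, \dots, t,
\end{aligned}
```

Here r_t^{(τ←b)} is the risk at time t obtained by restarting from the stored state h_s and replaying x_{s+1}, …, x_t with x_τ replaced by a baseline b. Because h carries information forward, replacing x_τ changes every later state. This is what distinguishes the dynamical case from scoring each input in isolation.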
What carries the argument
Lifting static attribution techniques to the dynamical setting, so that a risk increase is attributed to a few past inputs, while addressing the challenges that statefulness and sequential inputs introduce.
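The occlusion idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation. The one-unit recurrence, its weights, the two input features, and the all-zero "no new event" baseline are all hypothetical. The sketch shows the dynamics-specific step: each input is scored by replaying the whole window from the state stored at the last review. An input's effect on later hidden states is therefore counted, not just its immediate effect.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy stateful risk model (NOT the paper's model):
# h_tau = tanh(A * h_{tau-1} + W . x_tau),  r_tau = sigmoid(B * h_tau + C).
A, B, C = 0.9, 3.0, -2.0
W = [0.8, -0.5]  # weights for two hypothetical input features


def step(h: float, x: Sequence[float]) -> float:
    """Advance the hidden state by one input."""
    return math.tanh(A * h + sum(w * v for w, v in zip(W, x)))


def risk(h: float) -> float:
    """Map a hidden state to a risk estimate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(B * h + C)))


def replay(h_s: float, xs: Sequence[Sequence[float]]) -> float:
    """Replay the inputs received since time s from stored state h_s; return r_t."""
    h = h_s
    for x in xs:
        h = step(h, x)
    return risk(h)


def occlusion_scores(
    h_s: float, xs: Sequence[Sequence[float]], baseline: Sequence[float]
) -> Tuple[float, List[float]]:
    """Return the increase r_t - r_s and one occlusion score per input.

    Score i is r_t minus the risk obtained by replacing xs[i] with the
    baseline and replaying the whole window, so the input's effect on the
    state, and through it on every later step, is included.
    """
    r_s = risk(h_s)
    r_t = replay(h_s, xs)
    scores = []
    for i in range(len(xs)):
        perturbed = list(xs)
        perturbed[i] = baseline
        scores.append(r_t - replay(h_s, perturbed))
    return r_t - r_s, scores


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k inputs that pushed the risk up the most."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


h_s = 0.0  # hidden state stored at the clinician's last review (assumed)
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # hypothetical new events
delta, scores = occlusion_scores(h_s, xs, baseline=[0.0, 0.0])
print(f"risk increase since last review: {delta:+.3f}")
for i in top_k(scores, k=1):
    print(f"most responsible input: index {i}, score {scores[i]:+.3f}")
```

On this toy sequence the input at index 1 gets the largest positive score. The input at index 2 gets a negative score because it lowered the risk. The scores also do not sum to the total increase, since occlusion is not additive in general. Reporting only the top-k positive scores is one way to keep the explanation concise.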
If this is right
- Clinicians receive concise explanations that highlight the inputs responsible for a risk increase.
- The same lifted attribution methods apply to any dynamical model producing sequential risk estimates.
- Challenges unique to the dynamical setting, such as handling state across time steps, are explicitly identified and mitigated.
- Expert evaluation provides direct feedback on whether the explanations support real triage decisions.
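A tempting shortcut in the dynamical setting is to credit each step's change in risk to the input that arrived at that step. The sketch below uses a hypothetical toy recurrence and hypothetical inputs, assumed for illustration; it is not a method from the paper. It shows why state makes the shortcut misleading. The step-wise changes always sum to the total increase, by telescoping. However, after a single event, the quiet steps that follow receive negative credit. This happens because the hidden state decays on its own, not because those inputs contributed anything.

```python
import math
from typing import List, Sequence

# Hypothetical toy stateful risk model (NOT the paper's model):
# h_tau = tanh(A * h_{tau-1} + W . x_tau),  r_tau = sigmoid(B * h_tau + C).
A, B, C = 0.9, 3.0, -2.0
W = [0.8, -0.5]


def step(h: float, x: Sequence[float]) -> float:
    """Advance the hidden state; with zero input, A = 0.9 makes it decay toward 0."""
    return math.tanh(A * h + sum(w * v for w, v in zip(W, x)))


def risk(h: float) -> float:
    """Map a hidden state to a risk estimate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(B * h + C)))


def stepwise_deltas(h_s: float, xs: Sequence[Sequence[float]]) -> List[float]:
    """Naive attribution: credit the one-step change r_tau - r_{tau-1} to x_tau."""
    deltas = []
    h = h_s
    prev = risk(h)
    for x in xs:
        h = step(h, x)
        r = risk(h)
        deltas.append(r - prev)
        prev = r
    return deltas


# One event followed by three quiet steps (all-zero inputs).
xs = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
deltas = stepwise_deltas(0.0, xs)
print("step-wise credit:", [round(d, 3) for d in deltas])
print("sum of credit:", round(sum(deltas), 3))
```

By contrast, replacing one of the quiet (all-zero) inputs with an all-zero baseline leaves the replayed risk unchanged. A counterfactual score such as occlusion therefore assigns those steps zero, which matches the intuition that nothing happened there.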
Where Pith is reading between the lines
- Similar attribution lifting could apply to non-clinical domains with sequential risk models, such as equipment failure prediction.
- If attributions prove reliable, they might reduce the volume of alerts that require full manual review.
- Future work could test whether these explanations improve actual clinical outcomes beyond expert ratings of utility.
Load-bearing premise
Concise attribution of risk increases to a few past inputs will enable clinicians to effectively triage alerts without needing the full patient history or additional context.
What would settle it
Expert clinicians reviewing the attributions find them unhelpful or misleading for deciding whether to intervene on alerts.
Original abstract
Much work aims to explain a model's prediction on a static input. We consider explanations in a temporal setting where a stateful dynamical model produces a sequence of risk estimates given an input at each time step. When the estimated risk increases, the goal of the explanation is to attribute the increase to a few relevant inputs from the past. While our formal setup and techniques are general, we carry out an in-depth case study in a clinical setting. The goal here is to alert a clinician when a patient's risk of deterioration rises. The clinician then has to decide whether to intervene and adjust the treatment. Given a potentially long sequence of new events since she last saw the patient, a concise explanation helps her to quickly triage the alert. We develop methods to lift static attribution techniques to the dynamical setting, where we identify and address challenges specific to dynamics. We then experimentally assess the utility of different explanations of clinical alerts through expert evaluation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops methods to lift static attribution techniques to dynamical, stateful models that output sequences of risk estimates, with the goal of attributing increases in predicted risk to a small number of past inputs. It presents a clinical case study on alerting for patient deterioration, where concise explanations are intended to help clinicians triage alerts without reviewing the full history, and evaluates the resulting explanations via expert assessment.
Significance. If the lifted methods and evaluation hold, the work would offer a targeted approach to explaining temporal risk changes in clinical ML systems, addressing a practical need for concise, actionable alerts. The emphasis on dynamics-specific challenges and human-centered expert evaluation distinguishes it from purely static attribution literature.
major comments (1)
- [Abstract / experimental evaluation] Abstract and experimental evaluation section: the claim that concise attributions enable clinicians to triage alerts without needing the full patient history is central to the stated clinical utility, yet the expert evaluation only assesses perceived utility of the explanations and does not test whether they can substitute for full-history review in triage decisions.
minor comments (1)
- [Abstract] The abstract supplies no equations, validation details, or quantitative outcomes, making it difficult to assess technical soundness from the summary alone.
Simulated Author's Rebuttal
We thank the referee for their detailed review and constructive comments. We address the major comment below.
Referee: [Abstract / experimental evaluation] Abstract and experimental evaluation section: the claim that concise attributions enable clinicians to triage alerts without needing the full patient history is central to the stated clinical utility, yet the expert evaluation only assesses perceived utility of the explanations and does not test whether they can substitute for full-history review in triage decisions.
Authors: We agree that the expert evaluation focuses on perceived utility rather than directly testing whether the explanations can substitute for full patient history review in actual triage decisions. This represents a limitation in the strength of evidence for the clinical utility claim. In the revised manuscript, we will clarify this distinction in the abstract and experimental evaluation section, and add a discussion of this limitation along with suggestions for future work that could include controlled studies of triage performance. revision: yes
Circularity Check
No circularity: methods and evaluation are independent of inputs
The paper describes lifting existing static attribution techniques to a dynamical setting and evaluating the resulting explanations via expert review on clinical alerts. No equations, fitted parameters, or self-citations are shown to reduce any claimed result or prediction to the inputs by construction. The core steps—identifying dynamical challenges and performing expert utility assessment—are methodological extensions and empirical checks that stand apart from the data or prior fits. This is the normal case of a self-contained applied-methods paper.