Modeling the Uncertainty in Electronic Health Records: a Bayesian Deep Learning Approach

Michael Dulin; Mirsad Hadzikadic; Riyi Qiu; Xi Niu; Xin Wang; Yugang Jia

arxiv: 1907.06162 · v1 · pith:YUFIRJ64new · submitted 2019-07-14 · 💻 cs.LG · stat.ML

Riyi Qiu, Yugang Jia, Mirsad Hadzikadic, Michael Dulin, Xi Niu, Xin Wang

Pith reviewed 2026-05-24 21:48 UTC · model grok-4.3

keywords: uncertainty quantification · Bayesian deep learning · electronic health records · data noise · predictive healthcare models · patient stratification · model confidence

The pith

Bayesian neural networks model data noise uncertainty in electronic health records to identify cases where reducing noise improves predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Bayesian neural network approach for predicting uncertainty caused by data noise in electronic health records. This uncertainty measure adds confidence to model predictions in healthcare tasks. High-uncertainty instances are shown to negatively impact model performance. Examining the distributions of predictions and uncertainties allows identification of patient groups that would benefit most from data noise reduction through timely intervention.

Core claim

A Bayesian neural network is used to predict uncertainty induced by data noise in EHR data. Instances with high uncertainty harm model performance. Distributions of model prediction and uncertainty can identify a group of patients for intervention such that decreasing data noise benefits prediction accuracy for these patients.

What carries the argument

Bayesian Neural Network that estimates uncertainty corresponding to data noise in the input records.
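
The abstract does not say how the network is trained to output a data-noise estimate. As a hedged illustration only, one widely used formulation is the heteroscedastic regression loss of Kendall and Gal (reference [12] in the list below), in which the network predicts both a mean and a per-instance noise variance. The symbols here are assumptions introduced for illustration, not notation taken from the paper, and the regression form is shown for simplicity even though the paper's tasks may be classification.

```latex
% Heteroscedastic regression loss in the style of Kendall and Gal (2017).
% \hat{f}_\theta(x_i): predicted mean for instance i (assumed notation).
% \hat{\sigma}_\theta(x_i)^2: predicted data-noise variance for instance i (assumed notation).
\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N}
  \left[ \frac{\lVert y_i - \hat{f}_\theta(x_i) \rVert^2}{2\,\hat{\sigma}_\theta(x_i)^2}
  + \frac{1}{2} \log \hat{\sigma}_\theta(x_i)^2 \right]
```

The first term lets the model discount residuals on instances where it predicts a large variance, and the second term penalizes predicting a large variance everywhere, so the learned variance tends to be large only where the data are hard to fit.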

If this is right

Instances with high uncertainty reduce overall model performance.
Distributions of predictions and uncertainty reveal specific patient groups.
Targeted reduction of data noise for those groups increases prediction accuracy.
Uncertainty estimates provide an additional confidence level for each prediction.
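
The first consequence, together with the Figure 3 caption ("the trend of AUC as more data are removed"), suggests an evaluation of the following shape: sort instances by uncertainty, drop the most uncertain fraction, and recompute AUC on the remainder. The sketch below is a minimal version on synthetic data. The cohort, the noise model, and the function names `auc` and `auc_after_removal` are all assumptions made for illustration, not the paper's experimental protocol.

```python
import random

def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_after_removal(labels, scores, uncertainty, fractions):
    """AUC on the instances that remain after dropping the most uncertain fraction."""
    order = sorted(range(len(labels)), key=lambda i: uncertainty[i])  # most certain first
    result = {}
    for f in fractions:
        keep = order[: max(2, round(len(order) * (1 - f)))]
        result[f] = auc([labels[i] for i in keep], [scores[i] for i in keep])
    return result

# Synthetic cohort (hypothetical): each instance has its own noise level, and the
# "uncertainty" is set equal to that noise level, i.e. an idealized estimator.
rng = random.Random(0)
labels, scores, uncertainty = [], [], []
for _ in range(1000):
    y = rng.randint(0, 1)
    sigma = rng.uniform(0.05, 1.0)
    labels.append(y)
    scores.append(y + rng.gauss(0, sigma))
    uncertainty.append(sigma)

curve = auc_after_removal(labels, scores, uncertainty, [0.0, 0.2, 0.4, 0.6])
for f, a in curve.items():
    print(f"removed {f:.0%} most uncertain: AUC = {a:.3f}")
```

On this idealized data the AUC rises as uncertain instances are removed, but only because the uncertainty was constructed to equal the noise. On real EHR data the same curve would show whether uncertainty tracks prediction difficulty, not which source the uncertainty comes from.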

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar uncertainty modeling could apply to other domains with noisy data like financial records or sensor readings.
Healthcare systems might prioritize data quality improvements for high-uncertainty patient cohorts to optimize resources.
If the mapping from uncertainty to noise holds, real-time uncertainty monitoring could guide data collection practices.

Load-bearing premise

The uncertainty output by the Bayesian neural network corresponds specifically to data noise in the EHR rather than model misspecification or other sources.
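
To make the premise concrete: a common decomposition in Bayesian deep learning (for example in Kendall and Gal, reference [12]) runs several stochastic forward passes and splits the predictive variance into a data-noise (aleatoric) part and a model (epistemic) part. The sketch below assumes each pass returns a predicted mean and a predicted noise variance for one instance. Whether the paper uses this decomposition is not stated in the abstract, and the function name `decompose` and the numbers are hypothetical.

```python
from statistics import fmean, pvariance

def decompose(means, variances):
    """Split predictive variance for one instance, given T stochastic forward passes.

    means[t]     -- predicted mean on pass t
    variances[t] -- predicted data-noise variance on pass t
    """
    aleatoric = fmean(variances)  # average predicted data-noise variance
    epistemic = pvariance(means)  # spread of the predicted means across passes
    return aleatoric, epistemic, aleatoric + epistemic

# Hypothetical instance A: stable means, large predicted noise (noisy record).
a = decompose([0.70, 0.71, 0.69, 0.70], [0.09, 0.09, 0.09, 0.09])
# Hypothetical instance B: scattered means, no predicted noise (model unsure).
b = decompose([0.40, 1.00, 0.40, 1.00], [0.0, 0.0, 0.0, 0.0])

for name, (alea, epi, total) in (("A", a), ("B", b)):
    print(f"{name}: aleatoric={alea:.5f} epistemic={epi:.5f} total={total:.5f}")
```

The two instances have almost the same total variance but for different reasons. Cleaning the record could help A and would do nothing for B, which is why a single undifferentiated uncertainty score cannot by itself justify a noise-reduction intervention.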

What would settle it

Measuring the correlation between the model's uncertainty values and independently quantified data noise levels in the EHR instances, or testing whether noise reduction in high-uncertainty cases fails to improve accuracy.
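
The first check could be run as a rank correlation between the model's uncertainty and a noise level that is known independently, for instance because it was injected on purpose. The sketch below implements Spearman's rank correlation in plain Python. The arrays `injected_noise` and `model_uncertainty` are synthetic stand-ins, and nothing here is drawn from the paper's experiments.

```python
import random

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Synthetic stand-ins: a known per-instance noise level and an uncertainty
# estimate that follows it imperfectly.
rng = random.Random(1)
injected_noise = [rng.uniform(0.0, 1.0) for _ in range(500)]
model_uncertainty = [s + rng.gauss(0, 0.1) for s in injected_noise]

rho = spearman(injected_noise, model_uncertainty)
print(f"Spearman rho = {rho:.2f}")
```

A rank correlation is used here because the claim is only that uncertainty should increase with noise, not that the relationship is linear. A correlation near zero on real data would count against the load-bearing premise.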

Figures

Figures reproduced from arXiv: 1907.06162 by Michael Dulin, Mirsad Hadzikadic, Riyi Qiu, Xi Niu, Xin Wang, Yugang Jia.

**Figure 3.** The trend of AUC as more data are removed, aver…

Original abstract

Deep learning models have exhibited superior performance in predictive tasks with the explosively increasing Electronic Health Records (EHR). However, due to the lack of transparency, behaviors of deep learning models are difficult to interpret. Without trustworthiness, deep learning models will not be able to assist in the real-world decision-making process of healthcare issues. We propose a deep learning model based on Bayesian Neural Networks (BNN) to predict uncertainty induced by data noise. The uncertainty is introduced to provide model predictions with an extra level of confidence. Our experiments verify that instances with high uncertainty are harmful to model performance. Moreover, by investigating the distributions of model prediction and uncertainty, we show that it is possible to identify a group of patients for timely intervention, such that decreasing data noise will benefit more on the prediction accuracy for these patients.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This applies BNN uncertainty estimation to EHR but the abstract shows no quantitative results, baselines, or validation that the uncertainty tracks data noise specifically.

This paper applies Bayesian neural networks to produce uncertainty estimates for predictions on electronic health records. The abstract states that high-uncertainty instances harm performance and that examining prediction and uncertainty distributions can flag patients where reducing data noise would improve accuracy most. That is the central pitch.

Bayesian neural nets for uncertainty are not new, so the contribution is the EHR application and the practical framing around trustworthiness in healthcare models. The authors correctly note that deep learning needs confidence measures to be usable in real medical decisions, and BNNs are a reasonable existing tool for that. The paper does not add new algorithms, theory, or derivations.

The abstract claims experiments verify the performance harm from high uncertainty, yet supplies none of the numbers, error bars, dataset details, or baselines needed to check the claim. It also assumes without evidence that the uncertainty output isolates data noise rather than epistemic uncertainty or model limits. The stress-test note is correct on this: the intervention recommendation does not follow from the given text. No equations or ablations appear to separate those sources.

This is aimed at applied researchers or engineers already working on EHR predictive models who might want a simple example of adding uncertainty. Readers seeking rigorous validation, reproducible methods, or methodological advances will not find value here. I would not bring it to a reading group or cite it. It does not deserve peer review in this form because the evidence for the main claims is missing from the abstract.

Referee Report

3 major / 0 minor

Summary. The manuscript proposes a Bayesian Neural Network (BNN) approach to model uncertainty induced by data noise in Electronic Health Records (EHR) for predictive tasks. It claims that this provides an extra level of confidence in predictions, that experiments verify high-uncertainty instances harm model performance, and that analyzing distributions of predictions and uncertainty enables identification of patient groups for timely intervention where reducing data noise would most improve accuracy.

Significance. If the central claims hold with proper validation, the work would offer a practical method for uncertainty quantification in healthcare ML that ties estimates specifically to data noise, potentially improving model trustworthiness and guiding efficient data-quality interventions in EHR systems. This addresses a key barrier to clinical adoption of deep learning and aligns with needs for interpretable, reliable predictions in high-stakes domains.

major comments (3)

Abstract: the claim that 'experiments verify that instances with high uncertainty are harmful to model performance' provides no quantitative results, baselines, error bars, dataset details, or performance metrics, rendering the central empirical claim unevaluable from the manuscript.
Abstract: no derivation, architectural specification, or ablation is given to show how the BNN isolates uncertainty induced by data noise (aleatoric) from epistemic uncertainty or model misspecification; this separation is load-bearing for the intervention recommendation but is assumed without evidence.
Abstract: the suggestion that investigating prediction and uncertainty distributions 'makes it possible to identify a group of patients for timely intervention' lacks any concrete method, threshold, or validation showing that high-uncertainty cases exhibit measurably higher label/feature noise rather than other sources.
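
The third comment asks for a concrete method or threshold. Purely as an illustration of what such a rule could look like, the sketch below flags patients whose predicted probability is close to the decision boundary and whose uncertainty falls in the top fraction of the cohort. The function name `flag_for_review`, the 0.5 boundary, and the default `margin` and `top_fraction` values are all hypothetical choices, not values proposed by the authors or the referee.

```python
def flag_for_review(pred, unc, margin=0.15, top_fraction=0.2):
    """Return indices of patients with a borderline prediction and high uncertainty.

    pred         -- predicted probabilities in [0, 1]
    unc          -- per-patient uncertainty scores (higher means less certain)
    margin       -- hypothetical half-width of the "borderline" band around 0.5
    top_fraction -- hypothetical share of the cohort treated as high uncertainty
    """
    if len(pred) != len(unc) or not pred:
        raise ValueError("pred and unc must be non-empty and the same length")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    k = max(1, round(len(unc) * top_fraction))
    cutoff = sorted(unc, reverse=True)[k - 1]  # k-th largest uncertainty
    return [
        i for i, (p, u) in enumerate(zip(pred, unc))
        if abs(p - 0.5) <= margin and u >= cutoff
    ]

# Toy cohort of five patients (made-up numbers).
pred = [0.52, 0.95, 0.45, 0.10, 0.60]
unc = [0.90, 0.80, 0.20, 0.10, 0.70]
print(flag_for_review(pred, unc, top_fraction=0.4))
```

Any such rule would still need the validation the referee asks for: evidence that the flagged group really has noisier records, and that cleaning those records changes the predictions.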

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below, clarifying aspects of the manuscript and indicating where revisions will be made.

Referee: Abstract: the claim that 'experiments verify that instances with high uncertainty are harmful to model performance' provides no quantitative results, baselines, error bars, dataset details, or performance metrics, rendering the central empirical claim unevaluable from the manuscript.

Authors: The abstract is a concise summary. The full manuscript details the experiments, including quantitative performance metrics, baselines, error bars, and dataset information (e.g., on MIMIC-III) showing degraded performance for high-uncertainty instances. We will revise the abstract to include key quantitative highlights from the experiments section.

Revision: yes

Referee: Abstract: no derivation, architectural specification, or ablation is given to show how the BNN isolates uncertainty induced by data noise (aleatoric) from epistemic uncertainty or model misspecification; this separation is load-bearing for the intervention recommendation but is assumed without evidence.

Authors: Section 3 specifies the BNN architecture applied to EHR data to model uncertainty from data noise. The approach targets predictive uncertainty attributable to noisy inputs rather than providing an explicit derivation separating aleatoric from epistemic components. We will add a clarifying paragraph on this modeling choice in the revision.

Revision: partial

Referee: Abstract: the suggestion that investigating prediction and uncertainty distributions 'makes it possible to identify a group of patients for timely intervention' lacks any concrete method, threshold, or validation showing that high-uncertainty cases exhibit measurably higher label/feature noise rather than other sources.

Authors: The manuscript uses distribution analysis in the experiments to identify high-uncertainty patient groups for potential intervention. We acknowledge the absence of explicit thresholds and direct empirical validation against measured noise levels. We will incorporate a concrete identification method and supporting validation in the revised manuscript.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard BNN properties without reduction to inputs

The paper introduces a BNN model whose uncertainty output is presented as directly modeling data noise in EHR, with experiments confirming that high-uncertainty instances harm performance. No equations, derivations, or parameter-fitting steps are exhibited that would make the uncertainty prediction equivalent to its inputs by construction. The architecture follows established BNN methods for uncertainty quantification, and the mapping to data noise is an interpretive claim rather than a self-referential definition or fitted renaming. No self-citations are load-bearing for the core result, and the work remains self-contained against external benchmarks for BNN uncertainty.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no free parameters, axioms, or invented entities are explicitly introduced or quantified.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 2 internal anchors

[1] Yu Cheng, Fei Wang, Ping Zhang, and Jianying Hu. 2016. Risk prediction with electronic health records: A deep learning approach. In Proceedings of the 2016 SIAM International Conference on Data Mining. SIAM, 432–440.

[2] Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on Machine learning. ACM, 160–167.

[3] Martin Cooke, Phil Green, Ljubomir Josifovski, and Ascension Vizinho. 2001. Robust automatic speech recognition with missing and unreliable acoustic data. Speech communication 34, 3 (2001), 267–285.

[4] Armen Der Kiureghian and Ove Ditlevsen. 2009. Aleatory or epistemic? Does it matter? Structural Safety 31, 2 (2009), 105–112.

[5] Yarin Gal. 2016. Uncertainty in deep learning. Ph.D. Dissertation. University of Cambridge.

[6] Yarin Gal and Zoubin Ghahramani. 2016. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In international conference on machine learning. 1050–1059.

[7] Yarin Gal and Zoubin Ghahramani. 2016. A theoretically grounded application of dropout in recurrent neural networks. In Advances in neural information processing systems. 1019–1027.

[8] Zoubin Ghahramani. 2015. Probabilistic machine learning and artificial intelligence. Nature 521, 7553 (2015), 452.

[9] Omer Gottesman, Fredrik Johansson, Matthieu Komorowski, Aldo Faisal, David Sontag, Finale Doshi-Velez, and Leo Anthony Celi. 2019. Guidelines for reinforcement learning in healthcare. Nature medicine 25, 1 (2019), 16–18.

[10] Hrayr Harutyunyan, Hrant Khachatrian, David C Kale, Greg Ver Steeg, and Aram Galstyan. 2017. Multitask learning and benchmarking with clinical time series data. arXiv preprint arXiv:1703.07771 (2017).

[11] Alistair EW Johnson, Tom J Pollard, Lu Shen, H Lehman Li-wei, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific data 3 (2016), 160035.

[12] Alex Kendall and Yarin Gal. 2017. What uncertainties do we need in bayesian deep learning for computer vision? In Advances in neural information processing systems. 5574–5584.

[13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097–1105.

[14] Martin Krzywinski and Naomi Altman. 2013. Points of significance: Importance of being uncertain.

[15] Quoc V Le, Alex J Smola, and Stéphane Canu. 2005. Heteroscedastic Gaussian process regression. In Proceedings of the 22nd international conference on Machine learning. ACM, 489–496.

[16] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436.

[17, 18] Sergey Levine, Peter Pastor, Alex Krizhevsky, Julian Ibarz, and Deirdre Quillen. 2018. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International Journal of Robotics Research 37, 4-5 (2018), 421–436.

[19] Radford M Neal. 1993. Bayesian learning via stochastic dynamics. In Advances in neural information processing systems. 475–482.

[20] Sanjay Purushotham, Chuizheng Meng, Zhengping Che, and Yan Liu. 2017. Benchmark of deep learning models on large healthcare mimic datasets. arXiv preprint arXiv:1710.08531 (2017).

[21] Sanjay Purushotham, Chuizheng Meng, Zhengping Che, and Yan Liu. 2018. Benchmarking deep learning models on large healthcare datasets. Journal of biomedical informatics 83 (2018), 112–134.

[22] Aditya Siddhant and Zachary C Lipton. 2018. Deep Bayesian active learning for natural language processing: Results of a large-scale empirical study. arXiv preprint arXiv:1808.05697 (2018).

[23] Synced. 2018. 2018 in Review: 10 AI Failures. Retrieved May 17, 2019 from https://medium.com/syncedreview/2018-in-review-10-ai-failures-c18faadf5983
