Modeling the Uncertainty in Electronic Health Records: a Bayesian Deep Learning Approach
Pith reviewed 2026-05-24 21:48 UTC · model grok-4.3
The pith
Bayesian neural networks model data noise uncertainty in electronic health records to identify cases where reducing noise improves predictions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A Bayesian neural network is used to predict the uncertainty induced by data noise in EHR data. Instances with high uncertainty harm model performance. The distributions of model predictions and uncertainty can identify a group of patients for intervention: for these patients, decreasing data noise benefits prediction accuracy more than for others.
What carries the argument
A Bayesian neural network (BNN) that estimates the uncertainty corresponding to data noise in the input records.
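One standard way for a network to learn input-dependent (heteroscedastic) data noise is to output a log-variance next to each prediction and train with a Gaussian negative log-likelihood, as in Kendall and Gal [12]. The source does not state the paper's loss, so the regression setting and the helper names below (`heteroscedastic_nll`, `mean_nll`) are assumptions for illustration only.

```python
import math


def heteroscedastic_nll(y_true, mu, log_var):
    """Gaussian negative log-likelihood for one instance, constant term dropped.

    Assumes the network outputs a mean `mu` and a log-variance
    `log_var` (s = log sigma^2) per instance; predicting s rather than
    sigma^2 keeps the variance positive.
    """
    # exp(-s) scales down the squared residual on instances flagged as noisy,
    # while the 0.5 * s term penalises inflating the noise estimate everywhere.
    return 0.5 * math.exp(-log_var) * (y_true - mu) ** 2 + 0.5 * log_var


def mean_nll(batch):
    """Average heteroscedastic NLL over (y_true, mu, log_var) triples."""
    return sum(heteroscedastic_nll(y, m, s) for y, m, s in batch) / len(batch)


# A residual of 2.0 costs less when the model admits more noise (s = log 4).
print(heteroscedastic_nll(3.0, 1.0, 0.0))            # 2.0
print(heteroscedastic_nll(3.0, 1.0, math.log(4.0)))  # about 1.19
```

For a fixed residual r, the loss is minimised at s = log(r^2), so instances with persistently large, unexplainable residuals end up with large predicted variance; that learned variance is the "data noise" signal.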
If this is right
- Instances with high uncertainty reduce overall model performance.
- Distributions of predictions and uncertainty reveal specific patient groups.
- Targeted reduction of data noise for those groups increases prediction accuracy.
- Uncertainty estimates provide an additional confidence level for each prediction.
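The first consequence (high-uncertainty instances reduce overall performance) is commonly checked with an accuracy-rejection analysis: drop the most uncertain fraction of instances and re-score the rest. A minimal stdlib sketch, assuming hard class labels and one scalar uncertainty per instance; `accuracy_after_rejection` and the toy data are hypothetical, not the paper's code or results.

```python
def accuracy_after_rejection(y_true, y_pred, uncertainty, reject_fraction):
    """Accuracy on the instances kept after discarding the most uncertain ones.

    reject_fraction -- share of instances (0 <= f < 1) with the highest
                       uncertainty to drop before scoring.
    """
    if not 0.0 <= reject_fraction < 1.0:
        raise ValueError("reject_fraction must be in [0, 1)")
    n = len(y_true)
    # Sort instance indices from least to most uncertain.
    order = sorted(range(n), key=lambda i: uncertainty[i])
    keep = order[: n - int(n * reject_fraction)]  # always keeps at least one
    correct = sum(1 for i in keep if y_true[i] == y_pred[i])
    return correct / len(keep)


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
unc = [0.1, 0.2, 0.9, 0.3, 0.8]
print(accuracy_after_rejection(y_true, y_pred, unc, 0.0))  # 0.6
print(accuracy_after_rejection(y_true, y_pred, unc, 0.4))  # 1.0
```

If accuracy rises as `reject_fraction` grows, the rejected instances were disproportionately wrong, which is the pattern the first bullet predicts.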
Where Pith is reading between the lines
- Similar uncertainty modeling could apply to other domains with noisy data like financial records or sensor readings.
- Healthcare systems might prioritize data quality improvements for high-uncertainty patient cohorts to optimize resources.
- If the mapping from uncertainty to noise holds, real-time uncertainty monitoring could guide data collection practices.
Load-bearing premise
The uncertainty output by the Bayesian neural network corresponds specifically to data noise in the EHR rather than model misspecification or other sources.
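One way to probe this premise, assuming the BNN outputs a per-instance noise variance and is sampled T times (for example with MC dropout [6]), is the law of total variance: the predictive variance splits into an aleatoric part (the average predicted noise) and an epistemic part (the spread of the sampled means). The sketch below assumes regression-style outputs; the function name and numbers are hypothetical.

```python
def decompose_predictive_variance(mc_means, mc_variances):
    """Split predictive variance from T stochastic passes (law of total variance).

    mc_means[t]     -- predicted mean from pass t
    mc_variances[t] -- predicted data-noise variance from pass t
    Var[y] ~= mean_t(sigma_t^2)  +  var_t(mu_t)
              (aleatoric)           (epistemic)
    """
    t = len(mc_means)
    mean_pred = sum(mc_means) / t
    aleatoric = sum(mc_variances) / t
    epistemic = sum((m - mean_pred) ** 2 for m in mc_means) / t
    return mean_pred, aleatoric, epistemic, aleatoric + epistemic


# Two hypothetical passes for one patient.
print(decompose_predictive_variance([1.0, 3.0], [0.5, 1.5]))
# (2.0, 1.0, 1.0, 2.0)
```

If the flagged instances owe most of their total variance to the epistemic term, the "data noise" reading is weakened for them.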
What would settle it
Measuring the correlation between the model's uncertainty values and independently quantified data noise levels in the EHR instances, or testing whether noise reduction in high-uncertainty cases fails to improve accuracy.
Original abstract
Deep learning models have exhibited superior performance in predictive tasks on the rapidly growing volume of Electronic Health Records (EHR). However, due to their lack of transparency, the behavior of deep learning models is difficult to interpret. Without trustworthiness, deep learning models will not be able to assist in real-world decision-making in healthcare. We propose a deep learning model based on Bayesian Neural Networks (BNN) to predict uncertainty induced by data noise. The uncertainty is introduced to provide model predictions with an extra level of confidence. Our experiments verify that instances with high uncertainty are harmful to model performance. Moreover, by investigating the distributions of model prediction and uncertainty, we show that it is possible to identify a group of patients for timely intervention, such that decreasing data noise will benefit the prediction accuracy for these patients more.
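The abstract does not say how the BNN posterior is approximated. One common choice among the cited works is Monte Carlo dropout [6, 7]: keep dropout active at prediction time and treat repeated stochastic passes as posterior samples. The toy below applies it to a single logistic unit with made-up weights; the helper name is hypothetical, and the spread it returns reflects model (epistemic) uncertainty, so capturing data noise would additionally need a learned noise term such as a predicted variance.

```python
import math
import random


def mc_dropout_predict(weights, bias, x, p_drop=0.5, n_samples=100, seed=0):
    """Monte Carlo dropout for one logistic unit (toy stand-in for a BNN).

    Dropout stays on at prediction time: on every pass each input is
    zeroed with probability p_drop and survivors are rescaled by
    1 / (1 - p_drop) (inverted dropout) before the sigmoid.
    Returns the mean predicted probability and its sample variance.
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if n_samples < 2:
        raise ValueError("need at least two passes for a variance")
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - p_drop)
    probs = []
    for _ in range(n_samples):
        z = bias
        for w, xi in zip(weights, x):
            if rng.random() >= p_drop:  # this input survives the pass
                z += w * xi * keep_scale
        probs.append(1.0 / (1.0 + math.exp(-z)))
    mean = sum(probs) / n_samples
    var = sum((q - mean) ** 2 for q in probs) / (n_samples - 1)
    return mean, var


# Hypothetical weights and inputs; with p_drop=0 every pass is identical.
print(mc_dropout_predict([1.0, -1.0], 0.0, [2.0, 2.0], p_drop=0.0))  # (0.5, 0.0)
print(mc_dropout_predict([1.0, 2.0], -1.0, [1.0, 1.0], p_drop=0.5, n_samples=200))
```

The fixed `seed` only makes the toy reproducible; in a real model, the passes would run through the trained network's hidden layers.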
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a Bayesian Neural Network (BNN) approach to model uncertainty induced by data noise in Electronic Health Records (EHR) for predictive tasks. It claims that this provides an extra level of confidence in predictions, that experiments verify high-uncertainty instances harm model performance, and that analyzing distributions of predictions and uncertainty enables identification of patient groups for timely intervention where reducing data noise would most improve accuracy.
Significance. If the central claims hold with proper validation, the work would offer a practical method for uncertainty quantification in healthcare ML that ties estimates specifically to data noise, potentially improving model trustworthiness and guiding efficient data-quality interventions in EHR systems. This addresses a key barrier to clinical adoption of deep learning and aligns with needs for interpretable, reliable predictions in high-stakes domains.
major comments (3)
- Abstract: the claim that 'experiments verify that instances with high uncertainty are harmful to model performance' provides no quantitative results, baselines, error bars, dataset details, or performance metrics, rendering the central empirical claim unevaluable from the manuscript.
- Abstract: no derivation, architectural specification, or ablation is given to show how the BNN isolates uncertainty induced by data noise (aleatoric) from epistemic uncertainty or model misspecification; this separation is load-bearing for the intervention recommendation but is assumed without evidence.
- Abstract: the claim that, by investigating the distributions of prediction and uncertainty, 'it is possible to identify a group of patients for timely intervention' lacks any concrete method, threshold, or validation showing that high-uncertainty cases carry measurably higher label/feature noise, rather than uncertainty from other sources.
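As one concrete, hypothetical version of the rule the referee asks for, patients could be flagged when their predicted risk exceeds a cut-off and their uncertainty falls in the upper tail of the cohort. Both thresholds and the helper name are placeholders for illustration, not values from the paper; validating the rule would still require showing that the flagged group actually carries more label or feature noise.

```python
def select_intervention_group(risk, uncertainty, risk_cut=0.5, unc_quantile=0.8):
    """Indices of patients with risk >= risk_cut and uncertainty in the upper tail.

    risk         -- predicted probability of the outcome per patient
    uncertainty  -- per-patient uncertainty (e.g. an aleatoric estimate)
    risk_cut, unc_quantile -- hypothetical placeholders, not values from the paper
    """
    ranked = sorted(uncertainty)
    # Simple empirical cut-off: the value at position floor(q * n) of the
    # sorted uncertainties (clamped to the last element).
    k = min(len(ranked) - 1, int(unc_quantile * len(ranked)))
    unc_cut = ranked[k]
    return [i for i, (r, u) in enumerate(zip(risk, uncertainty))
            if r >= risk_cut and u >= unc_cut]


risk = [0.9, 0.8, 0.2, 0.7, 0.6]
unc = [0.10, 0.90, 0.95, 0.40, 0.85]
print(select_intervention_group(risk, unc, unc_quantile=0.5))  # [1, 4]
```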
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below, clarifying aspects of the manuscript and indicating where revisions will be made.
- Referee: Abstract: the claim that 'experiments verify that instances with high uncertainty are harmful to model performance' provides no quantitative results, baselines, error bars, dataset details, or performance metrics, rendering the central empirical claim unevaluable from the manuscript.
Authors: The abstract is a concise summary. The full manuscript details the experiments, including quantitative performance metrics, baselines, error bars, and dataset information (e.g., on MIMIC-III) showing degraded performance for high-uncertainty instances. We will revise the abstract to include key quantitative highlights from the experiments section. revision: yes
- Referee: Abstract: no derivation, architectural specification, or ablation is given to show how the BNN isolates uncertainty induced by data noise (aleatoric) from epistemic uncertainty or model misspecification; this separation is load-bearing for the intervention recommendation but is assumed without evidence.
Authors: Section 3 specifies the BNN architecture applied to EHR data to model uncertainty from data noise. The approach targets predictive uncertainty attributable to noisy inputs rather than providing an explicit derivation separating aleatoric from epistemic components. We will add a clarifying paragraph on this modeling choice in the revision. revision: partial
- Referee: Abstract: the claim that, by investigating the distributions of prediction and uncertainty, 'it is possible to identify a group of patients for timely intervention' lacks any concrete method, threshold, or validation showing that high-uncertainty cases carry measurably higher label/feature noise, rather than uncertainty from other sources.
Authors: The manuscript uses distribution analysis in the experiments to identify high-uncertainty patient groups for potential intervention. We acknowledge the absence of explicit thresholds and direct empirical validation against measured noise levels. We will incorporate a concrete identification method and supporting validation in the revised manuscript. revision: yes
Circularity Check
No significant circularity; derivation relies on standard BNN properties without reduction to inputs
The paper introduces a BNN model whose uncertainty output is presented as directly modeling data noise in EHR, with experiments confirming that high-uncertainty instances harm performance. No equations, derivations, or parameter-fitting steps are exhibited that would make the uncertainty prediction equivalent to its inputs by construction. The architecture follows established BNN methods for uncertainty quantification, and the mapping to data noise is an interpretive claim rather than a self-referential definition or fitted renaming. No self-citations are load-bearing for the core result, and the work remains self-contained against external benchmarks for BNN uncertainty.
Reference graph
Works this paper leans on
[1] Yu Cheng, Fei Wang, Ping Zhang, and Jianying Hu. 2016. Risk prediction with electronic health records: A deep learning approach. In Proceedings of the 2016 SIAM International Conference on Data Mining. SIAM, 432–440.
[2] Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning. ACM, 160–167.
[3] Martin Cooke, Phil Green, Ljubomir Josifovski, and Ascension Vizinho. 2001. Robust automatic speech recognition with missing and unreliable acoustic data. Speech Communication 34, 3 (2001), 267–285.
[4] Armen Der Kiureghian and Ove Ditlevsen. 2009. Aleatory or epistemic? Does it matter? Structural Safety 31, 2 (2009), 105–112.
[5] Yarin Gal. 2016. Uncertainty in deep learning. Ph.D. Dissertation. University of Cambridge.
[6] Yarin Gal and Zoubin Ghahramani. 2016. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning. 1050–1059.
[7] Yarin Gal and Zoubin Ghahramani. 2016. A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems. 1019–1027.
[8] Zoubin Ghahramani. 2015. Probabilistic machine learning and artificial intelligence. Nature 521, 7553 (2015), 452.
[9] Omer Gottesman, Fredrik Johansson, Matthieu Komorowski, Aldo Faisal, David Sontag, Finale Doshi-Velez, and Leo Anthony Celi. 2019. Guidelines for reinforcement learning in healthcare. Nature Medicine 25, 1 (2019), 16–18.
[10]
[11] Alistair EW Johnson, Tom J Pollard, Lu Shen, H Lehman Li-wei, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific Data 3 (2016), 160035.
[12] Alex Kendall and Yarin Gal. 2017. What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems. 5574–5584.
[13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097–1105.
[14] Martin Krzywinski and Naomi Altman. 2013. Points of significance: Importance of being uncertain.
[15] Quoc V Le, Alex J Smola, and Stéphane Canu. 2005. Heteroscedastic Gaussian process regression. In Proceedings of the 22nd International Conference on Machine Learning. ACM, 489–496.
[16] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436.
[17] Sergey Levine, Peter Pastor, Alex Krizhevsky, Julian Ibarz, and Deirdre Quillen. 2018. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International Journal of Robotics Research 37, 4-5 (2018), 421–436.
[19] Radford M Neal. 1993. Bayesian learning via stochastic dynamics. In Advances in Neural Information Processing Systems. 475–482.
[20] Sanjay Purushotham, Chuizheng Meng, Zhengping Che, and Yan Liu. 2017. Benchmark of deep learning models on large healthcare MIMIC datasets. arXiv preprint arXiv:1710.08531 (2017).
[21] Sanjay Purushotham, Chuizheng Meng, Zhengping Che, and Yan Liu. 2018. Benchmarking deep learning models on large healthcare datasets. Journal of Biomedical Informatics 83 (2018), 112–134.
[22] Aditya Siddhant and Zachary C Lipton. 2018. Deep Bayesian active learning for natural language processing: Results of a large-scale empirical study. arXiv preprint arXiv:1808.05697 (2018).
[23] Synced. 2018. 2018 in Review: 10 AI Failures. Retrieved May 17, 2019 from https://medium.com/syncedreview/2018-in-review-10-ai-failures-c18faadf5983