
arxiv: 1907.06162 · v1 · pith:YUFIRJ64new · submitted 2019-07-14 · 💻 cs.LG · stat.ML

Modeling the Uncertainty in Electronic Health Records: a Bayesian Deep Learning Approach

Pith reviewed 2026-05-24 21:48 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords uncertainty quantification · Bayesian deep learning · electronic health records · data noise · predictive healthcare models · patient stratification · model confidence

The pith

Bayesian neural networks model data noise uncertainty in electronic health records to identify cases where reducing noise improves predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Bayesian neural network approach for predicting uncertainty caused by data noise in electronic health records. This uncertainty measure adds confidence to model predictions in healthcare tasks. High-uncertainty instances are shown to negatively impact model performance. Examining the distributions of predictions and uncertainties allows identification of patient groups that would benefit most from data noise reduction through timely intervention.

Core claim

A Bayesian neural network is used to predict uncertainty induced by data noise in EHR data. Instances with high uncertainty harm model performance. Distributions of model prediction and uncertainty can identify a group of patients for intervention such that decreasing data noise benefits prediction accuracy for these patients.

What carries the argument

A Bayesian neural network that estimates the uncertainty attributable to data noise in the input records.

If this is right

  • Instances with high uncertainty reduce overall model performance.
  • Distributions of predictions and uncertainty reveal specific patient groups.
  • Targeted reduction of data noise for those groups increases prediction accuracy.
  • Uncertainty estimates provide an additional confidence level for each prediction.
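
One hedged way to probe the first and third points is to drop the most uncertain instances and watch how AUC changes on what remains. The sketch below uses synthetic data and assumes, purely for illustration, that the uncertainty score equals the true noise level; the names `auc_after_dropping`, `noise_sd`, and `drop_fraction` are hypothetical and do not come from the paper.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Ties count as half a win, matching the usual Mann-Whitney definition.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_after_dropping(labels, scores, uncertainty, drop_fraction):
    """AUC on the instances kept after removing the most uncertain fraction."""
    order = sorted(range(len(labels)), key=lambda i: uncertainty[i])
    keep = order[: max(2, round(len(order) * (1.0 - drop_fraction)))]
    return auc([labels[i] for i in keep], [scores[i] for i in keep])


# Synthetic stand-in for model outputs (assumption: not EHR data).
rng = random.Random(0)
n = 1000
labels = [rng.randint(0, 1) for _ in range(n)]
noise_sd = [rng.choice([0.1, 1.5]) for _ in range(n)]  # per-instance noise level
scores = [y + rng.gauss(0.0, sd) for y, sd in zip(labels, noise_sd)]
uncertainty = noise_sd  # idealized assumption: uncertainty tracks noise exactly

for frac in (0.0, 0.25, 0.5):
    print(frac, round(auc_after_dropping(labels, scores, uncertainty, frac), 3))
```

On real data the uncertainty would come from the model rather than from a known noise level, so a rising AUC curve would support the claim but would not by itself show that noise is the cause.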

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar uncertainty modeling could apply to other domains with noisy data like financial records or sensor readings.
  • Healthcare systems might prioritize data quality improvements for high-uncertainty patient cohorts to optimize resources.
  • If the mapping from uncertainty to noise holds, real-time uncertainty monitoring could guide data collection practices.

Load-bearing premise

The uncertainty output by the Bayesian neural network corresponds specifically to data noise in the EHR rather than model misspecification or other sources.
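
To see why this premise matters, it helps to look at how predictive uncertainty is commonly split when a stochastic model (for example, one using Monte Carlo dropout) is sampled several times per patient. The sketch below shows the standard entropy-based split into an expected-entropy term (often read as data noise) and a mutual-information term (often read as model uncertainty). Whether the paper uses this particular decomposition is not stated in the material above; the function names and sample values are illustrative assumptions.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli outcome with positive-class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def decompose(mc_probs):
    """Split predictive uncertainty for one patient.

    mc_probs: positive-class probabilities from repeated stochastic forward passes.
    Returns (total, aleatoric, epistemic), where total = aleatoric + epistemic.
    """
    mean_p = sum(mc_probs) / len(mc_probs)
    total = binary_entropy(mean_p)  # entropy of the averaged prediction
    aleatoric = sum(binary_entropy(p) for p in mc_probs) / len(mc_probs)
    epistemic = max(0.0, total - aleatoric)  # mutual information, clamped for rounding
    return total, aleatoric, epistemic


# Passes agree on an ambiguous prediction: uncertainty is mostly aleatoric.
print(decompose([0.5, 0.5, 0.5, 0.5]))
# Passes disagree confidently: uncertainty is mostly epistemic.
print(decompose([0.01, 0.99]))
```

Both example patients have the same total entropy, which illustrates why a single uncertainty number cannot be attributed to data noise without further evidence.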

What would settle it

Measuring the correlation between the model's uncertainty values and independently quantified data noise levels in the EHR instances, or testing whether noise reduction in high-uncertainty cases fails to improve accuracy.
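
The first test can be sketched as a rank correlation between per-instance uncertainty and a noise level that is known because it was injected deliberately. The example below is a minimal stdlib sketch on synthetic numbers; `noise_level` and `uncertainty` are hypothetical stand-ins, and no result from the paper is reproduced here.

```python
import random


def ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic example (assumption): uncertainty is the injected noise plus jitter.
rng = random.Random(1)
noise_level = [rng.uniform(0.0, 1.0) for _ in range(500)]
uncertainty = [s + rng.gauss(0.0, 0.2) for s in noise_level]
rho = spearman(uncertainty, noise_level)
print(round(rho, 3))
```

A correlation near zero on real EHR data with controlled noise injection would count against the premise; a strong positive value would be consistent with it.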

Figures

Figures reproduced from arXiv: 1907.06162 by Michael Dulin, Mirsad Hadzikadic, Riyi Qiu, Xi Niu, Xin Wang, Yugang Jia.

Figure 1: The AUC comparison between low uncertainty
Figure 3: The trend of AUC as more data are removed, aver
Original abstract

Deep learning models have exhibited superior performance in predictive tasks with the explosively increasing Electronic Health Records (EHR). However, due to the lack of transparency, behaviors of deep learning models are difficult to interpret. Without trustworthiness, deep learning models will not be able to assist in the real-world decision-making process of healthcare issues. We propose a deep learning model based on Bayesian Neural Networks (BNN) to predict uncertainty induced by data noise. The uncertainty is introduced to provide model predictions with an extra level of confidence. Our experiments verify that instances with high uncertainty are harmful to model performance. Moreover, by investigating the distributions of model prediction and uncertainty, we show that it is possible to identify a group of patients for timely intervention, such that decreasing data noise will benefit more on the prediction accuracy for these patients.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript proposes a Bayesian Neural Network (BNN) approach to model uncertainty induced by data noise in Electronic Health Records (EHR) for predictive tasks. It claims that this provides an extra level of confidence in predictions, that experiments verify high-uncertainty instances harm model performance, and that analyzing distributions of predictions and uncertainty enables identification of patient groups for timely intervention where reducing data noise would most improve accuracy.

Significance. If the central claims hold with proper validation, the work would offer a practical method for uncertainty quantification in healthcare ML that ties estimates specifically to data noise, potentially improving model trustworthiness and guiding efficient data-quality interventions in EHR systems. This addresses a key barrier to clinical adoption of deep learning and aligns with needs for interpretable, reliable predictions in high-stakes domains.

major comments (3)
  1. Abstract: the claim that 'experiments verify that instances with high uncertainty are harmful to model performance' provides no quantitative results, baselines, error bars, dataset details, or performance metrics, rendering the central empirical claim unevaluable from the manuscript.
  2. Abstract: no derivation, architectural specification, or ablation is given to show how the BNN isolates uncertainty induced by data noise (aleatoric) from epistemic uncertainty or model misspecification; this separation is load-bearing for the intervention recommendation but is assumed without evidence.
  3. Abstract: the suggestion that, by investigating prediction and uncertainty distributions, 'it is possible to identify a group of patients for timely intervention' lacks any concrete method, threshold, or validation showing that high-uncertainty cases exhibit measurably higher label/feature noise rather than uncertainty from other sources.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below, clarifying aspects of the manuscript and indicating where revisions will be made.

Point-by-point responses
  1. Referee: Abstract: the claim that 'experiments verify that instances with high uncertainty are harmful to model performance' provides no quantitative results, baselines, error bars, dataset details, or performance metrics, rendering the central empirical claim unevaluable from the manuscript.

    Authors: The abstract is a concise summary. The full manuscript details the experiments, including quantitative performance metrics, baselines, error bars, and dataset information (e.g., on MIMIC-III) showing degraded performance for high-uncertainty instances. We will revise the abstract to include key quantitative highlights from the experiments section.

    revision: yes

  2. Referee: Abstract: no derivation, architectural specification, or ablation is given to show how the BNN isolates uncertainty induced by data noise (aleatoric) from epistemic uncertainty or model misspecification; this separation is load-bearing for the intervention recommendation but is assumed without evidence.

    Authors: Section 3 specifies the BNN architecture applied to EHR data to model uncertainty from data noise. The approach targets predictive uncertainty attributable to noisy inputs rather than providing an explicit derivation separating aleatoric from epistemic components. We will add a clarifying paragraph on this modeling choice in the revision.

    revision: partial

  3. Referee: Abstract: the suggestion that, by investigating prediction and uncertainty distributions, 'it is possible to identify a group of patients for timely intervention' lacks any concrete method, threshold, or validation showing that high-uncertainty cases exhibit measurably higher label/feature noise rather than uncertainty from other sources.

    Authors: The manuscript uses distribution analysis in the experiments to identify high-uncertainty patient groups for potential intervention. We acknowledge the absence of explicit thresholds and direct empirical validation against measured noise levels. We will incorporate a concrete identification method and supporting validation in the revised manuscript.

    revision: yes
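
As an illustration of what a "concrete identification method" could look like, the sketch below flags patients whose uncertainty is above a chosen quantile and whose predicted risk sits near the decision boundary. This is a hypothetical rule, not the authors' method: the function `flag_for_review`, the quantile `q`, and the `band` around 0.5 are all assumptions that would need validation against measured noise.

```python
def quantile(values, q):
    """Linear-interpolation quantile for q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def flag_for_review(predictions, uncertainties, q=0.9, band=0.2):
    """Indices of patients with high uncertainty and a prediction near 0.5.

    q:    uncertainty quantile used as the cutoff (hypothetical default).
    band: half-width of the prediction interval around 0.5 (hypothetical default).
    """
    cutoff = quantile(uncertainties, q)
    return [
        i
        for i, (p, u) in enumerate(zip(predictions, uncertainties))
        if u >= cutoff and abs(p - 0.5) <= band
    ]


# Toy values for five patients (illustrative only).
predictions = [0.05, 0.45, 0.55, 0.95, 0.50]
uncertainties = [0.90, 0.80, 0.10, 0.20, 0.05]
print(flag_for_review(predictions, uncertainties, q=0.5))
```

Restricting attention to borderline predictions reflects one plausible reading of "benefit most": noise reduction can only change a decision when the prediction is close to the threshold.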

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard BNN properties without reduction to inputs

Full rationale

The paper introduces a BNN model whose uncertainty output is presented as directly modeling data noise in EHR, with experiments confirming that high-uncertainty instances harm performance. No equations, derivations, or parameter-fitting steps are exhibited that would make the uncertainty prediction equivalent to its inputs by construction. The architecture follows established BNN methods for uncertainty quantification, and the mapping to data noise is an interpretive claim rather than a self-referential definition or fitted renaming. No self-citations are load-bearing for the core result, and the work remains self-contained against external benchmarks for BNN uncertainty.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no free parameters, axioms, or invented entities are explicitly introduced or quantified.


