arxiv: 2512.20288 · v2 · submitted 2025-12-23 · 💻 cs.CV · cs.AI

UbiQVision: Quantifying Uncertainty in XAI for Image Recognition

Pith reviewed 2026-05-16 20:07 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords uncertainty quantification · SHAP explanations · medical imaging · XAI · Dempster-Shafer theory · Dirichlet sampling · image recognition · epistemic uncertainty

The pith

Dirichlet posterior sampling and Dempster-Shafer theory can quantify instability in SHAP explanations for medical image classifiers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a method to measure how unreliable SHAP explanations become when models face uncertainty in medical imaging. It combines Dirichlet sampling to model posterior distributions with Dempster-Shafer theory to assign belief and plausibility values to explanation features. This matters because doctors rely on these explanations to trust AI predictions, yet unstable ones can lead to wrong decisions. The approach generates fusion maps that highlight uncertain regions in the explanation. Evaluation on pathology, ophthalmology, and radiology datasets shows it can track uncertainty arising from image noise and varying resolutions.

Core claim

The central contribution is UbiQVision, a framework that uses Dirichlet posterior sampling to capture epistemic and aleatoric uncertainty in SHAP values, then applies Dempster-Shafer theory to compute belief, plausibility, and fusion maps. Together these give a quantitative measure of explanation uncertainty without requiring ground-truth labels.
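To make the sampling step concrete, here is a minimal sketch of one way Dirichlet sampling could expose disagreement between ensemble members. It is not the authors' implementation: the function names, the concentration vector `alpha`, and the toy SHAP values are all assumptions for illustration, and maps are flattened to plain lists to keep the example dependency-free.

```python
import random
import statistics

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sampled_shap_spread(shap_maps, alpha, n_samples=500, seed=0):
    """Per-pixel mean and standard deviation of the weighted SHAP map
    when the model weights are drawn from a Dirichlet distribution.

    shap_maps: one flattened SHAP map (list of floats) per model.
    alpha:     hypothetical Dirichlet concentration, one value per model.
    """
    rng = random.Random(seed)
    n_models = len(shap_maps)
    n_pixels = len(shap_maps[0])
    per_pixel = [[] for _ in range(n_pixels)]
    for _ in range(n_samples):
        w = sample_dirichlet(alpha, rng)
        for p in range(n_pixels):
            per_pixel[p].append(sum(w[m] * shap_maps[m][p] for m in range(n_models)))
    means = [statistics.fmean(v) for v in per_pixel]
    stds = [statistics.pstdev(v) for v in per_pixel]
    return means, stds

# Toy data (assumed): three models, four pixels.
# The models agree on pixel 0 and disagree strongly on pixel 3.
maps = [
    [0.8, 0.1, 0.0, 0.9],
    [0.8, 0.2, 0.0, -0.7],
    [0.8, 0.0, 0.0, 0.1],
]
means, stds = sampled_shap_spread(maps, alpha=[3.7, 3.2, 3.1])
```

Under these assumptions, pixels where the models agree keep a spread near zero no matter which weights are drawn, while pixels where they disagree show a large spread.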

What carries the argument

Dirichlet posterior sampling fused with Dempster-Shafer belief and plausibility functions to produce uncertainty maps from SHAP explanations.
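As a rough illustration of the Dempster-Shafer side, the sketch below fuses per-model evidence for a single pixel on a two-element frame {relevant, irrelevant}. Dempster's rule of combination and the belief and plausibility definitions are standard; the mapping from SHAP values to masses (`shap_to_mass`), the reliability values, and the toy inputs are assumptions, since this page does not reproduce the paper's equations.

```python
from functools import reduce

def shap_to_mass(s, reliability):
    """Hypothetical mapping from a normalized SHAP value s in [-1, 1] to masses
    (relevant, irrelevant, whole frame). Unassigned mass goes to the whole frame."""
    relevant = reliability * max(s, 0.0)
    irrelevant = reliability * max(-s, 0.0)
    return (relevant, irrelevant, 1.0 - relevant - irrelevant)

def dempster_combine(m1, m2):
    """Dempster's rule of combination on the frame {relevant, irrelevant}."""
    r1, i1, t1 = m1
    r2, i2, t2 = m2
    conflict = r1 * i2 + i1 * r2          # mass landing on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    relevant = (r1 * r2 + r1 * t2 + t1 * r2) / norm
    irrelevant = (i1 * i2 + i1 * t2 + t1 * i2) / norm
    return (relevant, irrelevant, t1 * t2 / norm)

def fuse_pixel(shap_values, reliabilities):
    """Return (belief, plausibility, interval width) that the pixel is relevant."""
    masses = [shap_to_mass(s, w) for s, w in zip(shap_values, reliabilities)]
    relevant, irrelevant, theta = reduce(dempster_combine, masses)
    belief = relevant                      # Bel({relevant})
    plausibility = relevant + theta        # Pl({relevant}) = 1 - Bel({irrelevant})
    return belief, plausibility, plausibility - belief

# Toy inputs (assumed): three models with equal reliability 0.9.
agree = fuse_pixel([0.9, 0.8, 0.85], [0.9, 0.9, 0.9])
clash = fuse_pixel([0.9, -0.7, 0.1], [0.9, 0.9, 0.9])
```

In this toy setup, agreeing models push belief toward 1 and narrow the belief-plausibility interval, whereas conflicting models lower belief and leave a wider interval.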

If this is right

  • Clinicians can identify which parts of a SHAP explanation are trustworthy in noisy medical scans.
  • The method allows statistical comparison of uncertainty levels across different imaging modalities.
  • It provides a way to fuse multiple uncertain explanations into a single reliable visualization.
  • Models with high uncertainty in explanations can be flagged for further review before clinical use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could extend to other XAI methods beyond SHAP, such as LIME, by applying the same sampling and fusion process.
  • In non-medical domains like autonomous driving, it might help quantify trust in visual explanations under sensor noise.
  • Future work could test if these uncertainty scores correlate with actual model error rates on held-out test sets.

Load-bearing premise

The assumption that Dirichlet sampling combined with Dempster-Shafer theory yields a faithful measure of SHAP instability without adding new biases or needing separate validation data.

What would settle it

Run the framework on a dataset where SHAP explanations are known to be stable, such as synthetic images with no noise, and check if the uncertainty scores remain near zero.
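The check described above could be scripted along the following lines. This is only a sketch under stated assumptions: `pixelwise_std` stands in for whatever uncertainty score the framework actually produces, the tolerance is arbitrary, and Gaussian jitter is used as a stand-in for explanation noise.

```python
import random
import statistics

def pixelwise_std(explanations):
    """Per-pixel population standard deviation across repeated explanation maps.
    Placeholder for the framework's own uncertainty score."""
    return [statistics.pstdev(values) for values in zip(*explanations)]

def passes_stability_check(uncertainty_map, tolerance=1e-6):
    """True when every pixel's uncertainty score is within tolerance of zero."""
    return max(uncertainty_map) <= tolerance

# Stable case: the same synthetic explanation repeated five times.
clean = [0.5, -0.2, 0.0, 0.9]
stable_runs = [list(clean) for _ in range(5)]

# Unstable case: Gaussian jitter added to each repeated explanation.
rng = random.Random(0)
noisy_runs = [[v + rng.gauss(0.0, 0.1) for v in clean] for _ in range(5)]

stable_ok = passes_stability_check(pixelwise_std(stable_runs))
noisy_ok = passes_stability_check(pixelwise_std(noisy_runs))
```

A framework that reports non-trivial uncertainty in the stable case, or near-zero uncertainty in the noisy one, would fail the test this section proposes.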

Figures

Figures reproduced from arXiv: 2512.20288 by Akshat Dubey, Aleksandar Anžel, Bahar İlgen, Georges Hattab.

Figure 1: On the left, the plot demonstrates the impact of the temperature parameter (T) on the model weights for Models A, B, and C, as evaluated on the test dataset using a specific metric on a logarithmic scale. A low T value means that a model with even a slightly higher F1 score receives the highest weight, while a high T value means that the models will be allotted equal weight, irrespective of their performan…
Figure 2: The visualization shows the SHAP values for the two classes of the malaria dataset from the predictions of three models: a custom CNN model, a ResNet model, and a ViT model, with weights of 0.37, 0.32, and 0.31, respectively. The "uninfected" class receives positive attribution (𝜙 > 0) from the models for the erythrocyte’s interior regions with low-frequency spatial variation. Specifically, the smooth, hom…
Figure 3: This figure shows the fusion of SHAP explanations from the weighted model ensemble (Custom CNN, ResNet, and ViT) for a malaria-infected and an uninfected sample. Panel (a) shows the parasitized sample. The belief mass (support) map shows a concentrated area of high belief (dark green), which precisely localizes the parasite. This indicates that the models found strong, consistent evidence at this locatio…
Figure 4: Figure (a) shows the distribution of evidence and model confidence for the parasitized sample. It presents the statistical analysis of the fusion process for the infected erythrocyte. The left panel shows the kernel density estimation (KDE) of pixel-wise mass values. The belief mass (green) is heavily skewed toward zero, but it has a noticeable tail that extends into higher values. This statistically confi…
Figure 5: This visualization shows the SHAP attribution maps (𝜙) for each dementia stage. It details the additive feature attribution scores for the ensemble members, where the color intensity corresponds to the impact on the model’s log-odds output. Red pixels denote positive SHAP values (𝜙 > 0), indicating morphological regions, such as enlarged ventricles or cortical atrophy, that drive classification toward a …
Figure 6: This figure illustrates the correlation between feature distinctness and model confidence by presenting the pixel-wise fusion of SHAP explanations from the weighted model ensemble (Custom CNN, ResNet, and ViT) across four stages of dementia. (a) Mild Dementia: The belief mass (support) map shows localized clusters of evidence (green) that correspond to emerging pathological features. The uncertainty map sh…
Figure 7: This figure shows the quantitative analytics of the fusion engine by contrasting the statistical evidence distribution on the left with the assigned Bayesian model confidence on the right. The bar charts confirm that Custom CNN (𝑤 = 0.362) has the greatest influence on the ensemble, followed closely by ResNet and ViT. This indicates a preference for local texture features over global dependencies. The kern…
Figure 8: This figure illustrates the fused SHAP explanations and quantitative analytics for the diabetic retinopathy (DR) classification ensemble at different severity levels. The attribution maps in the top rows visualize the pixel-wise contribution of each model (Custom CNN, ResNet, and ViT). Red indicates positive evidence for the target class, and blue indicates suppression. In advanced stages, such as Prolifer…
Figure 9: This figure illustrates the Dempster-Shafer fusion of model explanations for classifying diabetic retinopathy (DR), tracking the evolution of evidential support across five distinct severity levels. (a) Healthy: The fusion maps for healthy retinas exhibit minimal belief mass (pale/empty) and high, uniform uncertainty (bright yellow). This indicates that the ensemble’s decision is driven by the absence of p…
Figure 10: This figure shows the fusion maps of SHAP explanations produced by a Bayesian-weighted model ensemble (Custom CNN, ResNet, and ViT) at different levels of diabetic retinopathy (DR) severity. The individual attribution maps demonstrate that advanced disease states, such as severe and proliferative DR, result in dense, widespread positive contributions (red pixels) across the various architectures. In contr…
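
The temperature behaviour described in the Figure 1 caption (a low T concentrates weight on the best-scoring model, a high T equalizes the weights) is what a temperature-scaled softmax over performance scores would produce. The sketch below assumes that form; the caption does not give the paper's actual weighting formula, and the F1 scores are made up for illustration.

```python
import math

def temperature_weights(scores, temperature):
    """Softmax over scores / temperature, shifted by the maximum for numerical stability."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical F1 scores for Models A, B, and C.
f1_scores = [0.95, 0.93, 0.92]

sharp = temperature_weights(f1_scores, temperature=0.001)  # low T: winner takes nearly all
flat = temperature_weights(f1_scores, temperature=100.0)   # high T: nearly uniform weights
```

With these assumed scores, the low-temperature weights put almost all mass on Model A, while the high-temperature weights are close to one third each.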
Original abstract

Recent advances in deep learning have led to its widespread adoption across diverse domains, including medical imaging. This progress is driven by increasingly sophisticated model architectures, such as ResNets, Vision Transformers, and Hybrid Convolutional Neural Networks, that offer enhanced performance at the cost of greater complexity. This complexity often compromises model explainability and interpretability. SHAP has emerged as a prominent method for providing interpretable visualizations that aid domain experts in understanding model predictions. However, SHAP explanations can be unstable and unreliable in the presence of epistemic and aleatoric uncertainty. In this study, we address this challenge by using Dirichlet posterior sampling and Dempster-Shafer theory to quantify the uncertainty that arises from these unstable explanations in medical imaging applications. The framework uses a belief, plausible, and fusion map approach alongside statistical quantitative analysis to produce quantification of uncertainty in SHAP. Furthermore, we evaluated our framework on three medical imaging datasets with varying class distributions, image qualities, and modality types which introduces noise due to varying image resolutions and modality-specific aspect covering the examples from pathology, ophthalmology, and radiology, introducing significant epistemic uncertainty.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces UbiQVision, a framework that applies Dirichlet posterior sampling combined with Dempster-Shafer theory to quantify uncertainty arising from unstable SHAP explanations in deep learning models for medical image recognition. It generates belief, plausibility, and fusion maps, performs statistical quantitative analysis, and evaluates the approach on three medical imaging datasets spanning pathology, ophthalmology, and radiology that vary in class distribution, image quality, and modality.

Significance. If the method can be shown to produce maps that reliably track actual SHAP instability, the work would address a practical gap in deploying XAI for high-stakes medical decisions where explanation variance can undermine trust. The use of DST belief/plausibility constructs on top of Dirichlet sampling is a plausible direction, but the current manuscript supplies no equations, implementation details, or validation experiments, so the significance cannot yet be assessed.

major comments (2)
  1. [Abstract/Methods] The central claim that Dirichlet posterior sampling plus Dempster-Shafer theory yields a faithful quantification of SHAP instability is unsupported because no equations, sampling procedure, or fusion rule are provided; without these it is impossible to determine whether the belief/plausibility maps reflect epistemic instability in the explanations or merely modeling artifacts.
  2. [Results/Evaluation] No empirical check is reported that the produced belief or fusion maps correlate with observable SHAP variance (e.g., across random seeds, input perturbations, or repeated explanations on identical images), leaving open the possibility that the maps capture aleatoric noise rather than the targeted explanation instability.
minor comments (2)
  1. [Abstract] The abstract refers to 'statistical quantitative analysis' without naming the specific metrics, confidence intervals, or hypothesis tests employed.
  2. [Experiments] Dataset descriptions mention 'varying image resolutions and modality-specific aspect' but do not report exact image sizes, preprocessing steps, or how these factors were controlled in the uncertainty quantification.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

  1. Referee: [Abstract/Methods] The central claim that Dirichlet posterior sampling plus Dempster-Shafer theory yields a faithful quantification of SHAP instability is unsupported because no equations, sampling procedure, or fusion rule are provided; without these it is impossible to determine whether the belief/plausibility maps reflect epistemic instability in the explanations or merely modeling artifacts.

    Authors: We agree that the current manuscript lacks explicit equations and procedural details for the Dirichlet posterior sampling and the Dempster-Shafer fusion rules. This omission makes it difficult for readers to fully assess the method. In the revised version, we will expand the Methods section to include the full mathematical formulation: the Dirichlet distribution used for posterior sampling of explanation weights, the Monte Carlo sampling procedure to generate an ensemble of SHAP maps, and the specific combination rules for computing belief and plausibility from the sampled explanations. These additions will demonstrate that the resulting maps specifically capture the variance due to SHAP instability. revision: yes

  2. Referee: [Results/Evaluation] No empirical check is reported that the produced belief or fusion maps correlate with observable SHAP variance (e.g., across random seeds, input perturbations, or repeated explanations on identical images), leaving open the possibility that the maps capture aleatoric noise rather than the targeted explanation instability.

    Authors: We acknowledge the importance of empirical validation to confirm that the uncertainty maps track SHAP instability rather than other sources of noise. The current evaluation focuses on qualitative and statistical analysis across datasets, but does not include direct correlation studies. We will add new experiments in the revised manuscript: for a subset of images, we will generate multiple SHAP explanations under controlled variations (different seeds, slight input perturbations), compute the variance in the explanation values, and show that the belief and fusion maps have high correlation with these variance measures, supported by quantitative metrics such as Pearson correlation coefficients. revision: yes
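
The correlation study proposed in the second response could look roughly like the sketch below. Everything in it is assumed for illustration: the repeated SHAP maps, the uncertainty map, and the helper names are hypothetical, and maps are flattened to plain lists. It computes the per-pixel variance across repeated explanations of one image and the Pearson correlation between that variance and the framework's uncertainty map.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def pixelwise_variance(explanations):
    """Per-pixel population variance across repeated explanation maps."""
    return [statistics.pvariance(values) for values in zip(*explanations)]

# Hypothetical repeated SHAP maps for one image (e.g. different seeds).
repeated_shap = [
    [0.80, 0.10, 0.00, 0.90],
    [0.79, 0.25, 0.01, -0.70],
    [0.81, 0.05, 0.00, 0.10],
]
# Hypothetical uncertainty map reported by the framework for the same image.
uncertainty_map = [0.02, 0.20, 0.01, 0.95]

r = pearson(uncertainty_map, pixelwise_variance(repeated_shap))
```

A coefficient close to 1 on real data would support the claim that the maps track explanation instability; a weak or negative value would support the referee's concern.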

Circularity Check

0 steps flagged

No circularity detected in the derivation chain

The abstract describes a framework that applies Dirichlet posterior sampling and Dempster-Shafer theory to produce belief, plausibility, and fusion maps for quantifying SHAP instability. No equations, parameter-fitting steps, or self-citations are shown that would reduce any claimed output to an input by construction. The approach introduces new constructs (belief/plausibility maps plus statistical analysis) rather than re-labeling fitted quantities or importing uniqueness results from prior self-work. Without load-bearing reductions visible in the provided text, the derivation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only access prevents identification of any concrete free parameters, axioms, or invented entities; full text would be required to audit these elements.

pith-pipeline@v0.9.0 · 5507 in / 1205 out tokens · 33298 ms · 2026-05-16T20:07:24.305629+00:00 · methodology

Reference graph

Works this paper leans on

94 extracted references · 94 canonical work pages · 1 internal anchor

  1. [1]

    Vision transformers in medical imaging: a comprehensive review of advancements and applications across multiple diseases

    Aburass, S., Dorgham, O., Al Shaqsi, J., Abu Rumman, M., Al-Kadi, O., 2025. Vision transformers in medical imaging: a comprehensive review of advancements and applications across multiple diseases. Journal of Imaging Informatics in Medicine , 1–44

  2. [2]

    The role of big data in healthcare: A review of implications for patient outcomes and treatment personalization

    Adeghe, E.P., Okolo, C.A., Ojeyinka, O.T., 2024. The role of big data in healthcare: A review of implications for patient outcomes and treatment personalization. World Journal of Biology Pharmacy and Health Sciences 17, 198–204

  3. [3]

    Improvingmalariadiagnosisthroughinterpretablecustomizedcnnsarchitectures

    Ahamed,M.F.,Nahiduzzaman,M.,Mahmud,G.,Shafi,F.B.,Ayari,M.A.,Khandakar,A.,Abdullah-Al-Wadud,M.,Islam, S.R.,2025. Improvingmalariadiagnosisthroughinterpretablecustomizedcnnsarchitectures. ScientificReports15,6484

  4. [4]

    Early cancer detection using deep learning and medical imaging: A survey

    Ahmad, I., Alqurashi, F., 2024. Early cancer detection using deep learning and medical imaging: A survey. Critical Reviews in Oncology/Hematology 204, 104528

  5. [5]

    Hippocampal atrophy and ventricular enlargement in normal aging, mild cognitive impairment (mci), and alzheimer disease

    Apostolova, L.G., Green, A.E., Babakchanian, S., Hwang, K.S., Chou, Y.Y., Toga, A.W., Thompson, P.M., 2012. Hippocampal atrophy and ventricular enlargement in normal aging, mild cognitive impairment (mci), and alzheimer disease. Alzheimer Disease & Associated Disorders 26, 17–27

  6. [6]

    Detection and grading of diabetic retinopathy in retinal images using deep intelligent systems: A comprehensive review

    Asha Gnana Priya, H., Anitha, J., Popescu, D.E., Asokan, A., Jude Hemanth, D., Son, L.H., 2021. Detection and grading of diabetic retinopathy in retinal images using deep intelligent systems: A comprehensive review. Computers, Materials & Continua 66

  7. [7]

    Atad, M., Schinz, D., Moeller, H., Graf, R., Wiestler, B., Rueckert, D., Navab, N., Kirschke, J.S., Keicher, M., et al.,

  8. [8]

    Machine Learning for Biomedical Imaging 2, 2103–2125

    Counterfactual explanations for medical image classification and regression using diffusion autoencoder. Machine Learning for Biomedical Imaging 2, 2103–2125

  9. [9]

    Ba,W.,Wu,H.,Chen,W.W.,Wang,S.H.,Zhang,Z.Y.,Wei,X.J.,Wang,W.J.,Yang,L.,Zhou,D.M.,Zhuang,Y.X.,etal.,

  10. [10]

    European Journal of Cancer 169, 156–165

    Convolutional neural network assistance significantly improves dermatologists’ diagnosis of cutaneous tumours using clinical images. European Journal of Cancer 169, 156–165

  11. [11]

    Uncertainty quantification in medical image synthesis, in: Biomedical image synthesis and simulation

    Barbano, R., Arridge, S., Jin, B., Tanno, R., 2022. Uncertainty quantification in medical image synthesis, in: Biomedical image synthesis and simulation. Elsevier, pp. 601–641

  12. [12]

    Evaluatingtheexplainabilityofvisiontransformersinmedicalimaging

    Barekatain,L.,Glocker,B.,2025. Evaluatingtheexplainabilityofvisiontransformersinmedicalimaging. arXivpreprint arXiv:2510.12021

  13. [13]

    Some aspects of dempster-shafer evidence theory for classification of multi-modality medical images taking partial volume effect into account

    Bloch, I., 1996. Some aspects of dempster-shafer evidence theory for classification of multi-modality medical images taking partial volume effect into account. Pattern Recognition Letters 17, 905–919

  14. [14]

    Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE

    Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N., 2018. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE. pp. 839–847

  15. [15]

    Explaining a series of models by propagating shapley values

    Chen, H., Lundberg, S.M., Lee, S.I., 2022. Explaining a series of models by propagating shapley values. Nature communications 13, 4512

  16. [16]

    Evidence-based uncertainty-aware semi-supervised medical image segmentation

    Chen, Y., Yang, Z., Shen, C., Wang, Z., Zhang, Z., Qin, Y., Wei, X., Lu, J., Liu, Y., Zhang, Y., 2024. Evidence-based uncertainty-aware semi-supervised medical image segmentation. Computers in Biology and Medicine 170, 108004

  17. [17]

    Uncertainty propagation in xai: A comparison of analytical and empirical estimators, in: World Conference on Explainable Artificial Intelligence, Springer

    Chiaburu, T., Bießmann, F., Haußer, F., 2025. Uncertainty propagation in xai: A comparison of analytical and empirical estimators, in: World Conference on Explainable Artificial Intelligence, Springer. pp. 390–411

  18. [18]

    Chromatin-mediated epigenetic regulation in the malaria parasite plasmodium falciparum

    Cui, L., Miao, J., 2010. Chromatin-mediated epigenetic regulation in the malaria parasite plasmodium falciparum. Eukaryotic cell 9, 1138–1149

  19. [19]

    Explainable artificial intelligence (xai) in radiology and nuclear medicine: a literature review

    DeVries,B.M.,Zwezerijnen,G.J.,Burchell,G.L.,vanVelden,F.H.,Menke-vanderHouvenvanOordt,C.W.,Boellaard, R., 2023. Explainable artificial intelligence (xai) in radiology and nuclear medicine: a literature review. Frontiers in medicine 10, 1180773

  20. [20]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Dosovitskiy, A., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929

  21. [21]

    Ubiqtree:UncertaintyquantificationinXAIwithtreeensembles

    Dubey,A.,Anžel,A.,İlgen,B.,Hattab,G.,2025. Ubiqtree:UncertaintyquantificationinXAIwithtreeensembles. arXiv preprint arXiv:2508.09639

  22. [22]

    Ai readiness in healthcare through storytelling XAI, in: EXPLIMED@ ECAI

    Dubey, A., Yang, Z., Hattab, G., 2024a. Ai readiness in healthcare through storytelling XAI, in: EXPLIMED@ ECAI

  23. [23]

    A nested model for AI design and validation

    Dubey, A., Yang, Z., Hattab, G., 2024b. A nested model for AI design and validation. Iscience 27

  24. [24]

    Diabetic retinopathy detection (2015)

    Dugas, E., Jared, J., Cukierski, W., . Diabetic retinopathy detection (2015). URL https://kaggle. com/competitions/diabetic-retinopathy-detection 7

  25. [25]

    Deep learning applications in medical image analysis: Advancements, challenges, and future directions

    Eli, A.A., Ali, A., 2024. Deep learning applications in medical image analysis: Advancements, challenges, and future directions. arXiv preprint arXiv:2410.14131

  26. [26]

    Esteban, L.M., Borque-Fernando, Á., Escorihuela, M.E., Esteban-Escaño, J., Abascal, J.M., Servian, P., Morote, J.,

  27. [27]

    Scientific reports 15, 4261

    Integrating radiological and clinical data for clinically significant prostate cancer detection with machine learning techniques. Scientific reports 15, 4261

  28. [28]

    Deep learning-enabled medical computer vision

    Esteva, A., Chou, K., Yeung, S., Naik, N., Madani, A., Mottaghi, A., Liu, Y., Topol, E., Dean, J., Socher, R., 2021. Deep learning-enabled medical computer vision. NPJ digital medicine 4, 5

  29. [29]

    Quantifying uncertainty in deep learning of radiologic images

    Faghani, S., Moassefi, M., Rouzrokh, P., Khosravi, B., Baffour, F.I., Ringler, M.D., Erickson, B.J., 2023. Quantifying uncertainty in deep learning of radiologic images. Radiology 308, e222217. :Preprint submitted to ElsevierPage 29 of 32

  30. [30]

    Dirichletprocesses,in:StochasticIntegrals:ProceedingsoftheLMSDurhamSymposium,July7–17, 1980, Springer

    Föllmer,H.,2006. Dirichletprocesses,in:StochasticIntegrals:ProceedingsoftheLMSDurhamSymposium,July7–17, 1980, Springer. pp. 476–478

  31. [31]

    Axiom-based grad-cam: Towards accurate visualization and explanation of cnns

    Fu, R., Hu, Q., Dong, X., Guo, Y., Gao, Y., Li, B., 2020. Axiom-based grad-cam: Towards accurate visualization and explanation of cnns. arXiv e-prints , arXiv–2008

  32. [32]

    Dropout as a bayesian approximation: Representing model uncertainty in deep learning, in: international conference on machine learning, PMLR

    Gal, Y., Ghahramani, Z., 2016. Dropout as a bayesian approximation: Representing model uncertainty in deep learning, in: international conference on machine learning, PMLR. pp. 1050–1059

  33. [33]

    Uncertainty-aware visualization in medical imaging-a survey, in: Computer Graphics Forum, Wiley Online Library

    Gillmann, C., Saur, D., Wischgoll, T., Scheuermann, G., 2021. Uncertainty-aware visualization in medical imaging-a survey, in: Computer Graphics Forum, Wiley Online Library. pp. 665–689

  34. [34]

    Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp

    He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778

  35. [35]

    Explainable ai for medical data: Current methods, limitations, and future directions

    Hossain, M.I., Zamzmi, G., Mouton, P.R., Salekin, M.S., Sun, Y., Goldgof, D., 2025. Explainable ai for medical data: Current methods, limitations, and future directions. ACM Computing Surveys 57, 1–46

  36. [36]

    Deepevidentialfusionwithuncertaintyquantificationandreliability learning for multimodal medical image segmentation

    Huang,L.,Ruan,S.,Decazes,P.,Denœux,T.,2025. Deepevidentialfusionwithuncertaintyquantificationandreliability learning for multimodal medical image segmentation. Information Fusion 113, 102648

  37. [37]

    A review of uncertainty quantification in medical image analysis: Probabilistic and non-probabilistic methods

    Huang, L., Ruan, S., Xing, Y., Feng, M., 2024. A review of uncertainty quantification in medical image analysis: Probabilistic and non-probabilistic methods. Medical Image Analysis 97, 103223

  38. [38]

    Layercam: Exploring hierarchical class activation maps for localization

    Jiang, P.T., Zhang, C.B., Hou, Q., Cheng, M.M., Wei, Y., 2021. Layercam: Exploring hierarchical class activation maps for localization. IEEE transactions on image processing 30, 5875–5888

  39. [39]

    Onuncertainty,tempering,anddataaugmentationinbayesian classification

    Kapoor,S.,Maddox,W.J.,Izmailov,P.,Wilson,A.G.,2022. Onuncertainty,tempering,anddataaugmentationinbayesian classification. Advances in neural information processing systems 35, 18211–18225

  40. [40]

    Diagnosing malaria patients with plasmodium falciparum and vivax using deep learning for thick smear images

    Kassim, Y.M., Yang, F., Yu, H., Maude, R.J., Jaeger, S., 2021. Diagnosing malaria patients with plasmodium falciparum and vivax using deep learning for thick smear images. Diagnostics 11, 1994

  41. [41]

    Interpretabilitybeyondfeatureattribution: Quantitative testing with concept activation vectors (tcav), in: International conference on machine learning, PMLR

    Kim,B.,Wattenberg,M.,Gilmer,J.,Cai,C.,Wexler,J.,Viegas,F.,etal.,2018. Interpretabilitybeyondfeatureattribution: Quantitative testing with concept activation vectors (tcav), in: International conference on machine learning, PMLR. pp. 2668–2677

  42. [42]

    Simple and scalable predictive uncertainty estimation using deep ensembles

    Lakshminarayanan, B., Pritzel, A., Blundell, C., 2017. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in neural information processing systems 30

  43. [43]

    Trustworthy clinical ai solutions: A unified review of uncertainty quantification in deep learning models for medical image analysis

    Lambert, B., Forbes, F., Doyle, S., Dehaene, H., Dojat, M., 2024. Trustworthy clinical ai solutions: A unified review of uncertainty quantification in deep learning models for medical image analysis. Artif. Intell. Medicine 150, 102830

  44. [44]

    Oasis-3:longitudinalneuroimaging,clinical,andcognitivedatasetfornormalagingand alzheimer disease

    LaMontagne, P.J., Benzinger, T.L., Morris, J.C., Keefe, S., Hornbeck, R., Xiong, C., Grant, E., Hassenstab, J., Moulder, K.,Vlassenko,A.G.,etal.,2019. Oasis-3:longitudinalneuroimaging,clinical,andcognitivedatasetfornormalagingand alzheimer disease. medrxiv , 2019–12

  45. [45]

    Biophysical profiling of red blood cells from thin-film blood smears using deep learning

    Lamoureux, E.S., Cheng, Y., Islamzada, E., Matthews, K., Duffy, S.P., Ma, H., 2024. Biophysical profiling of red blood cells from thin-film blood smears using deep learning. Heliyon 10

  46. [46]

    Gradient-based learning applied to document recognition

    LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 2002. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 2278–2324

  47. [47]

    Deep learning in medical imaging: general overview

    Lee, J.G., Jun, S., Cho, Y.W., Lee, H., Kim, G.B., Seo, J.B., Kim, N., 2017. Deep learning in medical imaging: general overview. Korean journal of radiology 18, 570–584

  48. [48]

    Medical image analysis using deep learning algorithms

    Li, M., Jiang, Y., Zhang, Y., Zhu, H., 2023. Medical image analysis using deep learning algorithms. Frontiers in public health 11, 1273253

  49. [49]

    Li, X., Zhou, Y., Dvornek, N.C., Gu, Y., Ventola, P., Duncan, J.S., 2020. Efficient shapley explanation for features importance estimation under uncertainty, in: International Conference on Medical Image Computing and Computer- Assisted Intervention, Springer. pp. 792–801

  50. [50]

    A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis

    Liu, X., Faes, L., Kale, A.U., Wagner, S.K., Fu, D.J., Bruynseels, A., Mahendiran, T., Moraes, G., Shamdas, M., Kern, C., et al., 2019. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The lancet digital health 1, e271–e297

  51. [51]

    Enhancingmedicalimagesegmentationviacomplementarycnn-transformer fusion and boundary perception

    Liu,X.,Tian,J.,Huang,S.,Shen,W.,2025. Enhancingmedicalimagesegmentationviacomplementarycnn-transformer fusion and boundary perception. Frontiers in Computer Science 7, 1677905

  52. [52]

    Uncertainty-aware deep learning in healthcare: a scoping review

    Loftus,T.J.,Shickel,B.,Ruppert,M.M.,Balch,J.A.,Ozrazgat-Baslanti,T.,Tighe,P.J.,Efron,P.A.,Hogan,W.R.,Rashidi, P., Upchurch Jr, G.R., et al., 2022. Uncertainty-aware deep learning in healthcare: a scoping review. PLOS digital health 1, e0000085

  53. [53]

    Towardsaleatoricandepistemicuncertaintyinmedicalimageclassification, in: International Conference on Artificial Intelligence in Medicine, Springer

    Löhr,T.,Ingrisch,M.,Hüllermeier,E.,2024. Towardsaleatoricandepistemicuncertaintyinmedicalimageclassification, in: International Conference on Artificial Intelligence in Medicine, Springer. pp. 145–155

  54. [54]

    A unified approach to interpreting model predictions

    Lundberg, S.M., Lee, S.I., 2017. A unified approach to interpreting model predictions. Advances in neural information processing systems 30

  55. [55]

    A comprehensive review of deep neural networks for medical image processing: Recent developments and future opportunities

    Mall, P.K., Singh, P.K., Srivastav, S., Narayan, V., Paprzycki, M., Jaworska, T., Ganzha, M., 2023. A comprehensive review of deep neural networks for medical image processing: Recent developments and future opportunities. Healthcare Analytics 4, 100216

  56. [56]

    Ventricular features as reliable differentiators between bvftd and other dementias

    Manera, A.L., Dadar, M., Collins, D.L., Ducharme, S., Initiative, F.L.D.N., (ADNI, A.D.N.I., et al., 2022. Ventricular features as reliable differentiators between bvftd and other dementias. NeuroImage: Clinical 33, 102947

  57. [57]

    Deep learning–based detection of diabetic macular edema using optical coherence tomography and fundus images: A meta-analysis

    Manikandan, S., Raman, R., Rajalakshmi, R., Tamilselvi, S., Surya, R.J., 2023. Deep learning–based detection of diabetic macular edema using optical coherence tomography and fundus images: A meta-analysis. Indian Journal of Ophthalmology 71, 1783–1796

  58. [58]

    Open access series of imaging studies (oasis): cross-sectional mri data in young, middle aged, nondemented, and demented older adults

    Marcus, D.S., Wang, T.H., Parker, J., Csernansky, J.G., Morris, J.C., Buckner, R.L., 2007. Open access series of imaging studies (oasis): cross-sectional mri data in young, middle aged, nondemented, and demented older adults. Journal of cognitive neuroscience 19, 1498–1507

  59. [59]

    Ganterfactual—counterfactual explanations for medical non-experts using generative adversarial learning

    Mertes, S., Huber, T., Weitz, K., Heimerl, A., André, E., 2022. Ganterfactual—counterfactual explanations for medical non-experts using generative adversarial learning. Frontiers in artificial intelligence 5, 825565

  60. [60]

    Monte carlo dropout for uncertainty estimation and motor imagery classification

    Milanés-Hermosilla, D., Trujillo Codorniú, R., López-Baracaldo, R., Sagaró-Zamora, R., Delisle-Rodriguez, D., Villarejo-Mayor, J.J., Nunez-Alvarez, J.R., 2021. Monte carlo dropout for uncertainty estimation and motor imagery classification. Sensors 21, 7241

  61. [61]

    Reinventing radiology: big data and the future of medical imaging

    Morris, M.A., Saboury, B., Burkett, B., Gao, J., Siegel, E.L., 2018. Reinventing radiology: big data and the future of medical imaging. Journal of thoracic imaging 33, 4–16

  62. [62]

    Unveiling the black box: A systematic review of explainable artificial intelligence in medical image analysis

    Muhammad, D., Bendechache, M., 2024. Unveiling the black box: A systematic review of explainable artificial intelligence in medical image analysis. Computational and structural biotechnology journal 24, 542–560

  63. [63]

    Survey of explainable artificial intelligence techniques for biomedical imaging with deep neural networks

    Nazir, S., Dickson, D.M., Akram, M.U., 2023. Survey of explainable artificial intelligence techniques for biomedical imaging with deep neural networks. Computers in Biology and Medicine 156, 106668

  64. [64]

    Nguyen, V.P., Trinh, N.H., Nguyen, D.M.L., Nguyen, P.L., Tran, Q.L., 2025. Aleatoric uncertainty medical image segmentation estimation via flow matching, in: International Workshop on Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, Springer. pp. 134–144

  65. [65]

    Emerging trends in ai-powered medical imaging: enhancing diagnostic accuracy and treatment decisions

    Oyeniyi, J., Oluwaseyi, P., 2024. Emerging trends in ai-powered medical imaging: enhancing diagnostic accuracy and treatment decisions. International Journal of Enhanced Research In Science Technology & Engineering 13, 81–94

  66. [66]

    Parcalabescu, L., Frank, A., 2023. Mm-shap: A performance-agnostic metric for measuring multimodal contributions in vision and language models & tasks, in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 4032–4059

  67. [67]

    Practical guide to shap analysis: Explaining supervised machine learning model predictions in drug development

    Ponce-Bobadilla, A.V., Schmitt, V., Maier, C.S., Mensing, S., Stodtmann, S., 2024. Practical guide to shap analysis: Explaining supervised machine learning model predictions in drug development. Clinical and translational science 17, e70056

  68. [68]

    Revolutionizing healthcare: a comparative insight into deep learning’s role in medical imaging

    Prasad, V.K., Verma, A., Bhattacharya, P., Shah, S., Chowdhury, S., Bhavsar, M., Aslam, S., Ashraf, N., 2024. Revolutionizing healthcare: a comparative insight into deep learning’s role in medical imaging. Scientific Reports 14, 30273

  69. [69]

    Enhanced mri brain tumor detection using deep learning in conjunction with explainable ai shap based diverse and multi feature analysis

    Rahman, A., Hayat, M., Iqbal, N., Alarfaj, F.K., Alkhalaf, S., Alturise, F., 2025. Enhanced mri brain tumor detection using deep learning in conjunction with explainable ai shap based diverse and multi feature analysis. Scientific Reports 15, 29411

  70. [70]

    Ramaswamy, H.G., et al., 2020. Ablation-cam: Visual explanations for deep convolutional network via gradient-free localization, in: proceedings of the IEEE/CVF winter conference on applications of computer vision, pp. 983–991

  71. [71]

    Here comes the explanation: A shapley perspective on multi-contrast medical image segmentation

    Ren, T., Rivera, J.H., Oswal, H., Pan, Y., Chopra, A., Ruzevick, J., Kurt, M., 2025. Here comes the explanation: A shapley perspective on multi-contrast medical image segmentation. arXiv preprint arXiv:2504.04645

  72. [72]

    "Why should I trust you?" explaining the predictions of any classifier

    Ribeiro, M.T., Singh, S., Guestrin, C., 2016. "Why should I trust you?" explaining the predictions of any classifier, in: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 1135–1144

  73. [73]

    A perspective on explainable artificial intelligence methods: Shap and lime

    Salih, A.M., Raisi-Estabragh, Z., Galazzo, I.B., Radeva, P., Petersen, S.E., Lekadir, K., Menegaz, G., 2025. A perspective on explainable artificial intelligence methods: Shap and lime. Advanced Intelligent Systems 7, 2400304

  74. [74]

    Explainability and uncertainty: Two sides of the same coin for enhancing the interpretability of deep learning models in healthcare

    Salvi, M., Seoni, S., Campagner, A., Gertych, A., Acharya, U.R., Molinari, F., Cabitza, F., 2025. Explainability and uncertainty: Two sides of the same coin for enhancing the interpretability of deep learning models in healthcare. International Journal of Medical Informatics 197, 105846

  75. [75]

    Grad-cam: Visual explanations from deep networks via gradient-based localization

    Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D., 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization, in: Proceedings of the IEEE international conference on computer vision, pp. 618–626

  76. [76]

    Combination of evidence in dempster-shafer theory

    Sentz, K., Ferson, S., 2002. Combination of evidence in dempster-shafer theory

  77. [77]

    Application of uncertainty quantification to artificial intelligence in healthcare: A review of last decade (2013–2023)

    Seoni, S., Jahmunah, V., Salvi, M., Barua, P.D., Molinari, F., Acharya, U.R., 2023. Application of uncertainty quantification to artificial intelligence in healthcare: A review of last decade (2013–2023). Computers in Biology and Medicine 165, 107441

  78. [78]

    Dempster-shafer theory

    Shafer, G., 1992. Dempster-shafer theory. Encyclopedia of artificial intelligence 1, 330–331

  79. [79]

    Probabilistic Modeling and Uncertainty Awareness in Deep Learning

    Shen, Y., 2025. Probabilistic Modeling and Uncertainty Awareness in Deep Learning. Ph.D. thesis. Technische Universität München

  80. [80]

    Studying ventricular abnormalities in mild cognitive impairment with hyperbolic ricci flow and tensor-based morphometry

    Shi, J., Stonnington, C.M., Thompson, P.M., Chen, K., Gutman, B., Reschke, C., Baxter, L.C., Reiman, E.M., Caselli, R.J., Wang, Y., et al., 2015. Studying ventricular abnormalities in mild cognitive impairment with hyperbolic ricci flow and tensor-based morphometry. Neuroimage 104, 1–20

Showing first 80 references.