Uncertainty-Aware Information Pursuit for Interpretable and Reliable Medical Image Analysis

Alireza Bab-Hadiashar; Feng Xia; Md Nahiduzzaman; Ruwan Tennakoon; Steven Korevaar; Zongyuan Ge

arxiv: 2506.16742 · v3 · submitted 2025-06-20 · 💻 cs.CV

Md Nahiduzzaman, Steven Korevaar, Zongyuan Ge, Feng Xia, Alireza Bab-Hadiashar, Ruwan Tennakoon

Pith reviewed 2026-05-19 08:42 UTC · model grok-4.3

classification 💻 cs.CV

keywords: uncertainty-aware models, interpretable-by-design, medical image analysis, concept selection, variational information pursuit, robust AI, explainable AI

The pith

Integrating uncertainty into concept selection makes interpretable medical AI more accurate and concise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper seeks to improve upon Variational Information Pursuit by adding awareness of uncertainty in the predicted concepts used for decisions on medical images. If true, this would allow AI systems to avoid basing diagnoses on unreliable image features and instead use only the most dependable concepts for each case, which matters for building trust in clinical settings where mistakes carry high costs. The approach leads to models that automatically choose a small set of trustworthy concepts without external help, resulting in both higher performance and easier-to-understand outputs. Readers would care because it tackles the gap between interpretability and reliability in AI for healthcare.

Core claim

The central claim is that by incorporating upstream uncertainty estimates into the V-IP process, the IUAV-IP model prioritizes reliable concepts implicitly during query selection while EUAV-IP masks uncertain ones, achieving state-of-the-art accuracy among interpretable-by-design methods on four of five medical imaging datasets and generating more concise explanations with fewer concepts.

What carries the argument

The key machinery is the uncertainty-aware V-IP querying process that uses per-sample uncertainty estimates to either mask or re-weight concept selections for more robust predictions.
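
As a rough illustration of the two selection styles named above, the sketch below contrasts hard masking with soft down-weighting of candidate concepts. It is not the paper's implementation: the function names, the fixed `threshold`, the per-concept `scores` list, and the multiplicative `(1 - uncertainty)` weighting are all assumptions made for this example. The abstract says only that IUAV-IP incorporates uncertainty implicitly, and the actual querier is presumably a learned network rather than a hand-written rule.

```python
def select_query_masked(scores, uncertainty, asked, threshold=0.5):
    """Masking style (hypothetical stand-in for EUAV-IP).

    Concepts whose uncertainty exceeds `threshold` are excluded outright;
    the highest-scoring remaining concept is returned, or None if none is left.
    """
    candidates = [
        i for i in range(len(scores))
        if i not in asked and uncertainty[i] <= threshold
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda i: scores[i])


def select_query_weighted(scores, uncertainty, asked):
    """Re-weighting style (hypothetical stand-in for IUAV-IP).

    No concept is excluded; each score is scaled by (1 - uncertainty),
    so unreliable concepts are less likely to be chosen.
    """
    candidates = [i for i in range(len(scores)) if i not in asked]
    if not candidates:
        return None
    return max(candidates, key=lambda i: scores[i] * (1.0 - uncertainty[i]))
```

With the masking rule, a sample whose concepts are all above the threshold yields no query at all, which is one concrete way diagnostic information could be lost; the weighted rule never returns `None` while unasked concepts remain.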

If this is right

Models produce more concise explanations by selecting fewer concepts.
IUAV-IP achieves the highest accuracy among interpretable-by-design methods on four of the five datasets, which span dermoscopy, X-ray, ultrasound, and blood cell imaging.
Decisions rely on sample-specific reliable concepts without human input.
Overall robustness increases by avoiding uncertain features in ambiguous images.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This method could extend to other safety-critical imaging settings, such as histopathology or additional radiology modalities, for similar gains.
Combining it with other uncertainty techniques might further enhance clinical alignment of explanations.
Evaluating on real-world deployment scenarios would test if the per-sample tailoring holds under varied conditions.

Load-bearing premise

The assumption that upstream uncertainty estimates are accurate and that using them to filter concepts does not discard key diagnostic information for any sample.
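
The abstract does not say how the upstream uncertainty estimates are produced, so the following is only a generic sketch of two common per-concept scores: the entropy of a single predicted concept probability, and the spread across several stochastic forward passes (in the style of Monte Carlo dropout). The function names and the input layout are assumptions for illustration.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in bits) of a binary concept prediction with probability p.

    Values near 1.0 mean the predictor is maximally unsure whether the
    concept is present; values near 0.0 mean it is confident either way.
    """
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mc_spread(samples):
    """Mean and population variance per concept across stochastic passes.

    `samples` is a list of T lists, each holding K concept probabilities
    from one forward pass (an assumed layout). Returns (means, variances).
    """
    t = len(samples)
    k = len(samples[0])
    means = [sum(s[j] for s in samples) / t for j in range(k)]
    variances = [
        sum((s[j] - means[j]) ** 2 for s in samples) / t for j in range(k)
    ]
    return means, variances
```

Either score could feed a masking threshold or a weighting term, but whether it tracks actual concept errors is exactly what the premise above takes for granted.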

What would settle it

A concrete falsifier would be if, on the evaluated medical datasets, the proposed IUAV-IP model selected more concepts or achieved lower accuracy than the original V-IP baseline.

Figures

Figures reproduced from arXiv: 2506.16742 by Alireza Bab-Hadiashar, Feng Xia, Md Nahiduzzaman, Ruwan Tennakoon, Steven Korevaar, Zongyuan Ge.

**Figure 2.** Relationship between accuracy and the number …

Original abstract

To be adopted in safety-critical domains like medical image analysis, AI systems must provide human-interpretable decisions. Variational Information Pursuit (V-IP) offers an interpretable-by-design framework by sequentially querying input images for human-understandable concepts, using their presence or absence to make predictions. However, existing V-IP methods overlook sample-specific uncertainty in concept predictions, which can arise from ambiguous features or model limitations, leading to suboptimal query selection and reduced robustness. In this paper, we propose an interpretable and uncertainty-aware framework for medical imaging that addresses these limitations by accounting for upstream uncertainties in concept-based, interpretable-by-design models. Specifically, we introduce two uncertainty-aware models, EUAV-IP and IUAV-IP, that integrate uncertainty estimates into the V-IP querying process to prioritize more reliable concepts per sample. EUAV-IP skips uncertain concepts via masking, while IUAV-IP incorporates uncertainty into query selection implicitly for more informed and clinically aligned decisions. Our approach allows models to make reliable decisions based on a subset of concepts tailored to each individual sample, without human intervention, while maintaining overall interpretability. We evaluate our methods on five medical imaging datasets across four modalities: dermoscopy, X-ray, ultrasound, and blood cell imaging. The proposed IUAV-IP model achieves state-of-the-art accuracy among interpretable-by-design approaches on four of the five datasets, and generates more concise explanations by selecting fewer yet more informative concepts. These advances enable more reliable and clinically meaningful outcomes, enhancing model trustworthiness and supporting safer AI deployment in healthcare.
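
The sequential querying that the abstract describes can be sketched as a short loop. This is a minimal outline under stated assumptions, not the authors' code: `choose_query` and `predict` stand in for the learned querier and classifier, and stopping once the top class probability reaches `stop_confidence` is an illustrative stopping rule.

```python
def information_pursuit(concept_answers, choose_query, predict,
                        max_queries, stop_confidence=0.9):
    """Ask about one concept at a time until the prediction is confident.

    concept_answers: dict mapping concept name -> observed answer (e.g. 0/1).
    choose_query:    callable(history) -> next concept name, or None.
    predict:         callable(history) -> dict of class -> probability.
    Returns the query history (the explanation) and the final posterior.
    """
    history = {}
    posterior = predict(history)
    for _ in range(max_queries):
        if max(posterior.values()) >= stop_confidence:
            break  # confident enough; stop asking
        query = choose_query(history)
        if query is None:
            break  # nothing left to ask
        history[query] = concept_answers[query]
        posterior = predict(history)
    return history, posterior
```

The returned `history` is the per-sample explanation: its length is the number of concepts used, which is the quantity behind the conciseness claim.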

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper extends V-IP with two uncertainty-aware variants for concept selection in medical imaging and reports better accuracy plus shorter explanations on four of five datasets, but the abstract gives almost no evaluation details.

The core addition is straightforward: they take the existing V-IP querying process and inject per-sample uncertainty estimates so the model skips or down-weights unreliable concepts before making a decision. EUAV-IP does this by masking, IUAV-IP by implicit re-weighting. That produces the two new models they evaluate on dermoscopy, X-ray, ultrasound, and blood-cell images.

The claim is that IUAV-IP hits higher accuracy than prior interpretable-by-design methods on four datasets while using fewer concepts overall. That is the concrete new result and it lines up with the practical need for reliable explanations in clinical settings. The work is honest about building directly on V-IP rather than claiming a new paradigm. The multi-modality evaluation is also a plus; it shows the approach is not tied to one imaging type.

The main weakness is the lack of supporting numbers. The abstract states SOTA accuracy and fewer concepts but supplies no baseline values, no statistical tests, no error bars, and no description of how the upstream uncertainty estimates were obtained or validated. Without those, it is difficult to judge whether the reported gains are robust or whether the uncertainty module actually improves robustness on hard samples. The assumption that masking or re-weighting uncertain concepts preserves diagnostic information is plausible but untested in the summary provided.

This paper is for groups already working on concept-based or information-pursuit models who need a medical imaging test bed. A reader who wants to see a simple, incremental fix to an existing framework will find it useful; someone looking for a large theoretical advance will not. The work is coherent on its own terms and the extension is technically modest but relevant, so it deserves a serious referee to check the missing experimental details and confirm the uncertainty calibration.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes two uncertainty-aware extensions to Variational Information Pursuit (V-IP) for interpretable medical image analysis: EUAV-IP, which masks uncertain concepts during querying, and IUAV-IP, which incorporates uncertainty estimates implicitly into the selection process. The methods aim to produce per-sample concept selections that are more reliable and concise. Evaluation is performed on five medical imaging datasets spanning dermoscopy, X-ray, ultrasound, and blood cell modalities, with the claim that IUAV-IP attains state-of-the-art accuracy among interpretable-by-design approaches on four of the five datasets while using fewer concepts.

Significance. If the performance and robustness claims hold after detailed validation, the work could meaningfully advance reliable interpretable AI for safety-critical medical applications by mitigating the impact of uncertain concept predictions. The multi-modality evaluation and focus on sample-specific, human-understandable decisions without manual intervention represent practical strengths that could support greater clinical trust and adoption.

major comments (2)

[Abstract and §4 (Experiments)] The claim of state-of-the-art accuracy among interpretable-by-design methods on four of five datasets is presented without quantitative details on the specific baselines, performance tables with error bars, statistical significance tests, or how uncertainty estimates were calibrated and validated. These omissions make it impossible to assess whether the reported gains are substantive or merely incremental.
[§3 (Method)] The central assumption that upstream uncertainty estimates for individual concepts are sufficiently accurate for masking (EUAV-IP) or implicit re-weighting (IUAV-IP) to improve robustness without discarding diagnostically critical information on any sample is not accompanied by sensitivity analysis, failure-case examination, or ablation on uncertainty quality. This assumption is load-bearing for the reliability claims.

minor comments (1)

[§4.1] Figure captions and §4.1 could more explicitly state the number of concepts selected per method and per dataset to support the conciseness claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important areas for strengthening the experimental validation and analysis of uncertainty assumptions. We address each major comment below and have revised the manuscript to incorporate additional quantitative details, statistical tests, sensitivity analyses, and failure-case examinations.

Referee: [Abstract and §4 (Experiments)] The claim of state-of-the-art accuracy among interpretable-by-design methods on four of five datasets is presented without quantitative details on the specific baselines, performance tables with error bars, statistical significance tests, or how uncertainty estimates were calibrated and validated. These omissions make it impossible to assess whether the reported gains are substantive or merely incremental.

Authors: We agree that more explicit quantitative support is needed to substantiate the SOTA claims. In the revised manuscript, we have expanded Section 4 with a new comprehensive table (Table 2) listing all interpretable-by-design baselines (e.g., CBM, ProtoPNet, and standard V-IP variants), reporting mean accuracy ± standard deviation over five random seeds, and including paired t-test p-values for significance. We have also added a subsection on uncertainty calibration, reporting Expected Calibration Error (ECE) values for the upstream concept predictors across all datasets to validate estimate quality.
revision: yes
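
For readers unfamiliar with the two statistics named in this response, here is a minimal, generic sketch of how they are commonly computed. It is not taken from the paper: the equal-width binning, the bin count, and both function names are assumptions, and the t-test helper returns only the statistic and degrees of freedom (a p-value would need a t-distribution CDF, for example from SciPy).

```python
import math
import statistics


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins over confidences assumed to lie in [0, 1].

    Each bin contributes |mean confidence - accuracy|, weighted by the
    fraction of predictions that fall in it.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for matched runs (e.g. seeds)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd == 0.0:
        raise ValueError("differences are constant; t statistic is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

With only five seeds the paired test has four degrees of freedom, so a reported p-value would be sensitive to a single outlying run.
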
Referee: [§3 (Method)] The central assumption that upstream uncertainty estimates for individual concepts are sufficiently accurate for masking (EUAV-IP) or implicit re-weighting (IUAV-IP) to improve robustness without discarding diagnostically critical information on any sample is not accompanied by sensitivity analysis, failure-case examination, or ablation on uncertainty quality. This assumption is load-bearing for the reliability claims.

Authors: We acknowledge that this assumption requires stronger empirical support. The revised manuscript now includes a dedicated sensitivity analysis in Section 3 and a new Appendix subsection that varies the uncertainty threshold for EUAV-IP masking, reports its effect on both accuracy and explanation conciseness, and examines failure cases where high-uncertainty concepts carried diagnostic value. We further add an ablation comparing model performance when using estimated uncertainties versus oracle (ground-truth) concept uncertainties to directly assess sensitivity to uncertainty quality.
revision: yes
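
The bookkeeping side of such a threshold sweep can be sketched as follows. This is a hypothetical helper, not the authors' analysis: it only counts how many concepts survive masking at each threshold and how many samples lose every concept. Measuring accuracy at each threshold would additionally require the trained model.

```python
def sweep_mask_threshold(uncertainties, thresholds):
    """Summarise the effect of a masking threshold on concept availability.

    uncertainties: list of per-sample lists of concept uncertainties
                   (an assumed layout).
    thresholds:    iterable of candidate thresholds; a concept is kept
                   when its uncertainty is <= the threshold.
    Returns one summary dict per threshold.
    """
    total = sum(len(sample) for sample in uncertainties)
    rows = []
    for tau in thresholds:
        kept_per_sample = [
            sum(1 for u in sample if u <= tau) for sample in uncertainties
        ]
        rows.append({
            "threshold": tau,
            "kept_fraction": sum(kept_per_sample) / total,
            "fully_masked_samples": sum(1 for k in kept_per_sample if k == 0),
        })
    return rows
```

A non-zero `fully_masked_samples` count flags cases where masking leaves nothing to query, which are natural candidates for the failure-case examination mentioned in the response.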

Circularity Check

0 steps flagged

No significant circularity detected

The paper extends Variational Information Pursuit by adding upstream uncertainty estimates to guide per-sample concept selection in EUAV-IP (masking) and IUAV-IP (implicit re-weighting). The derivation chain consists of standard supervised training of a concept predictor, separate uncertainty estimation, and then a modified query selection rule; none of these steps are shown to reduce by construction to the final accuracy numbers or to any self-citation. Evaluation is performed on held-out test splits across five external medical datasets, and the reported SOTA claim among interpretable-by-design methods is an empirical outcome rather than a definitional or fitted-input tautology. No uniqueness theorem, ansatz smuggling, or renaming of known results is invoked in a load-bearing way.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the method implicitly assumes that concept-level uncertainty can be estimated reliably from the upstream model and that this estimate is a valid proxy for decision reliability.

pith-pipeline@v0.9.0 · 5828 in / 1091 out tokens · 36407 ms · 2026-05-19T08:42:29.905613+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: UAV-IP integrates uncertainty quantification into the V-IP process... EUAV-IP skips uncertain concepts via masking, while IUAV-IP incorporates uncertainty into query selection implicitly

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: modified objective... $\min_{\gamma,\eta}\ \mathbb{E}\big[ D_{\mathrm{KL}}\big( P(Y \mid X) \,\|\, f_\eta(Y \mid g_\gamma(H_{1:t}, \Omega), H_{1:t}) \big) \big]$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

[1] Davide Castelvecchi. Can we open the black box of AI? Nature News, 538(7623):20, 2016.
[2] Ravid Shwartz-Ziv and Naftali Tishby. Opening the black box of deep neural networks via information. Information Flow in Deep Neural Networks, page 24, 2022.
[3] Aditya Chattopadhay, Anirban Sarkar, Prantik Howlader, and Vineeth N Balasubramanian. Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 839–847. IEEE, 2018.
[4] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
[5] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13, pages 818–833. Springer, 2014.
[6] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
[7] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
[8] Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. Advances in Neural Information Processing Systems, 31, 2018.
[9] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International Conference on Machine Learning, pages 5338–5348. PMLR, 2020.
[10] Sandareka Wickramanayake, Wynne Hsu, and Mong Li Lee. Comprehensible convolutional neural networks via guided concept learning. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2021.
[11] Cristiano Patrício, João C Neves, and Luis F Teixeira. Coherent concept-based explanations in medical image and its application to skin lesion diagnosis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3799–3808, 2023.
[12] Aditya Chattopadhyay, Stewart Slocum, Benjamin D Haeffele, Rene Vidal, and Donald Geman. Interpretable by design: Learning predictors by composing interpretable queries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6):7430–7443, 2022.
[13] Aditya Chattopadhyay, Kwan Ho Ryan Chan, Benjamin David Haeffele, Donald Geman, and Rene Vidal. Variational information pursuit for interpretable predictions. In The Eleventh International Conference on Learning Representations, 2023.
[14] Aditya Chattopadhyay, Kwan Ho Ryan Chan, and Rene Vidal. Bootstrapping variational information pursuit with large language and vision models for interpretable image classification. In The Twelfth International Conference on Learning Representations, 2024.
[15] Donald Geman and Bruno Jedynak. An active testing model for tracking roads in satellite images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(1):1–14, 1996.
[16] Eyke Hüllermeier and Willem Waegeman. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Machine Learning, 110(3):457–506, 2021.
[17] Yusuf Sale, Paul Hofman, Timo Löhr, Lisa Wimmer, Thomas Nagler, and Eyke Hüllermeier. Label-wise aleatoric and epistemic uncertainty quantification. In The 40th Conference on Uncertainty in Artificial Intelligence, 2024.
[18] Junjia Huang, Chenming Shang, Aolin Xiong, Yuxian Pang, and Zhi Jin. Seeing health with eyes: Feature combination for image-based human BMI estimation. In 2021 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2021.
[19] Ruxiao Duan, Brian Caffo, Harrison X Bai, Haris I Sair, and Craig Jones. Evidential uncertainty quantification: A variance-based perspective. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2132–2141, 2024.
[20] Yibo Gao, Zheyao Gao, Xin Gao, Yuanye Liu, Bomin Wang, and Xiahai Zhuang. Evidential concept embedding models: Towards reliable concept explanations for skin disease diagnosis. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 308–317. Springer, 2024.
[21] Eunji Kim, Dahuin Jung, Sangha Park, Siwon Kim, and Sungroh Yoon. Probabilistic concept bottleneck models. In International Conference on Machine Learning, pages 16521–16540. PMLR, 2023.
[22] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[23] Tuomas Oikarinen, Subhro Das, Lam M Nguyen, and Tsui-Wei Weng. Label-free concept bottleneck models. In The Eleventh International Conference on Learning Representations, 2023.
[24] Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, and Mark Yatskar. Language in a bottle: Language model guided concept bottlenecks for interpretable image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19187–19197, 2023.
[25] Jakob Gawlikowski, Cedrique Rovile Njieutcheu Tassi, Mohsin Ali, Jongseok Lee, Matthias Humt, Jianxiang Feng, Anna Kruspe, Rudolph Triebel, Peter Jung, Ribana Roscher, et al. A survey of uncertainty in deep neural networks. Artificial Intelligence Review, 56(Suppl 1):1513–1589, 2023.
[26] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059. PMLR, 2016.
[27] Teresa Mendonça, Pedro M Ferreira, Jorge S Marques, André RS Marcal, and Jorge Rozeira. PH2 - A dermoscopic image database for research and benchmarking. In 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 5437–5440. IEEE, 2013.
[28] Jeremy Kawahara, Sara Daneshvar, Giuseppe Argenziano, and Ghassan Hamarneh. Seven-point checklist and skin lesion classification using multitask multimodal neural nets. IEEE Journal of Biomedical and Health Informatics, 23(2):538–546, 2018.
[29] Anna Pawłowska, Anna Ćwierz-Pieńkowska, Agnieszka Domalik, Dominika Jaguś, Piotr Kasprzak, Rafał Matkowski, Łukasz Fura, Andrzej Nowicki, and Norbert Żołek. Curated benchmark dataset for ultrasound based breast lesion analysis. Scientific Data, 11(1):148, 2024.
[30] Roxana Daneshjou, Mert Yuksekgonul, Zhuo Ran Cai, Roberto Novoa, and James Y Zou. SkinCon: A skin disease dataset densely annotated by domain experts for fine-grained debugging and analysis. Advances in Neural Information Processing Systems, 35:18157–18167, 2022.
[31] Hongmei Wang, Junlin Hou, and Hao Chen. Concept complement bottleneck model for interpretable medical image diagnosis. arXiv preprint arXiv:2410.15446, 2024.

[1] [1]

Can we open the black box of ai? Nature News, 538(7623):20, 2016

Davide Castelvecchi. Can we open the black box of ai? Nature News, 538(7623):20, 2016. 6 A PREPRINT - SEPTEMBER 2, 2025

work page 2016

[2] [2]

Opening the black box of deep neural networks via information.Information Flow in Deep Neural Networks, page 24, 2022

Ravid Shwartz-Ziv and Naftali Tishby. Opening the black box of deep neural networks via information.Information Flow in Deep Neural Networks, page 24, 2022

work page 2022

[3] [3]

Grad-cam++: General- ized gradient-based visual explanations for deep convolutional networks

Aditya Chattopadhay, Anirban Sarkar, Prantik Howlader, and Vineeth N Balasubramanian. Grad-cam++: General- ized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE winter conference on applications of computer vision (WACV), pages 839–847. IEEE, 2018

work page 2018

[4] [4]

Grad-cam: Visual explanations from deep networks via gradient-based localization

Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017

work page 2017

[5] [5]

Visualizing and understanding convolutional networks

Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In Computer Vision– ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13 , pages 818–833. Springer, 2014

work page 2014

[6] [6]

why should i trust you?

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. " why should i trust you?" explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016

work page 2016

[7] [7]

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature machine intelligence, 1(5):206–215, 2019

work page 2019

[8] [8]

Sanity checks for saliency maps

Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. Advances in neural information processing systems, 31, 2018

work page 2018

[9] [9]

Concept bottleneck models

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International conference on machine learning, pages 5338–5348. PMLR, 2020

work page 2020

[10] [10]

Comprehensible convolutional neural networks via guided concept learning

Sandareka Wickramanayake, Wynne Hsu, and Mong Li Lee. Comprehensible convolutional neural networks via guided concept learning. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2021

work page 2021

[11] [11]

Coherent concept-based explanations in medical image and its application to skin lesion diagnosis

Cristiano Patrício, João C Neves, and Luis F Teixeira. Coherent concept-based explanations in medical image and its application to skin lesion diagnosis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3799–3808, 2023

work page 2023

[12] [12]

Interpretable by design: Learning predictors by composing interpretable queries

Aditya Chattopadhyay, Stewart Slocum, Benjamin D Haeffele, Rene Vidal, and Donald Geman. Interpretable by design: Learning predictors by composing interpretable queries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6):7430–7443, 2022

work page 2022

[13] [13]

Variational information pursuit for interpretable predictions

Aditya Chattopadhyay, Kwan Ho Ryan Chan, Benjamin David Haeffele, Donald Geman, and Rene Vidal. Variational information pursuit for interpretable predictions. In The Eleventh International Conference on Learning Representations, 2023

work page 2023

[14] [14]

Bootstrapping variational information pursuit with large language and vision models for interpretable image classification

Aditya Chattopadhyay, Kwan Ho Ryan Chan, and Rene Vidal. Bootstrapping variational information pursuit with large language and vision models for interpretable image classification. In The Twelfth International Conference on Learning Representations, 2024

work page 2024

[15] [15]

An active testing model for tracking roads in satellite images

Donald Geman and Bruno Jedynak. An active testing model for tracking roads in satellite images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(1):1–14, 1996

work page 1996

[16] [16]

Aleatoric and epistemic uncertainty in machine learning: An introduc- tion to concepts and methods

Eyke Hüllermeier and Willem Waegeman. Aleatoric and epistemic uncertainty in machine learning: An introduc- tion to concepts and methods. Machine learning, 110(3):457–506, 2021

work page 2021

[17] [17]

Label-wise aleatoric and epistemic uncertainty quantification

Yusuf Sale, Paul Hofman, Timo Löhr, Lisa Wimmer, Thomas Nagler, and Eyke Hüllermeier. Label-wise aleatoric and epistemic uncertainty quantification. In The 40th Conference on Uncertainty in Artificial Intelligence, 2024

work page 2024

[18] [18]

Seeing health with eyes: Feature combination for image-based human bmi estimation

Junjia Huang, Chenming Shang, Aolin Xiong, Yuxian Pang, and Zhi Jin. Seeing health with eyes: Feature combination for image-based human bmi estimation. In 2021 ieee international conference on multimedia and expo (icme), pages 1–6. IEEE, 2021

work page 2021

[19] [19]

Evidential uncertainty quantification: A variance-based perspective

Ruxiao Duan, Brian Caffo, Harrison X Bai, Haris I Sair, and Craig Jones. Evidential uncertainty quantification: A variance-based perspective. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2132–2141, 2024

work page 2024

[20] [20]

Evidential concept embedding models: Towards reliable concept explanations for skin disease diagnosis

Yibo Gao, Zheyao Gao, Xin Gao, Yuanye Liu, Bomin Wang, and Xiahai Zhuang. Evidential concept embedding models: Towards reliable concept explanations for skin disease diagnosis. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 308–317. Springer, 2024

work page 2024

[21] [21]

Probabilistic concept bottleneck models

Eunji Kim, Dahuin Jung, Sangha Park, Siwon Kim, and Sungroh Yoon. Probabilistic concept bottleneck models. In International Conference on Machine Learning, pages 16521–16540. PMLR, 2023

work page 2023

[22] [22]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021

work page 2021

[23] [23]

Label-free concept bottleneck models

Tuomas Oikarinen, Subhro Das, Lam M Nguyen, and Tsui-Wei Weng. Label-free concept bottleneck models. In The Eleventh International Conference on Learning Representations, 2023

work page 2023

[24] [24]

Language in a bottle: Language model guided concept bottlenecks for interpretable image classification

Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, and Mark Yatskar. Language in a bottle: Language model guided concept bottlenecks for interpretable image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19187–19197, 2023

work page 2023

[25] [25]

A survey of uncertainty in deep neural networks

Jakob Gawlikowski, Cedrique Rovile Njieutcheu Tassi, Mohsin Ali, Jongseok Lee, Matthias Humt, Jianxiang Feng, Anna Kruspe, Rudolph Triebel, Peter Jung, Ribana Roscher, et al. A survey of uncertainty in deep neural networks. Artificial Intelligence Review, 56(Suppl 1):1513–1589, 2023

work page 2023

[26] [26]

Dropout as a Bayesian approximation: Representing model uncertainty in deep learning

Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059. PMLR, 2016

work page 2016

[27] [27]

PH2 - a dermoscopic image database for research and benchmarking

Teresa Mendonça, Pedro M Ferreira, Jorge S Marques, André RS Marcal, and Jorge Rozeira. PH2 - a dermoscopic image database for research and benchmarking. In 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 5437–5440. IEEE, 2013

work page 2013

[28] [28]

Seven-point checklist and skin lesion classification using multitask multimodal neural nets

Jeremy Kawahara, Sara Daneshvar, Giuseppe Argenziano, and Ghassan Hamarneh. Seven-point checklist and skin lesion classification using multitask multimodal neural nets. IEEE Journal of Biomedical and Health Informatics, 23(2):538–546, 2018

work page 2018

[29] [29]

Curated benchmark dataset for ultrasound based breast lesion analysis

Anna Pawłowska, Anna Ćwierz-Pieńkowska, Agnieszka Domalik, Dominika Jaguś, Piotr Kasprzak, Rafał Matkowski, Łukasz Fura, Andrzej Nowicki, and Norbert Żołek. Curated benchmark dataset for ultrasound based breast lesion analysis. Scientific Data, 11(1):148, 2024

work page 2024

[30] [30]

Skincon: A skin disease dataset densely annotated by domain experts for fine-grained debugging and analysis

Roxana Daneshjou, Mert Yuksekgonul, Zhuo Ran Cai, Roberto Novoa, and James Y Zou. Skincon: A skin disease dataset densely annotated by domain experts for fine-grained debugging and analysis. Advances in Neural Information Processing Systems, 35:18157–18167, 2022

work page 2022

[31] [31]

Concept complement bottleneck model for interpretable medical image diagnosis

Hongmei Wang, Junlin Hou, and Hao Chen. Concept complement bottleneck model for interpretable medical image diagnosis. arXiv preprint arXiv:2410.15446, 2024

work page arXiv 2024