On the notion of missingness for path attribution explainability methods in medical settings: Guiding the selection of medically meaningful baselines

Alexander Geiger; Alissa Jell; Daniel Rueckert; Dirk Wilhelm; Lars Wagner

arxiv: 2508.14482 · v3 · submitted 2025-08-20 · 💻 cs.LG

Pith reviewed 2026-05-18 22:04 UTC · model grok-4.3

classification 💻 cs.LG

keywords: semantic missingness, counterfactual baselines, path attribution, integrated gradients, medical imaging explainability, faithful attributions, generative models

The pith

Counterfactual baselines for path attribution methods yield more faithful explanations in medical imaging by satisfying semantic missingness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In medical settings, path attribution methods like Integrated Gradients need a baseline that represents the absence of informative features. Standard choices such as all-zero images often lack clinical meaning because pixel intensities have diagnostic value. The paper formalizes semantic missingness, requiring the baseline to depict a clinically plausible state without the disease-related features. This leads to a counterfactual-guided approach where a generated normal variant of the input serves as the baseline. Theoretical guarantees and experiments on three datasets show these baselines produce attributions that are more faithful and medically relevant than standard alternatives.
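A minimal sketch of how the baseline enters an Integrated Gradients computation may help here. Everything in it is an assumption for illustration: the toy logistic model, its weights, the four-pixel "image", and the hand-written counterfactual stand in for a real classifier and a real generative model, and none of it is the authors' implementation.

```python
import math

def model(x, w, b):
    """Toy classifier (hypothetical): probability of disease from a flat pixel list."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to each input pixel."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of IG along the straight path baseline -> x."""
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    return [di * t / steps for di, t in zip(diff, total)]

# Hypothetical 4-pixel image: pixels 0-2 are healthy tissue, pixel 3 is a lesion.
w, b = [0.2, 0.1, -0.1, 2.0], -3.0
x = [0.8, 0.7, 0.9, 1.0]
zero_baseline = [0.0, 0.0, 0.0, 0.0]
# Hand-written stand-in for a generated counterfactual: same anatomy, lesion removed.
counterfactual_baseline = [0.8, 0.7, 0.9, 0.0]

attr_zero = integrated_gradients(x, zero_baseline, w, b)
attr_cf = integrated_gradients(x, counterfactual_baseline, w, b)
```

In this toy setting the counterfactual baseline differs from the input only at the lesion pixel, so all attribution lands there, whereas the all-zero baseline also assigns attribution to the healthy pixels simply because their intensities are non-zero.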

Core claim

The central claim is that counterfactual baselines, created by generative models such as VAEs and diffusion models to represent clinically normal versions of pathological inputs, satisfy a stricter semantic missingness condition and thereby deliver more faithful attributions for path attribution methods compared to conventional baselines. This is supported by derived theoretical guarantees of improved faithfulness and validated empirically across three diverse medical datasets, where the approach also outperforms using the counterfactual directly as an explanation.

What carries the argument

Semantic missingness: the requirement that a baseline represent a clinically plausible state in which disease-related features are absent. This requirement motivates and justifies using synthetically generated counterfactuals as the reference input for attribution calculations.
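For orientation, the standard definition of Integrated Gradients shows where the reference input appears. The notation is the usual one from the Integrated Gradients literature and is an assumption here, not taken from this page: F is the classifier output, x the input, and x' the baseline.

```latex
\mathrm{IG}_i(x; x') = (x_i - x'_i) \int_0^1
  \frac{\partial F\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\, d\alpha,
\qquad
\sum_i \mathrm{IG}_i(x; x') = F(x) - F(x').
```

The second identity (completeness) says the attributions explain exactly the output difference between the input and the baseline, which is why the choice of x' determines what is being explained.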

If this is right

Attributions become more aligned with actual disease manifestations rather than baseline artifacts.
The framework applies to any path attribution method and any suitable counterfactual generator.
Empirical superiority holds across varied medical imaging datasets.
Bridging attribution and counterfactual paradigms improves overall explainability.
Clinically, this supports better trust and transparency in deep learning models for diagnosis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Clinicians could use these explanations to verify AI decisions against expected pathology locations.
The idea might extend to non-imaging medical data like time series or tabular records where semantic absence matters.
Comparing to other explainability techniques could reveal broader applicability.
Real-world deployment would require validation that generated counterfactuals match expert judgment of normality.

Load-bearing premise

Synthetically generated counterfactual images must accurately represent a clinically plausible state without the disease features for the faithfulness improvement to hold.

What would settle it

The improved-faithfulness claim would be falsified if, on a held-out medical dataset, Integrated Gradients attributions computed with the counterfactual baseline did not highlight known disease regions, or agree with expert annotations, any better than attributions computed with a zero baseline.
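One way such a check could be operationalized is sketched below: measure the share of absolute attribution mass that falls inside an expert-annotated region, once per baseline. The function name, the mask, and the attribution values are hypothetical and purely illustrative; they are not results from the paper, and the paper's own overlap metric may be defined differently.

```python
def mass_inside_mask(attribution, mask):
    """Fraction of absolute attribution mass that falls on annotated pixels."""
    total = sum(abs(a) for a in attribution)
    if total == 0.0:
        return 0.0
    inside = sum(abs(a) for a, m in zip(attribution, mask) if m)
    return inside / total

# Hypothetical flattened maps for one image; the mask marks the annotated lesion.
expert_mask = [0, 0, 1, 1, 0, 0]
attr_zero_baseline = [0.30, 0.20, 0.25, 0.15, 0.05, 0.05]   # made-up values
attr_counterfactual = [0.02, 0.03, 0.50, 0.40, 0.03, 0.02]  # made-up values

score_zero = mass_inside_mask(attr_zero_baseline, expert_mask)
score_cf = mass_inside_mask(attr_counterfactual, expert_mask)

# The claim survives this image only if the counterfactual baseline scores higher.
claim_survives = score_cf > score_zero
```

A real test would aggregate this comparison over a held-out dataset rather than a single image.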

Figures

Figures reproduced from arXiv: 2508.14482 by Alexander Geiger, Alissa Jell, Daniel Rueckert, Dirk Wilhelm, Lars Wagner.

**Figure 2.** The models used in our setup (left), as well as the approach to find the counterfactual…

**Figure 3.** Examples of attributions obtained using different baselines. Typical colormaps for the…

**Figure 4.** Results of the mass-center ablation test on the three data sets. The metric shows the drop…

**Figure 5.** Results of measuring the overlap of the attributions (for the different evaluated baselines) to…

**Figure 6.** Results of the Top-k ablation tests.

A.2 Spatial spread of attribution pixels: Another way of measuring the quality of pixel attributions is the spread of the attributions, where we expect to have a low spread when the attributions are good, since the medical features that make up a certain class should be constrained to very condensed areas. We calculate the spread using the intensity-weighted distance from…

**Figure 7.** Spatial spread of attributions.

**Figure 8.** The final counterfactual is reached once the specified confidence threshold is reached.

**Figure 9.** Five examples from each data set for a qualitative comparison of the different baseline…

**Figure 10.** Loss on training and validation sets during training of the VAE on the three data sets.

**Figure 11.** Loss on training and validation sets during training of the classifier (first row). Confusion…

Original abstract

The explainability of deep learning models remains a significant challenge, particularly in the medical domain where interpretable outputs are essential for clinical trust and transparency. Path attribution methods such as Integrated Gradients rely on a baseline that represents the absence of informative features, a notion commonly referred to as missingness. Standard baselines, such as all-zero inputs, are often semantically meaningless in medical contexts, where intensity values carry clinical significance. In this work, we revisit the notion of missingness for medical imaging, expose the limitations of standard baselines in this setting, and formalize a stricter missingness we term semantic missingness: a baseline must not merely lack signal, but must represent a clinically plausible state in which the disease-related features are absent. This formulation motivates a counterfactual-guided approach to baseline selection, in which a synthetically generated counterfactual (i.e. a clinically normal variant of the pathological input) serves as a principled and semantically meaningful reference. We derive theoretical guarantees showing that counterfactual baselines yield more faithful attributions than standard alternatives, and empirically validate this with two complementary counterfactual generative models, a VAE and a diffusion model, though the concept is model-agnostic and compatible with any suitable counterfactual method. Across three diverse medical datasets, counterfactual baselines produce more faithful and medically relevant attributions, outperforming standard baseline choices as well as related methods. Notably, we also compare against using the counterfactual directly as an explanation (an established paradigm in its own right) and show that employing it as a baseline for Integrated Gradients yields superior results, thereby bridging two complementary explainability paradigms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper formalizes semantic missingness and shows counterfactual baselines can improve Integrated Gradients faithfulness in medical imaging, but the gains depend heavily on how cleanly the generators remove pathology.

The main takeaway is that standard baselines like all-zero images break down in medical settings because they lack clinical meaning, and the authors propose using generated counterfactuals as a fix for path attribution methods such as Integrated Gradients. They formalize this as semantic missingness and claim theoretical advantages plus better empirical results on three datasets compared to both conventional baselines and direct counterfactual explanations.

Referee Report

2 major / 2 minor

Summary. The paper claims that standard baselines for path attribution methods like Integrated Gradients are semantically meaningless in medical imaging and proposes 'semantic missingness' as a stricter criterion: baselines must represent clinically plausible states with disease-related features absent. It motivates using synthetically generated counterfactuals (via VAE or diffusion models) as baselines, derives theoretical guarantees for improved faithfulness over zero/mean baselines, and empirically shows superior performance on three medical datasets while also outperforming direct use of counterfactuals as explanations.

Significance. If the central claims hold, the work could meaningfully advance explainable AI for medical applications by providing a principled baseline selection strategy that aligns with clinical semantics and bridges attribution-based and counterfactual explanation paradigms. The use of two distinct generative models and multiple datasets adds robustness to the empirical component, and the model-agnostic framing increases potential impact if the faithfulness improvements are independently verifiable.

major comments (2)

[§4] §4 (Theoretical Guarantees): The derivation of theoretical guarantees for superior faithfulness of counterfactual baselines presupposes that the generated counterfactuals satisfy semantic missingness exactly (i.e., no residual disease signals or anatomical distortions). No quantitative bounds, error analysis, or formal verification of this property are provided for the VAE and diffusion outputs; this assumption is load-bearing for both the guarantees and the claim of outperformance over standard baselines.
[§5] §5 (Empirical Evaluation, e.g. faithfulness metrics and Table 2): The reported superiority in faithfulness metrics may be circular if those metrics are computed with respect to the same counterfactual generation process used to create the baselines. The manuscript should explicitly demonstrate independence between the evaluation criteria and the generative models, or provide an external validation (e.g., clinician ratings of attribution plausibility) to support the central empirical claim.

minor comments (2)

[Abstract and §3] The abstract states that the approach is 'model-agnostic and compatible with any suitable counterfactual method,' but the main text provides limited discussion of failure modes or requirements for alternative generators beyond VAE and diffusion.
[Figures] Figure captions and axis labels in the attribution visualization panels could be expanded to explicitly indicate which baseline was used in each column for easier cross-comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below, indicating planned revisions where appropriate while maintaining the integrity of our theoretical and empirical contributions.

Referee: [§4] §4 (Theoretical Guarantees): The derivation of theoretical guarantees for superior faithfulness of counterfactual baselines presupposes that the generated counterfactuals satisfy semantic missingness exactly (i.e., no residual disease signals or anatomical distortions). No quantitative bounds, error analysis, or formal verification of this property are provided for the VAE and diffusion outputs; this assumption is load-bearing for both the guarantees and the claim of outperformance over standard baselines.

Authors: We agree that the theoretical results in §4 are derived under the assumption of exact semantic missingness. In the revised manuscript we will (i) state this assumption explicitly at the beginning of the theoretical section, (ii) add a dedicated paragraph discussing the practical deviation from the ideal case, and (iii) include quantitative diagnostics (e.g., residual disease-signal scores obtained from an independent downstream classifier) that bound the approximation error of both the VAE and diffusion counterfactuals. These additions will clarify the scope of the guarantees without changing their formal statements.
revision: partial

Referee: [§5] §5 (Empirical Evaluation, e.g. faithfulness metrics and Table 2): The reported superiority in faithfulness metrics may be circular if those metrics are computed with respect to the same counterfactual generation process used to create the baselines. The manuscript should explicitly demonstrate independence between the evaluation criteria and the generative models, or provide an external validation (e.g., clinician ratings of attribution plausibility) to support the central empirical claim.

Authors: The faithfulness metrics reported in §5 (insertion/deletion AUC and ROAR) are standard, model-agnostic attribution-evaluation protocols that measure the predictive model’s output change under feature occlusion; they do not invoke the VAE or diffusion generators at evaluation time. We will add an explicit paragraph in §5 that documents this separation and confirms that the evaluation pipeline shares no parameters or data with the counterfactual generators. While clinician ratings would constitute valuable supplementary evidence, the current multi-dataset, multi-generator quantitative results already provide independent support for the claims; we will note clinician validation as a worthwhile direction for follow-up work.
revision: partial
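To make the separation argued in this response concrete, here is a minimal sketch of an occlusion-style check that consults only the predictive model and an attribution map, never a counterfactual generator. It is a single-step top-k ablation, not the full insertion/deletion AUC or ROAR protocol; the toy logistic model, the `fill_value` used for occlusion, and both attribution maps are assumptions made for illustration.

```python
import math

def predict(x, w, b):
    """Toy classifier (hypothetical): probability of disease from a flat pixel list."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def top_k_ablation_drop(x, attribution, w, b, k, fill_value=0.0):
    """Occlude the k highest-attributed features and return the drop in model output."""
    ranked = sorted(range(len(x)), key=lambda i: attribution[i], reverse=True)
    occluded = list(x)
    for i in ranked[:k]:
        occluded[i] = fill_value  # occlusion value is an assumption of this sketch
    return predict(x, w, b) - predict(occluded, w, b)

# Hypothetical 4-pixel image: the model relies mostly on pixel 3 (the lesion).
w, b = [0.2, 0.1, -0.1, 2.0], -3.0
x = [0.8, 0.7, 0.9, 1.0]

focused_attr = [0.0, 0.0, 0.0, 0.24]   # made-up map concentrated on the lesion
diffuse_attr = [0.5, 0.3, 0.1, 0.1]    # made-up map spread over healthy pixels

drop_focused = top_k_ablation_drop(x, focused_attr, w, b, k=1)
drop_diffuse = top_k_ablation_drop(x, diffuse_attr, w, b, k=1)
```

Removing the single feature ranked highest by the focused map lowers the toy model's output far more than removing the one ranked highest by the diffuse map, which is the kind of signal an ablation metric rewards.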

Circularity Check

0 steps flagged

No significant circularity; derivation remains independent of inputs

The paper formalizes semantic missingness as a clinically plausible state without disease features and claims independent theoretical guarantees that counterfactual baselines produce more faithful attributions than zero or mean baselines. No equations or definitions in the provided text reduce the faithfulness metric or the guarantees to the counterfactual generation process by construction; the guarantees are presented as first-principles results following from the missingness definition rather than tautological restatements. Empirical comparisons use separate VAE and diffusion generators on three datasets and are not framed as predictions fitted to the same data used for the theory. No self-citation chains, ansatz smuggling, or renaming of known results appear as load-bearing steps. The central claim therefore retains independent content and does not collapse to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the ability to generate accurate counterfactuals that satisfy semantic missingness and on the validity of the faithfulness metrics used for comparison; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: Counterfactual generative models can produce images that represent clinically normal variants without disease features.
Invoked when defining semantic missingness and when claiming superior faithfulness for counterfactual baselines.


work page doi:10.1016/j.compbiomed.2019.05.002 2019

[9] [9]

What Do Different Evaluation Metrics Tell Us About Saliency Models? IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(3):740–757, March 2019

Zoya Bylinskii, Tilke Judd, Aude Oliva, Antonio Torralba, and Fredo Durand. What Do Different Evaluation Metrics Tell Us About Saliency Models? IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(3):740–757, March 2019. doi: 10.1109/TPAMI.2018.2815601

work page doi:10.1109/tpami.2018.2815601 2019

[10] [10]

Survey of Explainable AI Techniques in Healthcare

Ahmad Chaddad, Jihao Peng, Jian Xu, and Ahmed Bouridane. Survey of Explainable AI Techniques in Healthcare. Sensors, 23(2):634, January 2023. doi: 10.3390/s23020634

work page doi:10.3390/s23020634 2023

[11] [11]

Clough, Ilkay Oksuz, Esther Puyol-Antón, Bram Ruijsink, Andrew P

James R. Clough, Ilkay Oksuz, Esther Puyol-Antón, Bram Ruijsink, Andrew P. King, and Julia A. Schnabel. Global and Local Interpretability for Cardiac MRI Classification. In Dinggang Shen, Tianming Liu, Terry M. Peters, Lawrence H. Staib, Caroline Essert, Sean Zhou, Pew-Thian Yap, and Ali Khan, editors,Medical Image Computing and Computer Assisted Interven...

work page doi:10.1007/978-3-030-32251-9_72 2019

[12] [12]

Real Time Image Saliency for Black Box Classifiers

Piotr Dabkowski and Yarin Gal. Real Time Image Saliency for Black Box Classifiers. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017

work page 2017

[13] [13]

Bercea, Emily Chan, and Julia A

Maxime Di Folco, Cosmin I. Bercea, Emily Chan, and Julia A. Schnabel. Interpretable Representation Learning of Cardiac MRI via Attribute Regularization. In Marius George Linguraru, Qi Dou, Aasa Feragen, Stamatia Giannarou, Ben Glocker, Karim Lekadir, and Julia A. Schnabel, editors, Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, ...

work page doi:10.1007/978-3-031-72117-5_46 2024

[14] [14]

Lugmayr, M

Amil Dravid, Florian Schiffers, Boqing Gong, and Aggelos K. Katsaggelos. medXGAN: Visual Expla- nations for Medical Classifiers through a Generative Latent Space. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 2935–2944, New Orleans, LA, USA, June 2022. IEEE. ISBN 978-1-6654-8739-9. doi: 10.1109/CVPRW56347....

work page doi:10.1109/cvprw56347.2022.00331 2022

[15] [15]

Janizek, Pascal Sturmfels, Scott M

Gabriel Erion, Joseph D. Janizek, Pascal Sturmfels, Scott M. Lundberg, and Su-In Lee. Improving performance of deep learning models with axiomatic attribution priors and expected gradients. Nature Machine Intelligence, 3(7):620–631, July 2021. doi: 10.1038/s42256-021-00343-w. 11

work page doi:10.1038/s42256-021-00343-w 2021

[16] [16]

Fong and Andrea Vedaldi

Ruth C. Fong and Andrea Vedaldi. Interpretable Explanations of Black Boxes by Meaningful Perturbation. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 3449–3457, Venice, October

work page 2017

[17] [17]

Human Uncertainty Makes Classification More Robust

IEEE. ISBN 978-1-5386-1032-9. doi: 10.1109/ICCV .2017.371

work page doi:10.1109/iccv 2017

[18] [18]

A Benchmark for Interpretability Methods in Deep Neural Networks

Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, and Been Kim. A Benchmark for Interpretability Methods in Deep Neural Networks. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019

work page 2019

[19] [19]

Mouton, Md Sirajus Salekin, Yu Sun, and Dmitry Goldgof

Md Imran Hossain, Ghada Zamzmi, Peter R. Mouton, Md Sirajus Salekin, Yu Sun, and Dmitry Goldgof. Explainable AI for Medical Data: Current Methods, Limitations, and Future Directions. ACM Computing Surveys, 57(6):1–46, June 2025. doi: 10.1145/3637487

work page doi:10.1145/3637487 2025

[20] [20]

Evaluation Metrics for XAI: A Review, Taxonomy, and Practical Applications

Md Abdul Kadir, Amir Mosavi, and Daniel Sonntag. Evaluation Metrics for XAI: A Review, Taxonomy, and Practical Applications. In 2023 IEEE 27th International Conference on Intelligent Engineering Systems (INES), pages 000111–000124, Nairobi, Kenya, July 2023. IEEE. ISBN 979-8-3503-2851-6. doi: 10.1109/INES59282.2023.10297629

work page doi:10.1109/ines59282.2023.10297629 2023

[21] [21]

In: CVPR

Andrei Kapishnikov, Subhashini Venugopalan, Besim Avci, Ben Wedin, Michael Terry, and Tolga Boluk- basi. Guided Integrated Gradients: An Adaptive Path Method for Removing Noise. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5048–5056, Nashville, TN, USA, June 2021. IEEE. ISBN 978-1-6654-4509-2. doi: 10.1109/CVPR46437...

work page doi:10.1109/cvpr46437.2021.00501 2021

[22] [22]

Sunnie S. Y . Kim, Nicole Meister, Vikram V . Ramaswamy, Ruth Fong, and Olga Russakovsky. HIVE: Evaluating the Human Interpretability of Visual Explanations. In Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, and Tal Hassner, editors,Computer Vision – ECCV 2022, pages 280–298, Cham, 2022. Springer Nature Switzerland. ISBN 978-3-03...

work page doi:10.1007/978-3-031-19775-8_ 2022

[23] [23]

Schütt, Sven Dähne, Dumitru Erhan, and Been Kim

Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim. The (Un)reliability of Saliency Methods. In Wojciech Samek, Grégoire Montavon, Andrea Vedaldi, Lars Kai Hansen, and Klaus-Robert Müller, editors,Explainable AI: Interpret- ing, Explaining and Visualizing Deep Learning, pages 26...

work page

[24] [24]

doi: 10.1007/978-3-030-28954-6_14

ISBN 978-3-030-28954-6. doi: 10.1007/978-3-030-28954-6_14

work page doi:10.1007/978-3-030-28954-6_14

[25] [25]

Kingma and Max Welling

Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014

work page 2014

[26] [26]

An Evaluation of the Human-Interpretability of Explanation, August 2019

Isaac Lage, Emily Chen, Jeffrey He, Menaka Narayanan, Been Kim, Sam Gershman, and Finale Doshi- Velez. An Evaluation of the Human-Interpretability of Explanation, August 2019

work page 2019

[27] [27]

Masked feature prediction for self-supervised visual pre-training

Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, and Baining Guo. Swin Transformer V2: Scaling Up Capacity and Resolution. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11999–12009, June 2022. doi: 10.1109/CVPR52688.2022.01170

work page doi:10.1109/cvpr52688.2022.01170 2022

[28] [28]

Lundstrom, Tianjian Huang, and Meisam Razaviyayn

Daniel D. Lundstrom, Tianjian Huang, and Meisam Razaviyayn. A Rigorous Study of Integrated Gradi- ents Method and Extensions to Internal Neuron Attributions. In Proceedings of the 39th International Conference on Machine Learning, pages 14485–14508. PMLR, June 2022

work page 2022

[29] [29]

GANterfactual— Counterfactual Explanations for Medical Non-experts Using Generative Adversarial Learning

Silvan Mertes, Tobias Huber, Katharina Weitz, Alexander Heimerl, and Elisabeth André. GANterfactual— Counterfactual Explanations for Medical Non-experts Using Generative Adversarial Learning. Frontiers in Artificial Intelligence, 5, April 2022. doi: 10.3389/frai.2022.825565

work page doi:10.3389/frai.2022.825565 2022

[30] [30]

Xiang Wang, Y

Dang Minh, H. Xiang Wang, Y . Fen Li, and Tan N. Nguyen. Explainable artificial intelligence: A comprehensive review. Artificial Intelligence Review , 55(5):3503–3568, June 2022. doi: 10.1007/ s10462-021-10088-y

work page 2022

[31] [31]

Rehg, Mehul A

Supriya Nagesh, Nina Mishra, Yonatan Naamad, James M. Rehg, Mehul A. Shah, and Alexei Wagner. Explaining a machine learning decision to physicians via counterfactuals. InProceedings of the Conference on Health, Inference, and Learning, pages 556–577. PMLR, June 2023

work page 2023

[32] [32]

How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human- Interpretability of Explanation, February 2018

Menaka Narayanan, Emily Chen, Jeffrey He, Been Kim, Sam Gershman, and Finale Doshi-Velez. How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human- Interpretability of Explanation, February 2018

work page 2018

[33] [33]

Counterfactual Explanations via Riemannian Latent Space Traversal, November 2024

Paraskevas Pegios, Aasa Feragen, Andreas Abildtrup Hansen, and Georgios Arvanitidis. Counterfactual Explanations via Riemannian Latent Space Traversal, November 2024. 12

work page 2024

[34] [34]

Using StyleGAN for Visual Interpretability of Deep Learning Models on Medical Images, January 2021

Kathryn Schutte, Olivier Moindrot, Paul Hérent, Jean-Baptiste Schiratti, and Simon Jégou. Using StyleGAN for Visual Interpretability of Deep Learning Models on Medical Images, January 2021

work page 2021

[35] [35]

Visualizing the Impact of Feature Attribution Baselines

Pascal Sturmfels, Scott Lundberg, and Su-In Lee. Visualizing the Impact of Feature Attribution Baselines. Distill, 5(1):e22, January 2020. doi: 10.23915/distill.00022

work page doi:10.23915/distill.00022 2020

[36] [36]

Schuller

Qiyang Sun, Alican Akman, and Björn W. Schuller. Explainable Artificial Intelligence for Medical Applications: A Review. ACM Transactions on Computing for Healthcare, 6(2):1–31, April 2025. doi: 10.1145/3709367

work page doi:10.1145/3709367 2025

[37] [37]

Axiomatic Attribution for Deep Networks

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic Attribution for Deep Networks. In Pro- ceedings of the 34th International Conference on Machine Learning , pages 3319–3328. PMLR, July 2017

work page 2017

[38] [38]

Beyond Known Reality: Exploiting Counterfactual Explanations for Medical Research, February 2025

Toygar Tanyel, Serkan Ayvaz, and Bilgin Keserci. Beyond Known Reality: Exploiting Counterfactual Explanations for Medical Research, February 2025

work page 2025

[39] [39]

Bas H. M. van der Velden, Hugo J. Kuijf, Kenneth G. A. Gilhuijs, and Max A. Viergever. Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Medical Image Analysis, 79: 102470, July 2022. doi: 10.1016/j.media.2022.102470

work page doi:10.1016/j.media.2022.102470 2022

[40] [40]

Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation

Nina Weng, Paraskevas Pegios, Eike Petersen, Aasa Feragen, and Siavash Bigdeli. Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation. In Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, and Gül Varol, editors, European Conference on Computer Vision, volume 15144, pages 338–357, Cham, 2025. Springer Nature Sw...

work page doi:10.1007/978-3-031-73016-0_20 2025

[41] [41]

On the (In)fidelity and Sensitivity of Explanations

Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Suggala, David I Inouye, and Pradeep K Ravikumar. On the (In)fidelity and Sensitivity of Explanations. In Advances in Neural Information Processing Systems , volume 32. Curran Associates, Inc., 2019

work page 2019

[42] [42]

Siim-acr pneumothorax segmentation

Anna Zawacki, Carol Wu, George Shih, Julia Elliott, Mikhail Fomitchev, Mohannad Hussain, ParasLakhani, Phil Culliton, and Shunxing Bao. Siim-acr pneumothorax segmentation. https://kaggle.com/ competitions/siim-acr-pneumothorax-segmentation , 2019. Kaggle. 13 Appendix A Additional evaluation metrics 15 A.1 Top-k ablation . . . . . . . . . . . . . . . . . ....

work page 2019