On the notion of missingness for path attribution explainability methods in medical settings: Guiding the selection of medically meaningful baselines
Pith reviewed 2026-05-18 22:04 UTC · model grok-4.3
The pith
Counterfactual baselines for path attribution methods yield more faithful explanations in medical imaging by satisfying semantic missingness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that counterfactual baselines, created by generative models such as VAEs and diffusion models to represent clinically normal versions of pathological inputs, satisfy a stricter semantic missingness condition and thereby deliver more faithful attributions for path attribution methods compared to conventional baselines. This is supported by derived theoretical guarantees of improved faithfulness and validated empirically across three diverse medical datasets, where the approach also outperforms using the counterfactual directly as an explanation.
What carries the argument
Semantic missingness: a baseline that represents a clinically plausible state in which disease-related features are absent, which motivates and justifies the use of synthetically generated counterfactuals as the reference input for attribution calculations.
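For reference, the standard Integrated Gradients definition makes the role of the baseline explicit. The symbols below ($F$ for the model output, $x$ for the input, $x'$ for the baseline) are notation chosen for this note and are not taken from the paper.

```latex
\mathrm{IG}_i(x; x') = (x_i - x'_i) \int_0^1
  \frac{\partial F\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\, d\alpha,
\qquad
\sum_i \mathrm{IG}_i(x; x') = F(x) - F(x').
```

The second identity (completeness) says the attributions always sum to the output difference between input and baseline. Under a counterfactual baseline, that difference is plausibly the part of the prediction owed to disease-related features, which is the intuition semantic missingness appears to formalize.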
If this is right
- Attributions become more aligned with actual disease manifestations rather than baseline artifacts.
- The framework applies to any path attribution method and any suitable counterfactual generator.
- The empirical advantage holds across the three medical imaging datasets tested.
- Bridging attribution and counterfactual paradigms improves overall explainability.
- Clinically, this supports better trust and transparency in deep learning models for diagnosis.
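A minimal sketch of how the baseline choice changes a path attribution, assuming a toy logistic model and a hand-made "counterfactual" in place of a VAE or diffusion output. The names `model`, `grad`, `integrated_gradients`, and all numeric values are hypothetical illustrations, not the paper's implementation.

```python
import math

def model(x):
    """Toy classifier (hypothetical): logistic score over weighted pixels."""
    weights = [0.5, -1.0, 2.0, 0.0]
    z = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def grad(f, x, eps=1e-5):
    """Central finite-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g

def integrated_gradients(f, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of IG along the straight-line path."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, gi in enumerate(grad(f, point)):
            total[i] += gi
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [0.8, 0.6, 0.9, 0.7]             # stand-in "pathological" input
zero_baseline = [0.0, 0.0, 0.0, 0.0]
cf_baseline = [0.8, 0.6, 0.1, 0.7]   # stand-in counterfactual: only pixel 2 differs

ig_zero = integrated_gradients(model, x, zero_baseline)
ig_cf = integrated_gradients(model, x, cf_baseline)
```

Because each attribution is scaled by the difference between input and baseline, pixels the counterfactual leaves unchanged receive exactly zero attribution here. The zero baseline instead assigns attribution to every pixel that has both nonzero intensity and nonzero weight.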
Where Pith is reading between the lines
- Clinicians could use these explanations to verify AI decisions against expected pathology locations.
- The idea might extend to non-imaging medical data like time series or tabular records where semantic absence matters.
- Comparing against explainability techniques outside the path attribution family could show how far the benefit generalizes.
- Real-world deployment would require validation that generated counterfactuals match expert judgment of normality.
Load-bearing premise
Synthetically generated counterfactual images must accurately represent a clinically plausible state without the disease features for the faithfulness improvement to hold.
What would settle it
The improved-faithfulness claim would be falsified if, on a held-out medical dataset, Integrated Gradients attributions computed with the counterfactual baseline highlighted known disease regions, or agreed with expert annotations, no better than attributions computed with a zero baseline.
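One way such a check could be scored is the fraction of attribution mass that lands inside an expert-annotated region. The sketch below assumes flattened attribution maps and a binary lesion mask. The function name `mass_in_mask` and every number are made up for illustration and are not results from the paper.

```python
def mass_in_mask(attribution, mask):
    """Fraction of total absolute attribution that falls inside a binary mask."""
    total = sum(abs(a) for a in attribution)
    if total == 0:
        return 0.0
    inside = sum(abs(a) for a, m in zip(attribution, mask) if m)
    return inside / total

# Hypothetical flattened maps for one image; 1 marks expert-annotated lesion pixels.
expert_mask = [0, 0, 1, 1, 0, 0]
attr_zero_baseline = [0.30, 0.20, 0.25, 0.05, 0.10, 0.10]
attr_cf_baseline = [0.02, 0.03, 0.55, 0.35, 0.03, 0.02]

score_zero = mass_in_mask(attr_zero_baseline, expert_mask)
score_cf = mass_in_mask(attr_cf_baseline, expert_mask)
# The claim under test predicts score_cf > score_zero on real held-out data;
# these invented numbers only demonstrate the computation.
```

Averaging both scores over a held-out set and comparing them would give a direct, generator-independent test of the claim.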
Original abstract
The explainability of deep learning models remains a significant challenge, particularly in the medical domain where interpretable outputs are essential for clinical trust and transparency. Path attribution methods such as Integrated Gradients rely on a baseline that represents the absence of informative features, a notion commonly referred to as missingness. Standard baselines, such as all-zero inputs, are often semantically meaningless in medical contexts, where intensity values carry clinical significance. In this work, we revisit the notion of missingness for medical imaging, expose the limitations of standard baselines in this setting, and formalize a stricter missingness we term semantic missingness: a baseline must not merely lack signal, but must represent a clinically plausible state in which the disease-related features are absent. This formulation motivates a counterfactual-guided approach to baseline selection, in which a synthetically generated counterfactual (i.e., a clinically normal variant of the pathological input) serves as a principled and semantically meaningful reference. We derive theoretical guarantees showing that counterfactual baselines yield more faithful attributions than standard alternatives, and empirically validate this with two complementary counterfactual generative models, a VAE and a diffusion model, though the concept is model-agnostic and compatible with any suitable counterfactual method. Across three diverse medical datasets, counterfactual baselines produce more faithful and medically relevant attributions, outperforming standard baseline choices as well as related methods. Notably, we also compare against using the counterfactual directly as an explanation (an established paradigm in its own right) and show that employing it as a baseline for Integrated Gradients yields superior results, thereby bridging two complementary explainability paradigms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that standard baselines for path attribution methods like Integrated Gradients are semantically meaningless in medical imaging and proposes 'semantic missingness' as a stricter criterion: baselines must represent clinically plausible states with disease-related features absent. It motivates using synthetically generated counterfactuals (via VAE or diffusion models) as baselines, derives theoretical guarantees for improved faithfulness over zero/mean baselines, and empirically shows superior performance on three medical datasets while also outperforming direct use of counterfactuals as explanations.
Significance. If the central claims hold, the work could meaningfully advance explainable AI for medical applications by providing a principled baseline selection strategy that aligns with clinical semantics and bridges attribution-based and counterfactual explanation paradigms. The use of two distinct generative models and multiple datasets adds robustness to the empirical component, and the model-agnostic framing increases potential impact if the faithfulness improvements are independently verifiable.
major comments (2)
- [§4] §4 (Theoretical Guarantees): The derivation of theoretical guarantees for superior faithfulness of counterfactual baselines presupposes that the generated counterfactuals satisfy semantic missingness exactly (i.e., no residual disease signals or anatomical distortions). No quantitative bounds, error analysis, or formal verification of this property are provided for the VAE and diffusion outputs; this assumption is load-bearing for both the guarantees and the claim of outperformance over standard baselines.
- [§5] §5 (Empirical Evaluation, e.g. faithfulness metrics and Table 2): The reported superiority in faithfulness metrics may be circular if those metrics are computed with respect to the same counterfactual generation process used to create the baselines. The manuscript should explicitly demonstrate independence between the evaluation criteria and the generative models, or provide an external validation (e.g., clinician ratings of attribution plausibility) to support the central empirical claim.
minor comments (2)
- [Abstract and §3] The abstract states that the approach is 'model-agnostic and compatible with any suitable counterfactual method,' but the main text provides limited discussion of failure modes or requirements for alternative generators beyond VAE and diffusion.
- [Figures] Figure captions and axis labels in the attribution visualization panels could be expanded to explicitly indicate which baseline was used in each column for easier cross-comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment below, indicating planned revisions where appropriate while maintaining the integrity of our theoretical and empirical contributions.
- Referee: [§4] §4 (Theoretical Guarantees): The derivation of theoretical guarantees for superior faithfulness of counterfactual baselines presupposes that the generated counterfactuals satisfy semantic missingness exactly (i.e., no residual disease signals or anatomical distortions). No quantitative bounds, error analysis, or formal verification of this property are provided for the VAE and diffusion outputs; this assumption is load-bearing for both the guarantees and the claim of outperformance over standard baselines.
  Authors: We agree that the theoretical results in §4 are derived under the assumption of exact semantic missingness. In the revised manuscript we will (i) state this assumption explicitly at the beginning of the theoretical section, (ii) add a dedicated paragraph discussing the practical deviation from the ideal case, and (iii) include quantitative diagnostics (e.g., residual disease-signal scores obtained from an independent downstream classifier) that bound the approximation error of both the VAE and diffusion counterfactuals. These additions will clarify the scope of the guarantees without changing their formal statements.
  Revision: partial.
- Referee: [§5] §5 (Empirical Evaluation, e.g. faithfulness metrics and Table 2): The reported superiority in faithfulness metrics may be circular if those metrics are computed with respect to the same counterfactual generation process used to create the baselines. The manuscript should explicitly demonstrate independence between the evaluation criteria and the generative models, or provide an external validation (e.g., clinician ratings of attribution plausibility) to support the central empirical claim.
  Authors: The faithfulness metrics reported in §5 (insertion/deletion AUC and ROAR) are standard, model-agnostic attribution-evaluation protocols that measure the predictive model’s output change under feature occlusion; they do not invoke the VAE or diffusion generators at evaluation time. We will add an explicit paragraph in §5 that documents this separation and confirms that the evaluation pipeline shares no parameters or data with the counterfactual generators. While clinician ratings would constitute valuable supplementary evidence, the current multi-dataset, multi-generator quantitative results already provide independent support for the claims; we will note clinician validation as a worthwhile direction for follow-up work.
  Revision: partial.
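The deletion-AUC protocol mentioned in the rebuttal can be sketched as follows, assuming features are removed one at a time in order of decreasing attribution and replaced with a fixed fill value. The paper's exact settings (fill value, step size, normalization) are not given here, so `fill_value`, the toy `predict`, and the example attributions are all assumptions.

```python
def deletion_auc(predict, x, attribution, fill_value=0.0):
    """Delete features by decreasing attribution and return the trapezoidal
    area under the model-output curve, with the x-axis scaled to [0, 1]."""
    order = sorted(range(len(x)), key=lambda i: attribution[i], reverse=True)
    current = list(x)
    scores = [predict(current)]
    for i in order:
        current[i] = fill_value
        scores.append(predict(current))
    n = len(scores) - 1
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(n)) / n

def predict(x):
    """Hypothetical stand-in for the classifier being explained."""
    return 0.1 * x[0] + 0.2 * x[1] + 0.7 * x[2]

x = [1.0, 1.0, 1.0]
good_attr = [0.1, 0.2, 0.7]   # ranks the most influential feature first
bad_attr = [0.7, 0.2, 0.1]    # ranks it last

auc_good = deletion_auc(predict, x, good_attr)
auc_bad = deletion_auc(predict, x, bad_attr)
```

A lower deletion AUC indicates a more faithful ranking, since the output collapses sooner when truly important features are removed first. The routine calls only the predictor, which matches the authors' point that the metric itself does not invoke a counterfactual generator.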
Circularity Check
No significant circularity; derivation remains independent of inputs
The paper formalizes semantic missingness as a clinically plausible state without disease features and claims independent theoretical guarantees that counterfactual baselines produce more faithful attributions than zero or mean baselines. No equations or definitions in the provided text reduce the faithfulness metric or the guarantees to the counterfactual generation process by construction; the guarantees are presented as first-principles results following from the missingness definition rather than tautological restatements. Empirical comparisons use separate VAE and diffusion generators on three datasets and are not framed as predictions fitted to the same data used for the theory. No self-citation chains, ansatz smuggling, or renaming of known results appear as load-bearing steps. The central claim therefore retains independent content and does not collapse to its own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: counterfactual generative models can produce images that represent clinically normal variants without disease features.