FM-G-CAM: A Holistic Approach for Explainable AI in Computer Vision

Jordan J. Bird; Ravidu Suien Rammuni Silva

arxiv: 2312.05975 · v3 · pith:GJ4YWULXnew · submitted 2023-12-10 · 💻 cs.CV · cs.AI · cs.LG

classification 💻 cs.CV cs.AI cs.LG

keywords class fm-g-cam predictions activation computer existing grad-cam gradient-weighted

Explainability is a vital aspect of modern AI for real-world impact and usability. The main objective of this paper is to emphasise the need to understand the predictions of Computer Vision models, specifically Convolutional Neural Network (CNN) models. Existing methods for explaining CNN predictions are largely based on Gradient-weighted Class Activation Maps (Grad-CAM) and focus solely on a single target class; this assumption about the target class selection neglects a large portion of the predictor CNN's prediction process. In this paper, we present an exhaustive methodology, called Fused Multi-class Gradient-weighted Class Activation Map (FM-G-CAM), that considers multiple top-predicted classes and provides a holistic explanation of the predictor CNN's rationale. We also provide a detailed mathematical and algorithmic description of our method. Furthermore, alongside a concise comparison of existing methods, we compare FM-G-CAM with Grad-CAM, quantitatively and qualitatively highlighting its benefits through real-world practical use cases. Finally, we present an open-source Python library with an FM-G-CAM implementation to conveniently generate saliency maps for CNN-based model predictions.
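
To make the contrast between single-class and multi-class explanations concrete, here is a minimal NumPy sketch. The `grad_cam` function follows the standard Grad-CAM formulation (gradients averaged per channel, weighted sum of feature maps, then ReLU). The `fuse_top_classes` function is an assumption made purely for illustration: it keeps, at each pixel, only the class with the strongest map and then rescales to [0, 1]. The abstract does not specify the fusion rule, so this should not be read as the authors' algorithm or as the API of their Python library. The function names and array shapes are hypothetical as well.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Standard Grad-CAM saliency map for a single target class.

    activations: array (K, H, W), feature maps of the last conv layer.
    gradients:   array (K, H, W), gradients of the class score with
                 respect to those feature maps.
    """
    # Channel weights: global average of the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    # ReLU keeps only evidence that supports the class.
    return np.maximum(cam, 0.0)


def fuse_top_classes(activations, gradients_per_class):
    """Illustrative multi-class fusion (an assumption, not the paper's rule).

    gradients_per_class: list of (K, H, W) arrays, one per top-predicted class.
    Returns an array (C, H, W) in which each pixel is non-zero for at most
    one class, scaled so the largest value is 1.
    """
    cams = np.stack([grad_cam(activations, g) for g in gradients_per_class])
    # Index of the strongest class at each pixel.
    winner = cams.argmax(axis=0)                       # shape (H, W)
    class_ids = np.arange(cams.shape[0])[:, None, None]
    # Zero out every class except the per-pixel winner.
    fused = cams * (class_ids == winner)
    peak = fused.max()
    return fused / peak if peak > 0 else fused
```

In practice the activations and gradients would come from a forward and backward pass through the CNN, once per top-predicted class. Each channel of the fused output could then be drawn in its own colour over the input image, so that one picture shows which regions support which class.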

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation
cs.CV 2026-04 unverdicted novelty 5.0

An optimized KernelSHAP method for 3D medical image segmentation restricts computation to ROI and receptive fields, uses patch logit caching for 15-30% savings, and compares organ units versus supervoxels for clinical...