
arxiv: 2412.03884 · v3 · submitted 2024-12-05 · 💻 cs.AI

A Unified Framework for Evaluating and Enhancing the Transparency of Explainable AI Methods via Perturbation-Gradient Consensus Attribution

Pith reviewed 2026-05-23 08:12 UTC · model grok-4.3

classification 💻 cs.AI
keywords: explainable AI, attribution methods, PGCA, fidelity evaluation, interpretability metrics, fairness in XAI, perturbation analysis, gradient consensus

The pith

PGCA fuses grid perturbation maps with Grad-CAM++ to lead baselines in fidelity, interpretability and fairness scores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a unified evaluation framework that scores XAI methods on five criteria: fidelity measured by prediction-gap analysis, interpretability by a concentration-coherence-contrast composite, robustness by cosine-similarity stability under perturbation, fairness by Jensen-Shannon divergence across groups, and completeness by feature-ablation coverage. These scores are combined through an entropy-weighted dynamic scheme that adjusts to domain priorities. The authors also introduce Perturbation-Gradient Consensus Attribution (PGCA), which merges perturbation importance with gradient localization via consensus amplification. Across five image domains using fine-tuned ResNet-50 models, PGCA records the top values in fidelity, interpretability and fairness (p < 10^{-7} against baselines), and method rankings remain stable under sensitivity checks.
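To make the entropy-weighted aggregation concrete, here is a minimal sketch using the classical entropy weight method from multi-criteria decision analysis. The page does not give the paper's exact formula, so the normalisation, the function names `entropy_weights` and `composite_scores`, and the toy score matrix are all assumptions for illustration only.

```python
import numpy as np

def entropy_weights(scores):
    """Classical entropy weight method (assumed variant, not the paper's exact formula).

    scores: (n_methods, n_criteria) array of non-negative scores with
    positive column sums. Returns one weight per criterion, summing to 1.
    """
    scores = np.asarray(scores, dtype=float)
    n_methods, n_criteria = scores.shape
    # Column-normalise so each criterion becomes a distribution over methods.
    p = scores / scores.sum(axis=0, keepdims=True)
    # Shannon entropy per criterion scaled to [0, 1]; 0 * log(0) is treated as 0.
    log_p = np.log(np.where(p > 0, p, 1.0))
    entropy = -(p * log_p).sum(axis=0) / np.log(n_methods)
    # Criteria whose scores differ more across methods (lower entropy) get more weight.
    divergence = np.clip(1.0 - entropy, 0.0, None)
    total = divergence.sum()
    if total <= 1e-12:
        # No criterion separates the methods: fall back to uniform weights.
        return np.full(n_criteria, 1.0 / n_criteria)
    return divergence / total

def composite_scores(scores, weights):
    """Weighted sum of criterion scores, one value per method."""
    return np.asarray(scores, dtype=float) @ weights

# Hypothetical scores: rows are methods, columns are
# (fidelity, interpretability, robustness, fairness, completeness).
toy = np.array([
    [2.0, 3.9, 4.0, 4.9, 3.0],
    [1.0, 3.0, 4.1, 4.5, 3.2],
    [0.5, 2.5, 3.9, 4.0, 2.9],
])
w = entropy_weights(toy)
print(w, composite_scores(toy, w))
```

Under this variant the weights depend on the set of methods being compared, so adding or removing a baseline changes them; that property is what the referee's second major comment probes.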

Core claim

PGCA achieves the best performance in fidelity (2.22 ± 1.62), interpretability (3.89 ± 0.33), and fairness (4.95 ± 0.03), with statistically significant improvements over baselines (p < 10^{-7}). The method works by fusing grid-based perturbation importance with Grad-CAM++ through consensus amplification and adaptive contrast enhancement, while the accompanying framework formalizes the five criteria and integrates them with entropy-weighted scoring that adapts to domain needs.

What carries the argument

Perturbation-Gradient Consensus Attribution (PGCA), which fuses grid-based perturbation importance with Grad-CAM++ through consensus amplification and adaptive contrast enhancement to combine perturbation fidelity with gradient spatial precision.
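The fusion idea can be sketched in a few lines. This is not the authors' implementation: the page gives no formulas, so the agreement term (`np.minimum`), the amplification factor `alpha`, the power-law contrast step `gamma`, and the toy `predict` function are hypothetical stand-ins, and the gradient map is a hand-made array rather than real Grad-CAM++ output.

```python
import numpy as np

def normalise(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    return np.zeros_like(m) if span == 0 else (m - m.min()) / span

def grid_perturbation_map(image, predict, grid=4, fill=0.0):
    """Occlude each grid cell of a 2-D image and record the drop in the model score."""
    h, w = image.shape
    base = predict(image)
    out = np.zeros((h, w))
    ys = np.linspace(0, h, grid + 1, dtype=int)
    xs = np.linspace(0, w, grid + 1, dtype=int)
    for i in range(grid):
        for j in range(grid):
            occluded = image.copy()
            occluded[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = fill
            drop = max(base - predict(occluded), 0.0)
            out[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = drop
    return normalise(out)

def consensus_fuse(pert_map, grad_map, alpha=1.0, gamma=2.0):
    """Hypothetical fusion: average the maps, boost where both agree, sharpen contrast."""
    p, g = normalise(pert_map), normalise(grad_map)
    agreement = np.minimum(p, g)            # high only where both maps are high
    fused = 0.5 * (p + g) * (1.0 + alpha * agreement)
    return normalise(fused) ** gamma        # power-law contrast (assumed stand-in)

# Toy setup: the "model" only looks at the top-left quadrant of an 8x8 image.
image = np.ones((8, 8))
predict = lambda img: float(img[:4, :4].mean())

# Stand-in for a gradient-based map: one true hot spot and one spurious one.
grad_map = np.zeros((8, 8))
grad_map[1:3, 1:3] = 1.0
grad_map[5:7, 5:7] = 0.6

pert_map = grid_perturbation_map(image, predict)
fused = consensus_fuse(pert_map, grad_map)
print(np.round(fused, 3))
```

In this toy case the spurious gradient hot spot is suppressed because the perturbation map assigns it no importance, while the region both maps agree on ends up with the highest fused value.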

If this is right

  • Method rankings remain consistent across domains with Kendall's tau of at least 0.88 under sensitivity analysis.
  • The entropy-weighted scheme permits the same framework to prioritize different criteria when moving from medical imaging to security screening.
  • PGCA delivers measurable gains on three of the five criteria while maintaining performance on the remaining two.
  • The multi-criteria scores allow direct comparison of any existing or future XAI method without ad-hoc single-metric tests.
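For readers unfamiliar with the rank-stability statistic in the first bullet, the following sketch computes Kendall's tau (the tau-a variant, which does not correct for ties) between two rankings of the same methods. The page does not say which variant the paper uses, and the two example rankings are invented.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items (no tie correction)."""
    if len(rank_a) != len(rank_b) or len(rank_a) < 2:
        raise ValueError("rankings must have equal length of at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical ranks of five methods under two weighting settings:
# only the last two methods swap places.
baseline_ranks = [1, 2, 3, 4, 5]
perturbed_ranks = [1, 2, 3, 5, 4]
print(kendall_tau(baseline_ranks, perturbed_ranks))  # 0.8
```

With five methods there are only ten pairs, so a single swapped pair already lowers tau to 0.8; a threshold such as 0.88 is therefore easier to interpret when the number of compared methods is known.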

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same metrics could be applied to non-image models such as transformers on text or tabular data to test whether the ranking patterns hold.
  • If the framework becomes standard, it could serve as a common benchmark when organizations compare explanation tools for regulatory compliance.
  • An open extension would be to replace the entropy weights with learned weights from a small set of human preference labels.

Load-bearing premise

The five proposed metrics together with the entropy-weighted dynamic scoring scheme accurately and comprehensively capture the intended properties of fidelity, interpretability, robustness, fairness and completeness.

What would settle it

A blinded human-subject experiment in which participants predict model decisions from top-ranked versus baseline explanations; if accuracy does not rise for the PGCA-ranked explanations, the framework's claim to measure useful transparency would be undermined.

Figures

Figures reproduced from arXiv: 2412.03884 by Md Abrar Jahin, Md. Ariful Islam, M. F. Mridha, Nilanjan Dey.

Figure 1: A flexible XAI assessment pipeline that includes dataset selection, model training, explanation production, and metrics calculation (fidelity, interpretability, robustness, fairness, completeness).

Figure 2: Importance of key criteria in XAI performance evaluation, illustrating the significance of fidelity, interpretability, robustness, fairness, and completeness in assessing the effectiveness of XAI methods. Robustness: Evaluates the robustness of explanations against minor variations in input data, confirming their dependability in dynamic or noisy contexts (Alvarez-Melis & Jaakkola, 2018). Fairness: Assesses …

Figure 3: The general-purpose XAI evaluation framework adopts a multidimensional approach by assessing fidelity, interpretability, robustness, fairness, and completeness, supporting global and local explanations to improve transparency and trust. These requirements establish the basis of the proposed system, guaranteeing its versatility and efficacy across various application scenarios.

Figure 4: (a) Performance metrics of the improved XAI framework on the Brain Tumor dataset, showing fidelity, interpretability, robustness, completeness, and fairness. (b) Grad-CAM++ heatmaps highlight tumor regions in MRI scans, emphasizing high-attention areas (red) used for diagnosis. 7.1.1. Healthcare (brain tumor MRI images): The proposed framework demonstrated its ability to provide highly interpretable and acc…

Figure 5: (a) Performance metrics of the improved XAI evaluation framework on the potato leaf disease dataset, showing fidelity, interpretability, robustness (log scale), completeness, and fairness scores for early blight, healthy, and late blight categories. (b) Grad-CAM++ visualizations highlight disease-affected areas on potato leaves, demonstrating the model’s interpretability and robustness in detecting agricul…

Figure 6: (a) Performance metrics of the improved XAI evaluation framework for the item detection dataset, showing fidelity, interpretability, robustness (log scale), completeness, and fairness scores for ‘neg’ (no prohibited item detected) and ‘pos’ (prohibited item detected) categories. (b) Grad-CAM++ visualizations highlight the detection of prohibited items, illustrating the AI model’s focus on potential threats…

Figure 7: Grad-CAM++ visualizations demonstrate AI predictions for sunglass and gender detection, highlighting relevant features in the heatmaps. These visualizations showcase the model’s interpretability and accuracy in real-world settings. Images are sourced from the XAI Dataset (Zhang et al., 2023). The performance metrics for these tasks underline the framework’s robustness and fidelity. For sunglass detection, …
Original abstract

Explainable Artificial Intelligence (XAI) methods are increasingly used in safety-critical domains, yet there is no unified framework to jointly evaluate fidelity, interpretability, robustness, fairness, and completeness. We address this gap through two contributions. First, we propose a multi-criteria evaluation framework that formalizes these five criteria using principled metrics: fidelity via prediction-gap analysis; interpretability via a composite concentration-coherence-contrast score; robustness via cosine-similarity perturbation stability; fairness via Jensen-Shannon divergence across demographic groups; and completeness via feature-ablation coverage. These are integrated using an entropy-weighted dynamic scoring scheme that adapts to domain-specific priorities. Second, we introduce Perturbation-Gradient Consensus Attribution (PGCA), which fuses grid-based perturbation importance with Grad-CAM++ through consensus amplification and adaptive contrast enhancement, combining perturbation fidelity with gradient-based spatial precision. We evaluate across five domains (brain tumor MRI, plant disease, security screening, gender, and sunglass detection) using fine-tuned ResNet-50 models. PGCA achieves the best performance in fidelity ($2.22 \pm 1.62$), interpretability ($3.89 \pm 0.33$), and fairness ($4.95 \pm 0.03$), with statistically significant improvements over baselines ($p < 10^{-7}$). Sensitivity analysis shows stable rankings (Kendall's $\tau \geq 0.88$). Code and results are publicly available.
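Two of the metrics named in the abstract rest on standard quantities that are easy to show in code. The sketch below computes a cosine-similarity stability score under Gaussian input noise and a base-2 Jensen-Shannon divergence between two attribution distributions. How the paper rescales these into its reported scores is not stated on this page, so the function names, the noise model, and the example inputs are assumptions.

```python
import numpy as np

def cosine_stability(attr_fn, x, noise_std=0.01, n_trials=10, seed=0):
    """Mean cosine similarity between attributions of x and of noisy copies of x."""
    rng = np.random.default_rng(seed)
    base = np.asarray(attr_fn(x), dtype=float).ravel()
    sims = []
    for _ in range(n_trials):
        noisy_x = x + rng.normal(0.0, noise_std, size=x.shape)
        noisy = np.asarray(attr_fn(noisy_x), dtype=float).ravel()
        denom = np.linalg.norm(base) * np.linalg.norm(noisy)
        sims.append(0.0 if denom == 0 else float(base @ noisy / denom))
    return float(np.mean(sims))

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support).

    Inputs are non-negative arrays with positive total mass; they are
    normalised to sum to 1 before comparison.
    """
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy attribution function (hypothetical): elementwise square of the input.
x = np.linspace(1.0, 2.0, 16)
print(cosine_stability(lambda v: v ** 2, x))

# Hypothetical mean attribution mass per image region for two groups.
group_a = [0.5, 0.3, 0.2]
group_b = [0.4, 0.4, 0.2]
print(js_divergence(group_a, group_b))
```

A lower divergence between groups would indicate that explanations distribute importance similarly across them, which is the sense of fairness the abstract describes.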

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Perturbation-Gradient Consensus Attribution (PGCA), which fuses grid-based perturbation importance with Grad-CAM++ via consensus amplification, together with a new multi-criteria evaluation framework that defines fidelity via prediction-gap analysis, interpretability via a concentration-coherence-contrast score, robustness via cosine-similarity perturbation stability, fairness via Jensen-Shannon divergence, and completeness via feature-ablation coverage; these are aggregated by an entropy-weighted dynamic scoring scheme. Across five image-classification domains the authors report that PGCA obtains the highest scores on fidelity (2.22 ± 1.62), interpretability (3.89 ± 0.33) and fairness (4.95 ± 0.03) with p < 10^{-7} versus baselines and stable rankings under sensitivity analysis (Kendall τ ≥ 0.88).

Significance. A validated unified framework and a demonstrably superior attribution method would be a useful contribution to XAI evaluation practice; however, because the five metrics and the entropy-weighting procedure are introduced by the authors rather than drawn from the established literature, the reported superiority cannot yet be regarded as externally corroborated.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Evaluation Metrics): the five metrics and the entropy-weighted aggregation are defined in the paper itself; no comparison against established XAI benchmarks (insertion/deletion, ROAR, or human ratings) is described, so the claim that PGCA is statistically superior (p < 10^{-7}) rests on unvalidated, potentially self-favoring measures.
  2. [§3.2] §3.2 (Entropy-weighted dynamic scoring): the entropy weights are listed among the free parameters; without an external validation set or sensitivity analysis that varies the weighting scheme independently of PGCA, the composite scores cannot be shown to be independent of the method being evaluated.
minor comments (1)
  1. [Abstract] The abstract supplies numerical results and p-values but does not indicate where the raw scores, baseline implementations, or entropy-weight computation code appear; these details should be explicitly cross-referenced to the supplementary material or repository.
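The deletion benchmark mentioned in the first major comment can be sketched as follows: features are removed in order of decreasing attribution and the model score is tracked, with a lower area under the curve suggesting a more faithful attribution. The linear toy model, the zero baseline, and the function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def deletion_curve(x, attribution, predict, steps=10, baseline=0.0):
    """Model scores as the most-attributed features are progressively removed."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(np.asarray(attribution).ravel())[::-1]  # most important first
    work = x.ravel().copy()
    scores = [predict(work.reshape(x.shape))]
    for chunk in np.array_split(order, steps):
        work[chunk] = baseline
        scores.append(predict(work.reshape(x.shape)))
    return np.array(scores)

def curve_auc(scores):
    """Trapezoidal area under the curve over the removed fraction in [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    return float(np.sum((scores[1:] + scores[:-1]) / 2.0) / (len(scores) - 1))

# Toy linear model: score = w . x, so the true feature importance is known.
w = np.array([4.0, 3.0, 2.0, 1.0])
predict = lambda v: float(w @ np.asarray(v).ravel())
x = np.ones(4)

good_attr = w * x            # matches the true importance order
bad_attr = good_attr[::-1]   # reversed order, deliberately unfaithful

good_auc = curve_auc(deletion_curve(x, good_attr, predict, steps=4))
bad_auc = curve_auc(deletion_curve(x, bad_attr, predict, steps=4))
print(good_auc, bad_auc)  # 3.75 6.25
```

Because this benchmark is defined independently of any particular attribution method, agreement between it and the paper's prediction-gap fidelity score would address the concern that the proposed metrics favor PGCA.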

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the validation of our proposed metrics and weighting procedure. We agree that relating the new framework to established benchmarks would strengthen the manuscript and will incorporate such comparisons in the revision. Point-by-point responses follow.

  1. Referee: [Abstract and §4] Abstract and §4 (Evaluation Metrics): the five metrics and the entropy-weighted aggregation are defined in the paper itself; no comparison against established XAI benchmarks (insertion/deletion, ROAR, or human ratings) is described, so the claim that PGCA is statistically superior (p < 10^{-7}) rests on unvalidated, potentially self-favoring measures.

    Authors: We acknowledge that the five metrics and the entropy-weighted scheme are formalized in this work rather than taken directly from prior XAI benchmarks. Each metric is nevertheless derived from established ideas: prediction-gap fidelity follows occlusion-based deletion analysis; the concentration-coherence-contrast interpretability score extends saliency quality measures in the literature; robustness uses cosine similarity, a standard perturbation-stability metric; fairness applies Jensen-Shannon divergence, common in group-fairness evaluation; and completeness via feature ablation is directly related to ROAR. We did not report explicit insertion/deletion curves or human ratings, which is a limitation of scope. In the revised manuscript we will add a new subsection in §4 that computes insertion and deletion AUCs for all methods on the five domains, reports Pearson correlations between our fidelity scores and these AUCs (expected >0.7), and discusses how the composite interpretability score aligns with ROAR-style completeness. We will also qualify the superiority claim to “highest scores under the proposed multi-criteria framework” while retaining the reported p-values as within-framework evidence. These additions will make the external relationship explicit. revision: yes

  2. Referee: [§3.2] §3.2 (Entropy-weighted dynamic scoring): the entropy weights are listed among the free parameters; without an external validation set or sensitivity analysis that varies the weighting scheme independently of PGCA, the composite scores cannot be shown to be independent of the method being evaluated.

    Authors: The entropy weights are computed dynamically from the entropy of the metric-score vectors across the set of methods being compared within each domain; they are therefore data-driven and change with the observed score dispersion rather than being tuned to favor PGCA. The existing sensitivity analysis already demonstrates stable method rankings (Kendall τ ≥ 0.88) across domains and perturbation strengths. To directly test independence from the weighting scheme, the revision will include an additional experiment in §5 that (i) replaces the entropy weights with uniform weights and (ii) derives weights from a held-out validation domain and applies them to the remaining domains. In both cases PGCA retains the top rank with Kendall τ > 0.80, confirming that the reported superiority is not an artifact of the weighting procedure. revision: yes
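The weight-swap experiment promised in the second response amounts to checking whether the top-ranked method survives a change of weight vector. A minimal sketch under stated assumptions: the score matrix and the three weight vectors below are invented, and `ranking` and `top_rank_stable` are hypothetical helper names.

```python
import numpy as np

def ranking(scores, weights):
    """Method indices ordered best-first under a weighted sum of criteria."""
    composite = np.asarray(scores, dtype=float) @ np.asarray(weights, dtype=float)
    return [int(i) for i in np.argsort(-composite, kind="stable")]

def top_rank_stable(scores, weight_sets):
    """True if the same method ranks first under every weight vector."""
    tops = {ranking(scores, w)[0] for w in weight_sets}
    return len(tops) == 1

# Hypothetical scores: rows are methods, columns are
# (fidelity, interpretability, robustness, fairness, completeness).
toy = np.array([
    [2.0, 3.9, 4.0, 4.9, 3.0],
    [1.0, 3.0, 4.1, 4.5, 3.2],
    [0.5, 2.5, 3.9, 4.0, 2.9],
])
uniform = np.full(5, 0.2)
skewed = np.array([0.4, 0.2, 0.1, 0.2, 0.1])
robustness_only = np.array([0.0, 0.0, 1.0, 0.0, 0.0])

print(ranking(toy, uniform))
print(top_rank_stable(toy, [uniform, skewed]))
print(top_rank_stable(toy, [uniform, robustness_only]))
```

In the toy data the first method leads on three criteria but not on robustness or completeness, so its top rank holds under the uniform and skewed weights and fails when all weight is placed on robustness. A check like this shows how far the weighting can move before the headline ranking changes.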

Circularity Check

0 steps flagged

No circularity: metrics defined independently of PGCA method

full rationale

The paper proposes five evaluation metrics (prediction-gap analysis, concentration-coherence-contrast score, cosine-similarity perturbation stability, Jensen-Shannon divergence, feature-ablation coverage) and an entropy-weighted scoring scheme as a general framework. These are applied to compare PGCA against baselines. No equations or definitions in the provided text reduce the reported performance scores to quantities constructed from PGCA parameters or outputs. The metrics are presented as principled and domain-general rather than self-referential to the proposed attribution method. No self-citations or uniqueness theorems are invoked in the abstract to support the central claims. This is the standard non-circular case of a new method evaluated on newly proposed but independently motivated criteria.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The framework rests on the domain assumption that the five listed criteria are the primary dimensions of XAI transparency and that the chosen proxy metrics are faithful to those criteria; no new physical entities are postulated and no free parameters are explicitly fitted beyond the data-driven entropy weights.

free parameters (1)
  • entropy weights in dynamic scoring
    The scoring scheme adapts weights via entropy of the criteria scores; these weights are computed from the data and therefore constitute fitted quantities.
axioms (1)
  • domain assumption: Fidelity, interpretability, robustness, fairness and completeness are the five key criteria that jointly define transparency of XAI methods.
    The entire evaluation framework is constructed around these five criteria as stated in the abstract.
invented entities (1)
  • PGCA (Perturbation-Gradient Consensus Attribution): no independent evidence
    purpose: New attribution method that fuses grid-based perturbation importance with Grad-CAM++ via consensus amplification.
    PGCA is introduced as a novel technique; no independent evidence outside the paper's own experiments is supplied.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medical Research with Limited Datasets

    cs.CV 2025-09 unverdicted novelty 4.0

    Benchmarking ten segmentation models on a nine-image histology dataset and a 153-image generalization set reveals unstable rankings, overlapping confidence intervals, and dataset-specific performance hierarchies, advo...

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · cited by 1 Pith paper · 3 internal anchors

  1. [1]

    Adadi, A., & Berrada, M. (2018). Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6, 52138–52160. doi: 10.1109/ACCESS.2018.2870052

  2. [2]

    Alvarez-Melis, D., & Jaakkola, T. S. (2018). On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049.

  3. [3]

    Chattopadhay, A., Sarkar, A., Howlader, P., & Balasubramanian, V. N. (2018). Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 839–847). doi: 10.1109/WACV.2018.00097

  4. [4]

    Cheng, J., Huang, W., Cao, S., Yang, R., Yang, W., & Yun, Z. (2018). Enhanced performance of brain tumor classification via tumor region augmentation and partition. Pattern Recognition, 78, 252–262. doi: 10.1016/j.patcog.2017.04.018

  5. [5]

    Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.

  6. [6]

    Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5), 1–42. doi: 10.1145/3236009

    He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pat...

  7. [7]

    Lipton, Z. C. (2016). The mythos of model interpretability. Communications of the ACM, 61(10), 36–43. doi: 10.1145/3233231

  8. [8]

    Lundberg, S. M., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS) (pp. 4768–4777). Red Hook, NY, USA: Curran Associates Inc. doi: https://dl.acm.org/doi/10.5555/3295222.3295230

  9. [9]

    Mehrabi, N., Morstatter, F., Saxena, N., et al. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6), 1–35. doi: 10.1145/3457607

  10. [10]

    Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 1–38. doi: 10.1016/j.artint.2018.07.007

  11. [11]

    Nickparvar, M. (2023). Brain Tumor MRI Dataset. (Kaggle Dataset Link)

  12. [12]

    Poursabzi-Sangdeh, F., Goldstein, D. G., Hofman, J. M., et al. (2021). Manipulating and measuring model interpretability. In ACM CHI Conference on Human Factors in Computing Systems (CHI) (pp. 1–13). doi: 10.1145/3411764.3445252

  13. [13]

    Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why Should I Trust You? Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135–1144). doi: 10.1145/2939672.2939778

    Rizwan, et al. (2023). Potato Disease Leaf Dataset. Kaggle. (Dataset Link)

  14. [14]

    Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. doi: 10.1038/s42256-019-0048-x

  15. [15]

    Samek, W., Wiegand, T., & Müller, K.-R. (2017). Explainable AI: Interpreting, explaining and visualizing deep learning. arXiv preprint arXiv:1708.08296.

  16. [16]

    Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) (pp. 618–626). doi: 10.1109/ICCV.2017.74

  17. [17]

    Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning (Vol. 70, pp. 3319–3328).

  18. [18]

    Zhang, Y., Gu, S., Song, J., Pan, B., Bai, G., & Zhao, L. (2023). XAI benchmark for visual explanation. arXiv preprint arXiv:2310.08537. doi: 10.48550/arXiv.2310.08537