Recognition: 2 theorem links
Feature Attribution Stability Suite: How Stable Are Post-Hoc Attributions?
Pith reviewed 2026-05-13 21:25 UTC · model grok-4.3
The pith
Feature attribution methods produce inconsistent explanations under geometric image changes even when the model prediction stays the same.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FASS shows that attribution stability depends critically on perturbation family and on conditioning evaluations to preserve the original prediction. Geometric perturbations expose substantially greater attribution instability than photometric changes. Among the four methods tested, Grad-CAM achieves the highest stability across all three datasets and multiple architectures.
What carries the argument
The FASS benchmark, which enforces prediction-invariance filtering before scoring attribution stability via structural similarity, rank correlation, and top-k Jaccard overlap.
If this is right
- Geometric perturbations should be included in any robustness assessment of vision explanations.
- Grad-CAM shows more consistent attributions than Integrated Gradients, GradientSHAP, or LIME under the tested conditions.
- Stability numbers drop sharply when evaluations are not restricted to prediction-preserving cases.
- Single-scalar stability scores miss important differences captured by the three-metric decomposition.
Where Pith is reading between the lines
- Safety-critical systems using these attributions may require additional checks such as cross-method agreement before acting on an explanation.
- The filtering approach could be adapted to other modalities like text or audio to test whether similar instability patterns appear.
- Adding domain-specific perturbations such as sensor noise or weather effects would make the suite closer to actual deployment conditions.
Load-bearing premise
The chosen geometric, photometric, and compression perturbations adequately represent the input variations that occur in safety-critical vision deployments.
What would settle it
Re-running the same protocol on a fresh collection of real camera-captured images or on perturbations outside the original families and checking whether Grad-CAM still ranks highest in stability.
original abstract
Post-hoc feature attribution methods are widely deployed in safety-critical vision systems, yet their stability under realistic input perturbations remains poorly characterized. Existing metrics evaluate explanations primarily under additive noise, collapse stability to a single scalar, and fail to condition on prediction preservation, conflating explanation fragility with model sensitivity. We introduce the Feature Attribution Stability Suite (FASS), a benchmark that enforces prediction-invariance filtering, decomposes stability into three complementary metrics (structural similarity, rank correlation, and top-k Jaccard overlap), and evaluates across geometric, photometric, and compression perturbations. Evaluating four attribution methods (Integrated Gradients, GradientSHAP, Grad-CAM, LIME) across four architectures and three datasets (ImageNet-1K, MS COCO, and CIFAR-10), FASS shows that stability estimates depend critically on perturbation family and prediction-invariance filtering. Geometric perturbations expose substantially greater attribution instability than photometric changes, and without conditioning on prediction preservation, up to 99% of evaluated pairs involve changed predictions. Under this controlled evaluation, we observe consistent method-level trends, with Grad-CAM achieving the highest stability across datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Feature Attribution Stability Suite (FASS), a benchmark for post-hoc attribution methods in computer vision. It enforces prediction-invariance filtering and uses three metrics—structural similarity, rank correlation, and top-k Jaccard overlap—to evaluate stability under geometric, photometric, and compression perturbations. The evaluation covers Integrated Gradients, GradientSHAP, Grad-CAM, and LIME across four architectures on ImageNet-1K, MS COCO, and CIFAR-10. The paper claims that stability depends on perturbation family and filtering, with geometric perturbations causing more instability, up to 99% prediction changes without filtering, and Grad-CAM showing the highest stability.
Significance. This benchmark addresses a gap in evaluating attribution stability by separating it from model sensitivity through invariance filtering. If the findings hold, they provide actionable insights for choosing attribution methods in safety-critical systems and underscore the limitations of unfiltered evaluations. The cross-dataset consistency of method rankings adds credibility to the recommendation of Grad-CAM for stable attributions.
major comments (2)
- The headline result that 'without conditioning on prediction preservation, up to 99% of evaluated pairs involve changed predictions' is central to arguing for the filtering step; however, the precise criterion for 'changed predictions' (e.g., whether it is top-1 class flip or a probability drop below a threshold) and the exact filtering implementation are not detailed enough to verify this percentage or assess its sensitivity to hyperparameters.
- The claim that geometric perturbations expose substantially greater attribution instability than photometric changes relies on the selected transforms being representative of realistic input variations. The manuscript should specify the exact parameter ranges (e.g., rotation angles, translation pixels, compression quality levels) and provide a justification or ablation showing why these families adequately sample the distribution of variations in safety-critical deployments, as disproportionate prediction changes in geometric cases (signaled by the 99% figure) could bias the filtered subset.
minor comments (2)
- The abstract states evaluation across four architectures but does not name them; listing the specific models (e.g., ResNet, ViT) in the main text would improve clarity.
- The three similarity metrics are introduced without explicit formulas or references to their standard definitions; adding equations for structural similarity (e.g., SSIM) and top-k Jaccard would aid reproducibility.
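The rank and overlap metrics the comment refers to have compact standard definitions. A minimal pure-Python sketch of top-k Jaccard and Spearman rank correlation follows; SSIM, which is windowed and more involved, is omitted (see Wang et al., 2004). This illustrates the standard formulas, not FASS's exact implementation (which may differ in tie handling or pooling):

```python
# Standard definitions of two of the three stability metrics
# (Spearman, 1904; Jaccard, 1912). Attribution maps are flattened
# to lists of per-position scores.

def topk_jaccard(a, b, k):
    """Jaccard overlap of the top-k most-attributed positions of two maps."""
    top_a = set(sorted(range(len(a)), key=lambda i: a[i], reverse=True)[:k])
    top_b = set(sorted(range(len(b)), key=lambda i: b[i], reverse=True)[:k])
    return len(top_a & top_b) / len(top_a | top_b)

def spearman_rho(a, b):
    """Spearman rank correlation (no tie correction) between two flat maps."""
    def ranks(x):
        order = sorted(range(len(x)), key=lambda i: x[i])
        r = [0] * len(x)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(topk_jaccard([1, 2, 3, 4], [1, 2, 3, 4], 2))  # 1.0 (identical maps)
print(spearman_rho([1, 2, 3, 4], [4, 3, 2, 1]))     # -1.0 (reversed ranking)
```

Both metrics are bounded (Jaccard in [0, 1], Spearman in [-1, 1]), which is what makes the three-way decomposition comparable across methods and datasets.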
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on our manuscript. We address each of the major comments below, providing clarifications and committing to revisions where appropriate to improve the clarity and rigor of the paper.
point-by-point responses
-
Referee: The headline result that 'without conditioning on prediction preservation, up to 99% of evaluated pairs involve changed predictions' is central to arguing for the filtering step; however, the precise criterion for 'changed predictions' (e.g., whether it is top-1 class flip or a probability drop below a threshold) and the exact filtering implementation are not detailed enough to verify this percentage or assess its sensitivity to hyperparameters.
Authors: We agree that the precise definition of 'changed predictions' requires more detail for reproducibility. In our implementation, a prediction is deemed changed if the argmax (top-1 class) differs between the original and perturbed input. The filtering step retains only those perturbation pairs where the top-1 prediction is preserved. We will revise the manuscript to include a clear description of this criterion, along with pseudocode for the filtering process and an analysis of sensitivity to alternative definitions such as top-5 agreement or probability thresholds. Revision: yes.
-
Referee: The claim that geometric perturbations expose substantially greater attribution instability than photometric changes relies on the selected transforms being representative of realistic input variations. The manuscript should specify the exact parameter ranges (e.g., rotation angles, translation pixels, compression quality levels) and provide a justification or ablation showing why these families adequately sample the distribution of variations in safety-critical deployments, as disproportionate prediction changes in geometric cases (signaled by the 99% figure) could bias the filtered subset.
Authors: We acknowledge the importance of specifying the perturbation parameters and justifying their choice. The revised manuscript will detail the exact ranges: geometric perturbations consist of rotations uniformly sampled from [-15°, 15°], translations up to 10% of image dimensions, and scaling factors from 0.9 to 1.1; photometric perturbations include brightness and contrast adjustments within ±0.2; compression uses JPEG quality from 50 to 95. These ranges are motivated by standard data augmentation practices in computer vision robustness benchmarks (e.g., ImageNet-C). We will add an ablation study examining how varying these ranges affects the percentage of prediction changes and stability metrics, to address potential bias in the filtered subset and better support applicability to safety-critical settings. Revision: yes.
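The top-1 criterion described in the authors' first response can be sketched as follows. The function names and toy score function here are illustrative assumptions, not FASS's API:

```python
# Sketch of prediction-invariance filtering: keep only (original,
# perturbed) pairs whose top-1 class is unchanged, and report the
# fraction of pairs that would otherwise contaminate the evaluation.

def predict_top1(scores):
    """Index of the highest class score (argmax over classes)."""
    return max(range(len(scores)), key=lambda c: scores[c])

def filter_prediction_invariant(pairs, model):
    """Retain pairs (x, x_pert) whose top-1 prediction is preserved.

    `model` maps an input to a list of class scores.
    Returns the kept pairs and the fraction with changed predictions.
    """
    kept = [(x, xp) for x, xp in pairs
            if predict_top1(model(x)) == predict_top1(model(xp))]
    changed_frac = 1 - len(kept) / len(pairs) if pairs else 0.0
    return kept, changed_frac

# Toy 3-class model: class = int(sum of input) mod 3.
model = lambda x: [1.0 if int(sum(x)) % 3 == c else 0.0 for c in range(3)]
pairs = [([1.0, 2.0], [1.2, 1.9]),   # small perturbation, class preserved
         ([1.0, 2.0], [2.0, 2.5])]   # larger perturbation, class flips
kept, changed = filter_prediction_invariant(pairs, model)
print(len(kept), changed)  # 1 0.5
```

Only the kept pairs would then be scored with the stability metrics, which is what separates explanation fragility from plain model sensitivity.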
Circularity Check
No circularity: empirical benchmark with independent measurements
full rationale
The paper introduces the FASS benchmark, explicitly defines its three metrics (structural similarity, rank correlation, top-k Jaccard) and perturbation families as design choices, applies prediction-invariance filtering as a stated protocol, and reports measured stability values across methods and datasets. No central claim reduces, through the paper's own equations or self-citations, to a quantity fitted inside the study; the observed trends (Grad-CAM highest stability, geometric perturbations showing greater instability) are direct empirical outputs rather than presupposed by the evaluation setup itself. The work is self-contained rather than dependent on external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The selected geometric, photometric, and compression perturbations represent realistic input variations in safety-critical vision systems.
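This assumption is concrete enough to sketch. A minimal sampler for one perturbation per family, using the parameter ranges the authors quote in their rebuttal (rotation ±15°, translation up to 10%, scale 0.9–1.1, brightness/contrast ±0.2, JPEG quality 50–95); applying the sampled parameters to pixels is left to an image library:

```python
import random

# Sketch only: the ranges come from the authors' rebuttal, and the
# dictionary keys are illustrative names, not FASS's configuration schema.
def sample_perturbation(family, rng=random):
    if family == "geometric":
        return {
            "rotation_deg": rng.uniform(-15.0, 15.0),
            "translate_frac": rng.uniform(0.0, 0.10),
            "scale": rng.uniform(0.9, 1.1),
        }
    if family == "photometric":
        return {
            "brightness": rng.uniform(-0.2, 0.2),
            "contrast": rng.uniform(-0.2, 0.2),
        }
    if family == "compression":
        return {"jpeg_quality": rng.randint(50, 95)}
    raise ValueError(f"unknown family: {family}")

g = sample_perturbation("geometric")
assert -15.0 <= g["rotation_deg"] <= 15.0
```

Widening or narrowing these ranges is exactly the lever the assumption hides: the 99% prediction-change figure, and hence the filtered subset, shifts with them.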
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
FASS enforces prediction-invariance filtering... decomposes stability into three complementary metrics: structural similarity, rank correlation, and top-k Jaccard overlap... across geometric, photometric, and compression perturbations.
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
stability estimates depend critically on perturbation family and prediction-invariance filtering. Geometric perturbations expose substantially greater attribution instability than photometric changes
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
- [2] Chirag Agarwal, Eshika Saxena, Satyapriya Krishna, Martin Pawelczyk, Nari Johnson, Isha Pishi, Marber Aber, and Himabindu Lakkaraju. OpenXAI: Towards a transparent evaluation of model explanations. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- [3] David Alvarez-Melis and Tommi S. Jaakkola. On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049, 2018.
- [4] Nishanth Arun, Nathan Gishi, Skylar Fober, Kenneth Hajek, Liam Vaickus, Oscar Salas, and Lorenzo Torresani. Assessing the trustworthiness of saliency maps for localizing abnormalities in medical imaging. Radiology: Artificial Intelligence, 3(6):e200267, 2021.
- [5] Shahin Atakishiyev, Mohammad Salameh, Hengshuai Yao, and Randy Goebel. Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions. IEEE Access, 12:6702–6739, 2024.
- [6] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009.
- [7] Ann-Kathrin Dombrowski, Maximilian Alber, Christopher Anders, Marcel Ackermann, Klaus-Robert Müller, and Pan Kessel. Explanations can be manipulated and geometry is behind it. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
- [8] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021.
- [9] Amirata Ghorbani, Abubakar Abid, and James Zou. Interpretation of neural networks is fragile. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3681–3688, 2019.
- [10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
- [11] Anna Hedström, Leander Weber, Daniel Krakowczyk, Dilyara Bareeva, Franz Motzkus, Wojciech Samek, Sebastian Lapuschkin, and Marina M-C Höhne. Quantus: An explainable AI toolkit for responsible evaluation of neural network explanations and beyond. Journal of Machine Learning Research, 24(34):1–11, 2023.
- [12] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4700–4708, 2017.
- [13] Andrei Kapishnikov, Subhashini Venugopalan, Besim Avber, Geoffrey Hinton, Fernanda Viegas, and Mukund Kudlur. Guided integrated gradients: An adaptive path method for removing noise. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5050–5058, 2021.
- [14] Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim. The (un)reliability of saliency methods. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, pages 267–280. Springer, 2019.
- [15] Linus Klein, Tobias Wartmann, René Schallner, Stephan Scheele, and Rafet Sifa. LATEC: A large-scale benchmark for evaluating explainability methods on complex AI systems. In Proceedings of the 1st Workshop on Evaluating Trustworthiness of AI (EvalTAI), European Conference on Artificial Intelligence (ECAI), 2024.
- [16] Narine Kokhlikyan, Vivek Miglani, Miguel Martin, Edward Wang, Bilal Alsallakh, Jonathan Reynolds, Alexander Melnikov, Natalia Kliber, Carlos Fan, Pavlo Molchanov, et al. Captum: A unified and generic model interpretability library for PyTorch. arXiv preprint arXiv:2009.07896, 2020.
- [17] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- [18] Sandeep Kumar et al. Brain tumor detection using deep learning and explainable AI. Computers in Biology and Medicine, 170:108035, 2024.
- [19] Xuhong Li, Haoyi Xiong, Xingjian Li, Xiao Wu, Xiao Zhang, Ji Liu, Jiang Bian, and Jun Huan. M4: A unified XAI benchmark for faithful evaluation of feature attribution methods across metrics, models and tasks. arXiv preprint arXiv:2310.19067, 2023.
- [20] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV), pages 740–755. Springer, 2014.
- [21] Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11976–11986, 2022.
- [22] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
- [23] Vivek Miglani, Narine Kober, Hana Morgenstern, and Danish Pruthi. Investigating saturation effects in integrated gradients. arXiv preprint arXiv:2010.12697, 2020.
- [24] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
- [25] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
- [26] Adriel Saporta, Xiaotong Gui, Ashwin Agrawal, Anuj Pareek, Steven QH Truong, Chanh DT Nguyen, Van-Doan Ngo, Jayne Seekins, Francis G. Blankenberg, Andrew Y. Ng, Matthew P. Lungren, and Pranav Rajpurkar. Benchmarking saliency methods for chest X-ray interpretation. Nature Machine Intelligence, 4(10):867–878, 2022.
- [27] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 618–626, 2017.
- [28] Charles Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101, 1904.
- [29] Patrick Sturmfels, Scott Lundberg, and Su-In Lee. Visualizing the impact of feature attribution baselines. Distill, 5(1):e22, 2020.
- [30] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), pages 3319–3328, 2017.
- [31] Andrea Vedaldi and Stefano Soatto. Quick shift and kernel methods for mode seeking. Computer Vision–ECCV 2008, pages 705–718, 2008.
- [32] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
- [33] Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Suggala, David I. Inber, and Pradeep K. Ravikumar. On the (in)fidelity and sensitivity of explanations. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
- [34] Muhammad Rehman Zafar and Naimul Mefraz Khan. DLIME: A deterministic local interpretable model-agnostic explanations approach for computer-aided diagnosis systems. In ACM SIGKDD Workshop on Explainable AI/ML (XAI), 2019.
- [35] Zhengze Zhou, Giles Hooker, and Fei Wang. S-LIME: Stabilized-LIME for model explanation. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2429–2438, 2021.
- [36] C. Spearman, "The proof and measurement of association between two things," The American Journal of Psychology, vol. 15, no. 1, pp. 72–101, 1904.
- [37] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
- [38] P. Jaccard, "The distribution of the flora in the alpine zone," New Phytologist, vol. 11, no. 2, pp. 37–50, 1912.