SmoothGrad: removing noise by adding noise
Explaining the output of a deep network remains a challenge. In the case of an image classifier, one type of explanation is to identify pixels that strongly influence the final decision. A starting point for this strategy is the gradient of the class score function with respect to the input image. This gradient can be interpreted as a sensitivity map, and there are several techniques that elaborate on this basic idea. This paper makes two contributions: it introduces SmoothGrad, a simple method that can help visually sharpen gradient-based sensitivity maps, and it discusses lessons in the visualization of these maps. We publish the code for our experiments and a website with our results.
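The abstract treats the gradient of the class score with respect to the input image as a sensitivity map. As a rough illustration of how such a map might be smoothed, the sketch below averages gradients over Gaussian-perturbed copies of the input, which is how SmoothGrad is commonly described. That averaging step is not spelled out in the abstract above, so treat it as an assumption. The names `toy_score`, `numerical_gradient`, and `smoothgrad`, the toy score function, and all parameter values are hypothetical, and this is not the authors' published code.

```python
import math
import random


def toy_score(pixels):
    # Hypothetical stand-in for a class score: a smooth trend in pixels[0]
    # plus a small, rapidly oscillating term that makes the local gradient
    # fluctuate sharply. pixels[1] is ignored on purpose.
    return pixels[0] + 0.01 * math.sin(200.0 * pixels[0])


def numerical_gradient(score_fn, x, eps=1e-5):
    # Central-difference estimate of d(score)/d(x[i]) for every input value.
    # A real classifier would use automatic differentiation instead.
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((score_fn(hi) - score_fn(lo)) / (2.0 * eps))
    return grad


def smoothgrad(score_fn, x, n_samples=2000, noise_std=0.1, seed=0):
    # Average the sensitivity map over n_samples noisy copies of x,
    # each perturbed with zero-mean Gaussian noise of std noise_std.
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [v + rng.gauss(0.0, noise_std) for v in x]
        g = numerical_gradient(score_fn, noisy)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]


if __name__ == "__main__":
    image = [0.0, 0.5]  # two "pixels" standing in for an image
    print("raw gradient:     ", numerical_gradient(toy_score, image))
    print("smoothed gradient:", smoothgrad(toy_score, image))
```

In this toy setting the raw gradient at the chosen point is dominated by the oscillating term, while the averaged gradient should land much closer to the slope of the underlying trend. With a real image classifier, the finite-difference helper would be replaced by a framework's automatic differentiation, and the noise level and sample count would need tuning.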
Forward citations
Cited by 37 Pith papers
- Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution
Spectral Integrated Gradients constructs SVD-based integration paths that activate singular components from largest to smallest, producing cleaner attribution maps and better quantitative scores than standard Integrat...
- AIM: Adversarial Information Masking for Faithfulness Evaluation of Saliency Maps
AIM is a new saliency-guided adversarial feature replacement method to evaluate faithfulness of saliency maps and reliability of masking operators on image, audio, and EEG tasks.
- AGOP as Explanation: From Feature Learning to Per-Sample Attribution in Image Classifiers
AGOP-based attribution methods outperform Integrated Gradients and other baselines on pixel-level ground truth benchmarks for explaining image classifier decisions, with AGOP-Global offering zero inference cost.
- Attributions All the Way Down? The Metagame of Interpretability
Defines meta-attributions as directional second-order Shapley values on attribution methods, proves hierarchical decomposition of attributions, and demonstrates applications in language models, vision-language encoder...
- From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models
An iERF-centric framework unifies local, global, and mechanistic interpretability in vision models via SRD for saliency, CAFE for concept anchoring, and ICAT for interlayer attribution.
- Low Rank Adaptation for Adversarial Perturbation
Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.
- Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning
A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.
- The Linear Centroids Hypothesis: Features as Directions Learned by Local Experts
The Linear Centroids Hypothesis reframes network features as directions in centroid spaces of local affine experts, unifying interpretability methods and yielding sparser, more faithful dictionaries, circuits, and sal...
- From Baselines to Transport Geodesics: Axiomatic Attribution via Optimal Generative Flows
Transport-geodesic attribution via optimal generative flows selects principled paths for feature attributions by minimizing kinetic action.
- MobileMold: A Smartphone-Based Microscopy Dataset for Food Mold Detection
MobileMold provides 4941 smartphone microscopy images and shows deep learning models reach 99.5% accuracy on mold detection and food classification tasks.
- Learning-Augmented Robust Algorithmic Recourse
Introduces learning-augmented robust algorithmic recourse that trades off consistency with accurate future-model predictions against robustness to inaccurate predictions via a novel algorithm.
- CPC-VAR: Continual Personalized and Compositional Generation in Visual Autoregressive Models
CPC-VAR adds Gradient-based Concept Neuron Selection for continual single-concept learning and a context-aware multi-branch composition strategy to reduce forgetting and entanglement in VAR-based personalized image ge...
- Instructions Shape Production of Language, not Processing
Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.
- Scaling Vision Models Does Not Consistently Improve Localisation-Based Explanation Quality
Scaling vision models by depth and parameter count does not consistently improve localisation-based explanation quality across architectures, datasets, and post-hoc methods; smaller models often perform comparably or better.
- H-Sets: Hessian-Guided Discovery of Set-Level Feature Interactions in Image Classifiers
H-Sets detects higher-order feature interactions in image classifiers via Hessian-guided pair merging and attributes them with IDG-Vis to generate more interpretable saliency maps than existing marginal or coarse methods.
- When Can We Trust Deep Neural Networks? Towards Reliable Industrial Deployment with an Interpretability Guide
A new reliability score computed from the IoU difference between class-specific and class-agnostic heatmaps, boosted by adversarial enhancement, detects false negatives in binary industrial defect detectors with up to...
- Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks
Token-level contrastive attribution yields informative signals for some LLM benchmark failures but is not universally applicable across datasets and models.
- The Linear Centroids Hypothesis: Features as Directions Learned by Local Experts
Features in deep networks correspond to linear directions of centroids summarizing local functional behavior, enabling sparser and more effective feature dictionaries via sparse autoencoders applied to centroids rathe...
- Causal Attribution via Activation Patching
CAAP produces patch attributions in ViTs by direct activation patching on intermediate layers to measure causal contribution to the target class score.
- MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations
MaskDiME uses adaptive masked diffusion to produce 30x faster, localized, and semantically consistent visual counterfactual explanations without training, matching or exceeding prior performance on five datasets.
- Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
CA-LIG is a unified hierarchical attribution method that computes layer-wise Integrated Gradients fused with class-specific attention gradients to generate signed, context-sensitive explanations for transformer models.
- Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection
LiMA reformulates attribution as submodular subset selection and uses bidirectional greedy search to identify minimal important regions, reporting 36.3% better insertion and 39.6% better deletion scores than prior met...
- Explaining Object Detectors via Collective Contribution of Pixels
A Shapley-value method with interaction terms that explains object detector decisions by capturing collective pixel contributions for localization and classification.
- SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.
- MONAI: An open-source framework for deep learning in healthcare
MONAI is a community-supported PyTorch framework that extends deep learning to medical data with domain-specific architectures, transforms, and deployment tools.
- Saliency-driven Word Alignment Interpretation for Neural Machine Translation
Saliency-driven interpretation methods reveal that NMT models learn word alignments of better quality than fast-align under force decoding and consistent with automatic tools under free decoding.
- ExECG: An Explainable AI Framework for ECG models
ExECG is a Python framework providing Wrapper, Explainer, and Visualizer stages to unify XAI methods for ECG models and improve reproducibility.
- Instructions Shape Production of Language, not Processing
Instructions primarily shape the production stage of language models rather than the processing stage, with task-specific information and causal effects stronger in output tokens than input tokens.
- CAMAL: Improving Attention Alignment and Faithfulness with Segmentation Masks
CAMAL adds an auxiliary regularizer during training that aligns model attention with segmentation masks to improve both spatial accuracy and causal faithfulness of attention in deep learning and deep reinforcement lea...
- Path-Sampled Integrated Gradients
Path-sampled integrated gradients generalizes integrated gradients by averaging gradients over sampled baselines on the linear path, proving equivalence to a weighted version that improves convergence rate to O(m^{-1}...
- Interpretable and Explainable Surrogate Modeling for Simulations: A State-of-the-Art Survey and Perspectives on Explainable AI for Decision-Making
This survey synthesizes XAI methods with surrogate modeling workflows for simulations and outlines a research agenda to embed explainability into simulation-driven design and decision-making.
- Event-Level Detection of Surgical Instrument Handovers in Videos with Interpretable Vision Models
A ViT-LSTM spatiotemporal model detects surgical instrument handovers and classifies direction in videos, achieving F1 of 0.84 for detection and 0.72 mean F1 for direction on kidney transplant data.
- PECKER: A Precisely Efficient Critical Knowledge Erasure Recipe For Machine Unlearning in Diffusion Models
PECKER uses a saliency mask to prioritize parameter updates in distillation-based unlearning, achieving shorter training times for class and concept forgetting on CIFAR-10 and STL-10 while matching prior methods' efficacy.
- Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models
The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.
- Attribution-Guided Pruning for Insight and Control: Circuit Discovery and Targeted Correction in Small-scale LLMs
Attribution-guided pruning with contrastive relevance identifies behavior-specific circuits in small LLMs and removes as little as 0.03-0.3% of components to reduce toxicity or repetition while preserving general performance.
- ELF: Embedded Localisation of Features in pre-trained CNN
ELF derives keypoint locations via gradients on pre-trained CNN feature maps and reaches repeatability and matchability scores comparable to specialized detectors on HPatches, Webcam, and photo-tourism data.
- Parameter Space Analysis through Guided Visual Interpolations
ParamInter is a guided visual interpolation tool for high-dimensional parameter space analysis that integrates t-SNE and XAI to support optimization, demonstrated on blast furnace modeling.