Axiomatic Attribution for Deep Networks
Original abstract
We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
This paper has not been read by Pith yet.
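The abstract notes that the method needs only a few calls to the standard gradient operator: Integrated Gradients scales the difference between an input and a baseline by the average gradient along the straight-line path between them, IG_i(x) = (x_i - x'_i) * integral over alpha in [0, 1] of dF(x' + alpha * (x - x')) / dx_i. The sketch below illustrates the usual Riemann-sum approximation of that integral; it is not the paper's reference code, and the grad_fn callable, the step count, and the NumPy types are placeholder assumptions.

import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    # Riemann-sum approximation of the path integral of gradients
    # along the straight line from `baseline` to `x`.
    # `grad_fn(z)` is a hypothetical callable returning dF/dz as an
    # array with the same shape as `z`.
    alphas = np.linspace(0.0, 1.0, steps + 1)[1:]   # interpolation points, skipping alpha = 0
    path_grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    avg_grad = path_grads.mean(axis=0)              # average gradient along the path
    return (x - baseline) * avg_grad                # scale by the input difference

As a quick sanity check, for a linear model F(x) = w.x with a zero baseline the sketch returns exactly w * x, and as steps grows the attributions sum to F(x) - F(baseline), the completeness property the paper derives.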
Forward citations
Cited by 10 Pith papers
- Architecture-Aware Explanation Auditing for Industrial Visual Inspection
  Explanation faithfulness for deep classifiers on wafer maps is highest when the explainer matches the model's native readout structure, with ViT-Tiny plus Attention Rollout achieving lower Deletion AUC than mismatched...
- Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning
  A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.
- Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification
  The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray cl...
- Rhamba: Region-Aware Hybrid Attention-Mamba Framework for Self-Supervised Learning in Resting-State fMRI
  Rhamba uses region-aware masking strategies and hybrid Attention-Mamba models pretrained on ABIDE fMRI data to achieve top AUROC on schizophrenia and ADHD classification tasks while outperforming prior methods.
- Compared to What? Baselines and Metrics for Counterfactual Prompting
  Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistica...
- Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution
  Cross-modal averaging maps ECG model attributions to CineECG 3D space, raising Dice overlap with expert annotations from 0.47 to 0.56 on 20 cases while filtering attribution noise.
- TwinSpecNet: Extending APOGEE's chemical reach to low-S/N spectra via empirical paired learning
  TwinSpecNet uses empirical paired learning on spectral twins to denoise low-S/N APOGEE spectra and predict stellar parameters and abundances with lower scatter than the standard pipeline.
- AI-Generated Images: What Humans and Machines See When They Look at the Same Image
  Researchers train AI detectors on a large photorealistic fake image dataset, apply 16 XAI methods, and use human survey feedback to assess alignment between machine explanations and human perception of AI-generated images.
- Rhamba: Region-Aware Hybrid Attention-Mamba Framework for Self-Supervised Learning in Resting-State fMRI
  Rhamba is a region-aware hybrid Attention-Mamba framework that uses anatomically guided masking for self-supervised pretraining on ABIDE fMRI data and shows competitive AUROC on downstream schizophrenia and ADHD class...
- Uncertainty-Aware Transformers: Conformal Prediction for Language Models
  CONFIDE applies conformal prediction to transformer embeddings for valid prediction sets, improving accuracy by up to 4.09% and efficiency over baselines on models like BERT-tiny.
Discussion (0)