The (Un)reliability of saliency methods

Been Kim; Dumitru Erhan; Julius Adebayo; Kristof T. Schütt; Maximilian Alber; Pieter-Jan Kindermans; Sara Hooker; Sven Dähne

arxiv: 1711.00867 · v1 · pith:4SHWFOBBnew · submitted 2017-11-02 · stat.ML · cs.LG



keywords: methods, input, saliency, model, reliability, invariance, adding, attribute


Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step, adding a constant shift to the input data, to show that a transformation with no effect on the model can cause numerous methods to produce incorrect attributions. To guarantee reliability, we posit that methods should fulfill input invariance, the requirement that a saliency method mirror the sensitivity of the model with respect to transformations of the input. We show, through several examples, that saliency methods that do not satisfy input invariance result in misleading attribution.
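
The input-invariance requirement in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a toy linear model, a hand-picked constant shift, and two attribution rules (plain gradient and gradient-times-input) chosen because their behavior under a shift is easy to verify by hand. The abstract does not name the methods it tests, and the helper names below are hypothetical.

```python
import numpy as np


def make_models(w, b, shift):
    """Build two linear models that produce identical outputs.

    f1 sees the original input x; f2 sees x + shift, and its bias is
    adjusted so that f2(x + shift) == f1(x) for every x.
    """
    def f1(x):
        return w @ x + b

    def f2(x_shifted):
        # The bias absorbs the constant shift, so the prediction is unchanged.
        return w @ x_shifted + (b - w @ shift)

    return f1, f2


def gradient_saliency(w, x):
    # For a linear model the gradient w.r.t. the input is w, independent of x.
    return w.copy()


def gradient_times_input(w, x):
    # Elementwise product of the gradient with the input the model receives.
    return w * x


rng = np.random.default_rng(0)   # illustrative seed
w = rng.normal(size=4)           # hypothetical weights
b = 0.5                          # hypothetical bias
x = rng.normal(size=4)           # hypothetical input
shift = np.full(4, 2.0)          # constant shift added during pre-processing

f1, f2 = make_models(w, b, shift)
x_shifted = x + shift

# The two models agree, so a reliable explanation should agree as well.
same_output = bool(np.isclose(f1(x), f2(x_shifted)))
grad_invariant = np.allclose(gradient_saliency(w, x),
                             gradient_saliency(w, x_shifted))
gxi_invariant = np.allclose(gradient_times_input(w, x),
                            gradient_times_input(w, x_shifted))

print("same output:", same_output)
print("gradient attribution unchanged:", grad_invariant)
print("gradient*input attribution unchanged:", gxi_invariant)
```

In this toy setting the plain gradient is identical for both models, while gradient-times-input differs by w * shift even though the predictions match. That mismatch is the kind of sensitivity to a model-irrelevant transformation that the abstract describes.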


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Interpretability Beyond Classification Output: Semantic Bottleneck Networks
cs.CV 2019-07 unverdicted novelty 6.0

Semantic Bottleneck Networks add interpretable semantic concept layers to deep networks, recovering SOTA segmentation performance with drastic channel reduction and enabling failure interpretation at over 99% accuracy...