Interpretability Beyond Classification Output: Semantic Bottleneck Networks

Bernt Schiele; Mario Fritz; Max Losch

arxiv: 1907.10882 · v2 · pith:YRDQNCX5new · submitted 2019-07-25 · 💻 cs.CV · cs.LG

Pith reviewed 2026-05-24 16:23 UTC · model grok-4.3

keywords: semantic bottleneck, interpretability, scene segmentation, failure analysis, confidence estimation, object parts, deep networks, end-to-end training

The pith

Semantic Bottleneck Networks force all predictions through a small set of human-interpretable concepts while matching state-of-the-art accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes inserting a Semantic Bottleneck Layer into deep networks for tasks like street scene segmentation. This layer consists of semantic concepts such as object parts and materials, reducing thousands of feature channels to tens. The network is retrained end-to-end around this layer and recovers full performance. Activations in the layer allow direct interpretation of why predictions fail and estimation of output confidence. This makes the basis for each decision transparent without sacrificing results.

Core claim

A deep network can house a Semantic Bottleneck Layer of task-related semantic concepts so that all downstream predictions depend only on those concepts. On street scene segmentation this yields state-of-the-art performance after reducing from thousands to tens of channels. The layer activations support failure case analysis and confidence prediction, producing interpretable segmentation results at over 99 percent accuracy for most predictions.

What carries the argument

The Semantic Bottleneck Layer (SB-Layer), an intermediate layer whose channels correspond to semantic concepts like object parts and materials; every final output must be computed from its activations alone.
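To make the "computed from its activations alone" constraint concrete, here is a minimal per-pixel sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the channel counts, the sigmoid on the concept channels, the random weights, and every function name are hypothetical choices made for this example.

```python
import math
import random

# All sizes below are illustrative assumptions, not values from the paper.
NUM_FEATURES = 1024  # stands in for "thousands" of opaque feature channels
NUM_CONCEPTS = 32    # stands in for "tens" of semantic concept channels
NUM_CLASSES = 19     # hypothetical number of segmentation classes


def random_matrix(rows, cols, rng, scale=0.05):
    """Build a rows x cols weight matrix with small random entries."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vector):
    """Matrix-vector product; per pixel this is what a bias-free 1x1 convolution computes."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


rng = random.Random(0)
to_concepts = random_matrix(NUM_CONCEPTS, NUM_FEATURES, rng)  # the bottleneck
to_classes = random_matrix(NUM_CLASSES, NUM_CONCEPTS, rng)    # the downstream head


def predict_pixel(features):
    """Return (concept activations, class scores) for one pixel."""
    concepts = [sigmoid(z) for z in linear(to_concepts, features)]
    # The head receives only the concepts, never the raw features.
    scores = linear(to_classes, concepts)
    return concepts, scores


pixel_features = [rng.random() for _ in range(NUM_FEATURES)]
concepts, scores = predict_pixel(pixel_features)
print(len(pixel_features), "->", len(concepts), "->", len(scores))  # 1024 -> 32 -> 19
```

Because `predict_pixel` passes nothing but `concepts` to the head, two pixels with identical concept activations necessarily receive identical class scores, which is the property that makes the layer inspectable.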

If this is right

Failure modes become diagnosable through inspection of which semantic concepts the network activates incorrectly.
Output confidence can be predicted directly from the bottleneck activations.
Segmentation predictions gain interpretability because each result traces back to specific concepts.
High accuracy is maintained despite the drastic reduction in intermediate dimensionality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar bottlenecks could be tested on other dense prediction tasks such as depth estimation if appropriate concepts are identified.
Engineers might use the layer to inject domain knowledge by editing concept activations at inference time.
The method opens a route to auditing networks at the level of individual semantic units rather than raw features.
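The second extension above, editing concept activations at inference time, can be illustrated with a toy downstream head. This is a sketch under loud assumptions: the concept names, class names, weights, and the `intervene` helper are all invented for illustration and do not come from the paper.

```python
# Hypothetical concept and class vocabularies for illustration only.
CONCEPT_NAMES = ["wheel", "window", "foliage"]
CLASS_NAMES = ["car", "building", "vegetation"]

# Hypothetical linear head: one row of concept weights per class.
HEAD_WEIGHTS = [
    [1.0, 0.5, 0.0],  # car
    [0.0, 1.0, 0.0],  # building
    [0.0, 0.0, 1.0],  # vegetation
]


def downstream_head(concepts, weights):
    """Score each class from concept activations only."""
    return [sum(w * c for w, c in zip(row, concepts)) for row in weights]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def intervene(concepts, overrides):
    """Return a copy of the concept vector with selected channels overwritten."""
    edited = list(concepts)
    for index, value in overrides.items():
        edited[index] = value
    return edited


observed = [0.1, 0.6, 0.0]  # the "wheel" channel barely fires
before = CLASS_NAMES[argmax(downstream_head(observed, HEAD_WEIGHTS))]

edited = intervene(observed, {CONCEPT_NAMES.index("wheel"): 0.9})
after = CLASS_NAMES[argmax(downstream_head(edited, HEAD_WEIGHTS))]

print(before, "->", after)  # building -> car
```

The same mechanism works in the diagnostic direction: if overriding a single channel flips a wrong prediction to the right one, that channel is a plausible locus of the failure.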

Load-bearing premise

The hand-chosen semantic concepts must carry all information needed for accurate segmentation so that retraining around the bottleneck introduces no permanent loss.

What would settle it

If no selection of around 50 semantic concepts allows a retrained network to reach within a few percent of the original segmentation accuracy on standard street scene benchmarks, the approach would not hold.

Figures

Figures reproduced from arXiv: 1907.10882 by Bernt Schiele, Mario Fritz, Max Losch.

**Figure 2.** Construction of SBNs. 1. Start off with a well performing model on the target task. 2. Train a function (SB) that maps intermediate representations to semantic concepts. 3. Insert the SB back into the original model and finetune all downstream layers. The power of SBNs lies in the ability to inspect the evidence for the chosen semantic concepts to investigate errors. Such errors could involve the absence …

**Figure 3.** Populating the SB space to find modes of errors. The gray boxes enclosed in the SB indicate …

**Figure 4.** Sample from Broden+ dataset with annotations for parts (2nd row) and materials (3rd). As discussed, we want to learn relevant semantic representations in our SB with additional supervision. Broden+ [27] is a recent collection of datasets which serves as a starting point of our case study as it contains annotations for a broad range of relevant semantic concepts. It offers thousands of images for objects, p…

**Figure 5.** Task relevant concepts outperform irrelevant ones. An experiment that we find necessary to conduct as sanity check, is the inspection of whether the relationship between semantic content and classes make sense, whether the feed forward pass from semantically meaningful concepts to the final network output is “semantically lossless”. This can be examined via the newly gained ability to manipulate the SB …

**Figure 6.** Segmentation with the SB placed at two different locations in the network results in different …

**Figure 7.** Selection of error examples from four different clusters.

**Figure 8.** Accuracy assessment of the network's predictions with our proposed confidence metric.

Original abstract

Today's deep learning systems deliver high performance based on end-to-end training. While they deliver strong performance, these systems are hard to interpret. To address this issue, we propose Semantic Bottleneck Networks (SBN): deep networks with semantically interpretable intermediate layers that all downstream results are based on. As a consequence, the analysis on what the final prediction is based on is transparent to the engineer and failure cases and modes can be analyzed and avoided by high-level reasoning. We present a case study on street scene segmentation to demonstrate the feasibility and power of SBN. In particular, we start from a well performing classic deep network which we adapt to house a SB-Layer containing task related semantic concepts (such as object-parts and materials). Importantly, we can recover state of the art performance despite a drastic dimensionality reduction from 1000s (non-semantic feature) to 10s (semantic concept) channels. Additionally we show how the activations of the SB-Layer can be used for both the interpretation of failure cases of the network as well as for confidence prediction of the resulting output. For the first time, e.g., we show interpretable segmentation results for most predictions at over 99% accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Semantic Bottleneck Networks insert a semantic layer into segmentation nets with an interesting construction, but the abstract supplies no experiments to back the performance claims.

The core contribution is the Semantic Bottleneck Network architecture, which adds an SB-Layer of task-related semantic concepts, such as object parts and materials, to a segmentation model. Downstream predictions must then depend on those interpretable channels instead of opaque features, and the paper shows how the layer activations can be used directly for failure analysis and confidence scoring. The claims that stand out, if they hold up in the full text, are interpretable results at over 99% accuracy and the recovery of state-of-the-art performance after cutting from thousands of channels to tens. The construction itself is new relative to the referenced prior work and gives a concrete pattern for building semantic interpretability into the forward pass.

What the paper does well is lay out a practical way to make the internal representation legible to an engineer without adding post-hoc tools. The stress-test concern about whether the chosen concepts are complete enough to avoid information loss is fair based on the abstract, which gives no evidence that alternatives were tried or that the set covers every signal the segmentation task needs. The abstract also names no datasets, baselines, or quantitative tables, which leaves the central empirical claims unsupported and makes it impossible to judge how much end-to-end training actually compensates for the reduction.

This is for people working on interpretable vision models, especially in safety settings. A reader who wants to see an architectural approach to semantic bottlenecks could get value from the full paper if the experiments are solid. I would send it for peer review so the methods and results can be checked properly rather than desk-rejecting on the abstract alone.

Referee Report

1 major / 1 minor

Summary. The paper introduces Semantic Bottleneck Networks (SBNs) featuring a Semantic Bottleneck Layer (SB-Layer) with semantically interpretable concepts such as object-parts and materials. Through a case study on street scene segmentation, the authors adapt a standard deep network to include this layer and claim to recover state-of-the-art performance despite reducing the dimensionality from thousands of non-semantic features to tens of semantic concepts. They further demonstrate the use of SB-Layer activations for interpreting network failure cases and for confidence prediction, reporting interpretable segmentation results at over 99% accuracy for most predictions.

Significance. If the reported performance recovery holds, the work offers a practical route to interpretable deep vision models by constraining intermediate representations to human-understandable concepts without apparent loss in task performance. The application to failure mode analysis and confidence estimation adds utility for deployment in safety-critical settings. The choice of street scene segmentation as the case study is appropriate for testing the approach in a complex, real-world domain.

major comments (1)

[Abstract] The central claim that the chosen semantic concepts (object-parts and materials) are sufficient to encode all information required by the downstream segmentation task, allowing recovery of SOTA performance after drastic dimensionality reduction, lacks supporting evidence such as ablation studies on concept completeness or information-theoretic analysis; without this, the sufficiency assumption remains untested and the performance claim is at risk.

minor comments (1)

The abstract mentions recovery of 'state of the art performance' and '99% accuracy' without naming the exact metrics (e.g., mIoU), datasets, or comparison baselines; these details should be added for immediate clarity.
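The choice of metric matters because common segmentation scores can diverge on the same prediction. The sketch below computes plain pixel accuracy next to mean intersection-over-union (mIoU, the metric the referee gives as an example) on a tiny made-up label map. The labels and helper names are illustrative assumptions, and nothing here reflects numbers reported in the paper.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the ground truth."""
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def mean_iou(pred, target, num_classes):
    """Average per-class intersection-over-union, skipping absent classes."""
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # class c appears in the prediction or the ground truth
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Flattened toy label maps with three classes (0, 1, 2).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]

print(round(pixel_accuracy(pred, target), 3))  # 0.667
print(round(mean_iou(pred, target, 3), 3))     # 0.5
```

Here four of six pixels are correct, yet the per-class overlaps (1/3, 2/3, 1/2) average to only 0.5, so a bare "accuracy" figure is ambiguous until the metric is named.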

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and positive assessment of the work's significance. We address the single major comment below.

Referee: [Abstract] The central claim that the chosen semantic concepts (object-parts and materials) are sufficient to encode all information required by the downstream segmentation task, allowing recovery of SOTA performance after drastic dimensionality reduction, lacks supporting evidence such as ablation studies on concept completeness or information-theoretic analysis; without this, the sufficiency assumption remains untested and the performance claim is at risk.

Authors: We agree that the manuscript would benefit from additional explicit evidence supporting the sufficiency of the selected concepts. The reported recovery of state-of-the-art segmentation performance using only tens of semantic channels (versus thousands of non-semantic features) constitutes empirical evidence that the chosen concepts capture the information necessary for the task; this is further supported by the high accuracy of failure-mode interpretation derived directly from the SB-Layer activations. Nevertheless, to strengthen the claim we will revise the manuscript to include (i) an ablation study measuring performance degradation when individual concepts or concept groups are removed and (ii) a brief discussion of the task-specific rationale for concept selection. A formal information-theoretic analysis is not feasible within the scope of this work due to the intractability of estimating mutual information in high-dimensional feature spaces, but the empirical results provide a practical demonstration of sufficiency for the segmentation task.

Revision: yes
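A concept-removal ablation of the kind mentioned in (i) could be organised roughly as in the sketch below: zero one concept channel at a time, re-evaluate, and record the drop against the unablated baseline. Everything in it is a hypothetical stand-in (the three concepts, the linear head, the six toy samples); a real study would use the trained network and a held-out segmentation set.

```python
# Hypothetical concepts, head weights, and samples for illustration only.
CONCEPT_NAMES = ["wheel", "window", "foliage"]
HEAD_WEIGHTS = [
    [1.0, 0.5, 0.0],  # class 0: car
    [0.0, 1.0, 0.0],  # class 1: building
    [0.0, 0.0, 1.0],  # class 2: vegetation
]
SAMPLES = [  # (concept activations, true class)
    ([0.9, 0.6, 0.0], 0),
    ([0.8, 0.1, 0.1], 0),
    ([0.0, 0.9, 0.1], 1),
    ([0.1, 0.8, 0.0], 1),
    ([0.0, 0.1, 0.9], 2),
    ([0.1, 0.0, 0.7], 2),
]


def classify(concepts, weights):
    """Pick the class whose weighted concept sum is largest."""
    scores = [sum(w * c for w, c in zip(row, concepts)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(samples, weights, dropped=None):
    """Accuracy with concept channel `dropped` zeroed (None keeps all channels)."""
    correct = 0
    for concepts, label in samples:
        if dropped is not None:
            concepts = [0.0 if i == dropped else c for i, c in enumerate(concepts)]
        correct += classify(concepts, weights) == label
    return correct / len(samples)


baseline = accuracy(SAMPLES, HEAD_WEIGHTS)
drops = {
    name: baseline - accuracy(SAMPLES, HEAD_WEIGHTS, dropped=i)
    for i, name in enumerate(CONCEPT_NAMES)
}

# Report concepts from most to least damaging when removed.
for name, drop in sorted(drops.items(), key=lambda item: -item[1]):
    print(f"{name}: accuracy drop {drop:.3f}")
```

Note that zeroing a channel without retraining measures how much the current head relies on that concept. Whether the network could recover after fine-tuning without it is a separate question that would require retraining for each removed concept.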

Circularity Check

0 steps flagged

No circularity: empirical case study with no derivations or self-referential fits

The paper reports an empirical adaptation of an existing segmentation network by inserting a fixed semantic bottleneck layer whose concepts are chosen by the authors. Performance recovery is demonstrated via end-to-end training and quantitative evaluation on held-out data, not via any equation, prediction, or uniqueness theorem that reduces to the inputs by construction. No load-bearing self-citations, fitted parameters renamed as predictions, or ansatzes smuggled through prior work appear in the provided text. The central claim therefore rests on external experimental outcomes rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The abstract introduces the SB-Layer as a new architectural component; no free parameters, background axioms, or additional invented entities beyond the layer itself are stated.

invented entities (1)

Semantic Bottleneck Layer (SB-Layer) · no independent evidence
purpose: To contain task-related semantic concepts as an interpretable intermediate representation
Newly proposed component that all downstream results are based on

pith-pipeline@v0.9.0 · 5742 in / 1225 out tokens · 31059 ms · 2026-05-24T16:23:25.397553+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Investigating Concept Alignment Using Implausible Category Members
cs.AI · 2026-05 · unverdicted · novelty 6.0

AI models misalign with humans on concept boundaries when probed with implausible category members, such as classifying words as vehicles or vegetables as fruit.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · cited by 1 Pith paper · 7 internal anchors

[1] Maruan Al-Shedivat, Avinava Dubey, and Eric P Xing. Contextual explanation networks. arXiv:1705.10301, 2017.
[2] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7):e0130140, 2015.
[3] David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network dissection: Quantifying interpretability of deep visual representations. In CVPR, 2017.
[4] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. Surf: Speeded up robust features. In ECCV, 2006.
[5] Sean Bell, Paul Upchurch, Noah Snavely, and Kavita Bala. Opensurfaces: A richly annotated catalog of surface appearance. ACM Transactions on Graphics (TOG), 32(4):111, 2013.
[6] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. TPAMI, 40(4):834–848, 2018.
[7] Xianjie Chen, Roozbeh Mottaghi, Xiaobai Liu, Sanja Fidler, Raquel Urtasun, and Alan Yuille. Detect what you can: Detecting and representing objects using holistic models and body parts. In CVPR, 2014.
[8] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In CVPR, 2016.
[9] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv:1412.6572, 2014.
[10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[11] Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
[12] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In ICML, 2018.
[13] Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T Schütt, Sven Dähne, Dumitru Erhan, and Been Kim. The (un)reliability of saliency methods. arXiv:1711.00867, 2017.
[14] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv:1607.02533, 2016.
[15] Li-Jia Li, Hao Su, Li Fei-Fei, and Eric P Xing. Object bank: A high-level image representation for scene classification & semantic feature sparsification. In NIPS, 2010.
[16] Oscar Li, Hao Liu, Chaofan Chen, and Cynthia Rudin. Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions. In AAAI, 2018.
[17] Zachary C Lipton. The mythos of model interpretability. Queue, 16(3):30, 2018.
[18] David G Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.
[19] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
[20] David Alvarez Melis and Tommi Jaakkola. Towards robust interpretability with self-explaining neural networks. In NIPS, 2018.
[21] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra, et al. Grad-cam: Visual explanations from deep networks via gradient-based localization. In ICCV, pages 618–626, 2017.
[22] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In ICML, 2017.
[23] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv:1312.6034, 2013.
[24] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In ICML, 2017.
[25] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv:1312.6199, 2013.
[26] Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the gdpr. Harvard Journal of Law & Technology, 31(2):2018, 2017.
[27] Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, and Jian Sun. Unified perceptual parsing for scene understanding. In ECCV, 2018.
[28] Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson. Understanding neural networks through deep visualization. arXiv:1506.06579, 2015.
[29] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.
[30] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. In CVPR, 2017.
[31] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene cnns. CoRR, 2015.
[32] Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Scene parsing through ade20k dataset. In CVPR, 2017.
[33] Luisa M Zintgraf, Taco S Cohen, Tameem Adel, and Max Welling. Visualizing deep neural network decisions: Prediction difference analysis. arXiv:1702.04595, 2017.
