arxiv: 1907.10882 · v2 · pith:YRDQNCX5 · submitted 2019-07-25 · 💻 cs.CV · cs.LG

Interpretability Beyond Classification Output: Semantic Bottleneck Networks

Pith reviewed 2026-05-24 16:23 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords semantic bottleneck · interpretability · scene segmentation · failure analysis · confidence estimation · object parts · deep networks · end-to-end training

The pith

Semantic Bottleneck Networks force all predictions through a small set of human-interpretable concepts while matching state-of-the-art accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes inserting a Semantic Bottleneck Layer into deep networks for tasks like street scene segmentation. This layer consists of semantic concepts such as object parts and materials, reducing thousands of feature channels to tens. The layers downstream of the bottleneck are then fine-tuned around it, and the network recovers full performance. Activations in the layer allow direct interpretation of why predictions fail and estimation of output confidence. This makes the basis for each decision transparent without sacrificing results.

Core claim

A deep network can house a Semantic Bottleneck Layer of task-related semantic concepts so that all downstream predictions depend only on those concepts. On street scene segmentation this yields state-of-the-art performance after reducing from thousands to tens of channels. The layer activations support failure case analysis and confidence prediction, producing interpretable segmentation results at over 99 percent accuracy for most predictions.
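One way to read "over 99 percent accuracy for most predictions" is as a selective-prediction statement: rank predictions by a confidence score, keep the most confident fraction (the coverage), and measure accuracy only on that fraction. The sketch below shows that evaluation, assuming per-prediction confidence scores and correctness flags are already available. The function name and the toy numbers are hypothetical, and the confidence score is a placeholder; this does not reproduce the paper's own confidence metric (Figure 8).

```python
from typing import Sequence


def accuracy_at_coverage(confidences: Sequence[float],
                         correct: Sequence[bool],
                         coverage: float) -> float:
    """Accuracy on the `coverage` fraction of predictions with the highest confidence."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must lie in (0, 1]")
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need non-empty inputs of equal length")
    # Rank prediction indices from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    # Keep at least one prediction so the ratio is always defined.
    k = max(1, round(coverage * len(order)))
    return sum(1 for i in order[:k] if correct[i]) / k


# Hypothetical scores: five predictions, one of them wrong.
conf = [0.95, 0.9, 0.8, 0.4, 0.2]
ok = [True, True, True, False, True]
print(accuracy_at_coverage(conf, ok, 0.6))  # top 3 are all correct -> 1.0
print(accuracy_at_coverage(conf, ok, 1.0))  # 4 of 5 correct -> 0.8
```

Under this reading, the claim holds when some coverage above 0.5 gives an accuracy above 0.99.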

What carries the argument

The Semantic Bottleneck Layer (SB-Layer), an intermediate layer whose channels correspond to semantic concepts like object parts and materials; every final output must be computed from its activations alone.
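A minimal per-pixel sketch of the bottleneck constraint follows. It assumes the SB-Layer behaves like a 1x1 convolution (which, at each pixel, is an affine map over channels) followed by a sigmoid, and that the downstream head is linear. The class `SemanticBottleneckPixel` and all weights are hypothetical. The point is structural: `classify` receives only the concept activations, never the original features.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(w: Matrix, b: Vector, x: Vector) -> Vector:
    """y = W x + b; per pixel, this is what a 1x1 convolution computes."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class SemanticBottleneckPixel:
    """Hypothetical per-pixel SBN: features (many channels) -> K concepts -> C classes."""

    def __init__(self, w_sb: Matrix, b_sb: Vector, w_head: Matrix, b_head: Vector):
        if any(len(row) != len(w_sb) for row in w_head):
            raise ValueError("head must take exactly one input per concept")
        self.w_sb, self.b_sb = w_sb, b_sb
        self.w_head, self.b_head = w_head, b_head

    def concepts(self, features: Vector) -> Vector:
        # SB-Layer: project features onto concept channels, squash to (0, 1).
        return [sigmoid(z) for z in affine(self.w_sb, self.b_sb, features)]

    def classify(self, concepts: Vector) -> int:
        # Downstream head sees ONLY the concept activations.
        scores = affine(self.w_head, self.b_head, concepts)
        return max(range(len(scores)), key=scores.__getitem__)

    def __call__(self, features: Vector) -> int:
        return self.classify(self.concepts(features))


# Toy weights: 4 feature channels -> 2 concepts -> 2 classes.
sbn = SemanticBottleneckPixel(
    w_sb=[[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]], b_sb=[0.0, 0.0],
    w_head=[[1.0, -1.0], [-1.0, 1.0]], b_head=[0.0, 0.0],
)
print(sbn([2.0, 2.0, 0.0, 0.0]), sbn([0.0, 0.0, 3.0, 3.0]))  # -> 0 1
```

In a real network the same idea applies at every pixel of the feature map, with the concept count (tens) far below the incoming channel count (thousands).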

If this is right

  • Failure modes become diagnosable through inspection of which semantic concepts the network activates incorrectly.
  • Output confidence can be predicted directly from the bottleneck activations.
  • Segmentation predictions gain interpretability because each result traces back to specific concepts.
  • High accuracy is maintained despite the drastic reduction in intermediate dimensionality.
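Figure 3 describes populating the SB space to find modes of errors, Figure 7 shows error examples from four clusters, and the reference list includes Ward's hierarchical grouping [11]. The sketch below assumes these fit together as follows: each misclassified pixel is represented by its SB activation vector, and those vectors are grouped with Ward's merge criterion. That pairing is an assumption, not the paper's documented procedure, and `ward_cluster` is a hypothetical helper.

```python
from typing import List


def ward_cluster(points: List[List[float]], n_clusters: int) -> List[List[int]]:
    """Naive agglomerative clustering with Ward's criterion; returns index lists."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    clusters = [[i] for i in range(len(points))]
    centroids = [list(p) for p in points]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                dist2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
                # Ward cost: increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        centroids[a] = [(na * x + nb * y) / (na + nb) for x, y in zip(centroids[a], centroids[b])]
        clusters[a] += clusters[b]
        del clusters[b], centroids[b]  # b > a, so index a stays valid
    return clusters


# Hypothetical SB activations (2 concepts) at five misclassified pixels.
errors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
print(ward_cluster(errors, 2))  # two modes: indices {0, 1} and {2, 3, 4}
```

This loop is cubic in the number of points and is only meant to show the merge rule; for real activation maps, `scipy.cluster.hierarchy.linkage(X, method="ward")` implements the same criterion efficiently.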

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar bottlenecks could be tested on other dense prediction tasks such as depth estimation if appropriate concepts are identified.
  • Engineers might use the layer to inject domain knowledge by editing concept activations at inference time.
  • The method opens a route to auditing networks at the level of individual semantic units rather than raw features.
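To make the inference-time editing idea concrete, the sketch below assumes a linear head over concept activations and overwrites one named concept before re-running the head. The concept names (wheel, window, foliage), the class names, the weights, and the helpers `predict` and `intervene` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

CONCEPTS = ["wheel", "window", "foliage"]  # hypothetical concept channels
CLASSES = ["car", "vegetation"]            # hypothetical output classes


def predict(concepts: List[float], head: List[List[float]], bias: List[float]) -> int:
    """Linear head over concept activations; returns the arg-max class index."""
    scores = [sum(w * c for w, c in zip(row, concepts)) + b for row, b in zip(head, bias)]
    return max(range(len(scores)), key=scores.__getitem__)


def intervene(concepts: List[float], edits: Dict[str, float]) -> List[float]:
    """Copy of `concepts` with the named channels overwritten (original untouched)."""
    edited = list(concepts)
    for name, value in edits.items():
        edited[CONCEPTS.index(name)] = value
    return edited


head = [[1.0, 1.0, 0.0],   # car <- wheel + window
        [0.0, 0.0, 2.0]]   # vegetation <- foliage
bias = [0.0, 0.0]
activations = [0.3, 0.2, 0.4]

print(CLASSES[predict(activations, head, bias)])                                # -> vegetation
print(CLASSES[predict(intervene(activations, {"foliage": 0.0}), head, bias)])  # -> car
```

Because the head only sees the concept vector, an edit to one named channel is the whole intervention; no raw features need to be touched.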

Load-bearing premise

The hand-chosen semantic concepts must carry all information needed for accurate segmentation so that retraining around the bottleneck introduces no permanent loss.

What would settle it

If no selection of around 50 semantic concepts allows a retrained network to reach within a few percent of the original segmentation accuracy on standard street scene benchmarks, the approach would not hold.
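This test needs a concrete accuracy measure and a concrete threshold. The sketch below assumes mean intersection-over-union (mIoU), the usual segmentation metric, computed from a confusion matrix, and reads "within a few percent" as an absolute drop in mIoU. The helper names, the example numbers, and the `max_drop` value are hypothetical; the paper does not specify this threshold.

```python
from typing import List


def mean_iou(confusion: List[List[int]]) -> float:
    """Mean IoU from a C x C confusion matrix (rows: ground truth, columns: prediction)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # pixels of class c missed
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other pixels labelled c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix is empty")
    return sum(ious) / len(ious)


def within_tolerance(baseline: float, bottleneck: float, max_drop: float) -> bool:
    """True if the bottlenecked model scores at most `max_drop` below the baseline."""
    return baseline - bottleneck <= max_drop


# Hypothetical numbers, not results from the paper.
print(round(mean_iou([[8, 2], [1, 9]]), 4))  # (8/11 + 9/12) / 2 -> 0.7386
print(within_tolerance(0.75, 0.73, 0.03))    # -> True
```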

Figures

Figures reproduced from arXiv: 1907.10882 by Bernt Schiele, Mario Fritz, Max Losch.

Figure 1: Semantic Bottleneck Network (SBN). While the semantics of features in traditional …
Figure 2: Construction of SBNs. 1. Start off with a well performing model on the target task. 2. Train a function (SB) that maps intermediate representations to semantic concepts. 3. Insert the SB back into the original model and finetune all downstream layers. The power of SBNs lies in the ability to inspect the evidence for the chosen semantic concepts to investigate errors. Such errors could involve the absence …
Figure 3: Populating the SB space to find modes of errors. The gray boxes enclosed in the SB indicate …
Figure 4: Sample from Broden+ dataset with annotations for parts (2nd row) and materials (3rd). As discussed, we want to learn relevant semantic representations in our SB with additional supervision. Broden+ [27] is a recent collection of datasets which serves as a starting point of our case study as it contains annotations for a broad range of relevant semantic concepts. It offers thousands of images for objects, p…
Figure 5: Task relevant concepts outperform irrelevant ones. An experiment that we find necessary to conduct as sanity check, is the inspection of whether the relationship between semantic content and classes make sense, whether the feed forward pass from semantically meaningful concepts to the final network output is “semantically lossless”. This can be examined via the newly gained ability to manipulate the SB …
Figure 6: Segmentation with the SB placed at two different locations in the network results in different …
Figure 7: Selection of error examples from four different clusters.
Figure 8: Accuracy assessment of the networks predictions with our proposed confidence metric.
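The three construction steps in Figure 2 can be sketched end to end on toy vectors. Assumptions: `backbone` stands in for the well-performing model's intermediate layers (step 1); each concept is fitted with a difference-of-means linear probe (step 2); and a nearest-centroid classifier over concept scores replaces the fine-tuned downstream layers (step 3). These fitting procedures, the helper names, and the data are illustrative substitutes, not the paper's training method.

```python
from typing import Callable, Dict, List, Sequence

Vec = List[float]


def mean(vectors: Sequence[Vec]) -> Vec:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_probe(feats: Sequence[Vec], present: Sequence[int]):
    """Step 2 stand-in: difference-of-means linear probe for one binary concept."""
    pos = mean([f for f, p in zip(feats, present) if p])
    neg = mean([f for f, p in zip(feats, present) if not p])
    w = [p - n for p, n in zip(pos, neg)]
    midpoint = [(p + n) / 2 for p, n in zip(pos, neg)]
    return w, -dot(w, midpoint)  # score is 0 halfway between the two means


def fit_nearest_centroid(xs: Sequence[Vec], ys: Sequence[str]) -> Callable[[Vec], str]:
    """Step 3 stand-in: downstream head = nearest class centroid in concept space."""
    centroids: Dict[str, Vec] = {c: mean([x for x, y in zip(xs, ys) if y == c]) for c in set(ys)}

    def head(x: Vec) -> str:
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return head


def build_sbn(backbone: Callable[[Vec], Vec], inputs, concept_labels, classes):
    feats = [backbone(x) for x in inputs]               # step 1: reuse trained features
    probes = [fit_probe(feats, [lab[k] for lab in concept_labels])
              for k in range(len(concept_labels[0]))]  # step 2: one probe per concept

    def sb(f: Vec) -> Vec:
        return [dot(w, f) + b for w, b in probes]

    head = fit_nearest_centroid([sb(f) for f in feats], classes)  # step 3: refit downstream
    return lambda x: head(sb(backbone(x)))


inputs = [[1.0, 0.0, 0.5], [0.8, 0.2, 0.1], [0.0, 1.0, 0.4], [0.2, 0.9, 0.9]]
concept_labels = [[1, 0], [1, 0], [0, 1], [0, 1]]  # hypothetical: [wheel, foliage]
classes = ["car", "car", "vegetation", "vegetation"]
model = build_sbn(lambda x: x, inputs, concept_labels, classes)  # identity backbone
print(model([0.9, 0.05, 0.3]), model([0.05, 0.95, 0.5]))  # -> car vegetation
```

The design mirrors the figure's separation of concerns: the SB is fitted against concept annotations only, and the downstream head is fitted afterwards against task labels, seeing nothing but SB outputs.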
Original abstract

Today's deep learning systems deliver high performance based on end-to-end training. While they deliver strong performance, these systems are hard to interpret. To address this issue, we propose Semantic Bottleneck Networks (SBN): deep networks with semantically interpretable intermediate layers that all downstream results are based on. As a consequence, the analysis on what the final prediction is based on is transparent to the engineer and failure cases and modes can be analyzed and avoided by high-level reasoning. We present a case study on street scene segmentation to demonstrate the feasibility and power of SBN. In particular, we start from a well performing classic deep network which we adapt to house a SB-Layer containing task related semantic concepts (such as object-parts and materials). Importantly, we can recover state of the art performance despite a drastic dimensionality reduction from 1000s (non-semantic feature) to 10s (semantic concept) channels. Additionally we show how the activations of the SB-Layer can be used for both the interpretation of failure cases of the network as well as for confidence prediction of the resulting output. For the first time, e.g., we show interpretable segmentation results for most predictions at over 99% accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces Semantic Bottleneck Networks (SBNs) featuring a Semantic Bottleneck Layer (SB-Layer) with semantically interpretable concepts such as object-parts and materials. Through a case study on street scene segmentation, the authors adapt a standard deep network to include this layer and claim to recover state-of-the-art performance despite reducing the dimensionality from thousands of non-semantic features to tens of semantic concepts. They further demonstrate the use of SB-Layer activations for interpreting network failure cases and for confidence prediction, reporting interpretable segmentation results at over 99% accuracy for most predictions.

Significance. If the reported performance recovery holds, the work offers a practical route to interpretable deep vision models by constraining intermediate representations to human-understandable concepts without apparent loss in task performance. The application to failure mode analysis and confidence estimation adds utility for deployment in safety-critical settings. The choice of street scene segmentation as the case study is appropriate for testing the approach in a complex, real-world domain.

major comments (1)
  1. [Abstract] The central claim that the chosen semantic concepts (object-parts and materials) are sufficient to encode all information required by the downstream segmentation task, allowing recovery of SOTA performance after drastic dimensionality reduction, lacks supporting evidence such as ablation studies on concept completeness or information-theoretic analysis; without this, the sufficiency assumption remains untested and the performance claim is at risk.
minor comments (1)
  1. The abstract mentions recovery of 'state of the art performance' and '99% accuracy' without naming the exact metrics (e.g., mIoU), datasets, or comparison baselines; these details should be added for immediate clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and positive assessment of the work's significance. We address the single major comment below.

  1. Referee: [Abstract] The central claim that the chosen semantic concepts (object-parts and materials) are sufficient to encode all information required by the downstream segmentation task, allowing recovery of SOTA performance after drastic dimensionality reduction, lacks supporting evidence such as ablation studies on concept completeness or information-theoretic analysis; without this, the sufficiency assumption remains untested and the performance claim is at risk.

    Authors: We agree that the manuscript would benefit from additional explicit evidence supporting the sufficiency of the selected concepts. The reported recovery of state-of-the-art segmentation performance using only tens of semantic channels (versus thousands of non-semantic features) constitutes empirical evidence that the chosen concepts capture the information necessary for the task; this is further supported by the high accuracy of failure-mode interpretation derived directly from the SB-Layer activations. Nevertheless, to strengthen the claim we will revise the manuscript to include (i) an ablation study measuring performance degradation when individual concepts or concept groups are removed and (ii) a brief discussion of the task-specific rationale for concept selection. A formal information-theoretic analysis is not feasible within the scope of this work due to the intractability of estimating mutual information in high-dimensional feature spaces, but the empirical results provide a practical demonstration of sufficiency for the segmentation task. revision: yes
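The proposed ablation (removing individual concepts or concept groups and measuring the degradation) can be sketched as below. This assumes "removing" means overwriting the channel with a constant (`fill`, zero by default) and that accuracy is the measure; `concept_ablation`, the toy head, and the data are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def concept_ablation(head: Callable[[List[float]], int],
                     concept_vectors: Sequence[List[float]],
                     labels: Sequence[int],
                     groups: Optional[Sequence[Sequence[int]]] = None,
                     fill: float = 0.0) -> List[float]:
    """Accuracy drop when each concept group is overwritten with `fill`."""
    if groups is None:  # default: ablate one concept at a time
        groups = [[k] for k in range(len(concept_vectors[0]))]

    def accuracy(vectors: Sequence[List[float]]) -> float:
        return sum(head(v) == y for v, y in zip(vectors, labels)) / len(labels)

    base = accuracy(concept_vectors)
    drops = []
    for group in groups:
        ablated = [[fill if k in group else x for k, x in enumerate(v)] for v in concept_vectors]
        drops.append(base - accuracy(ablated))
    return drops


def toy_head(c: List[float]) -> int:
    # Class 0 (car) if concept 0 >= concept 2, else class 1 (vegetation); concept 1 unused.
    return 0 if c[0] >= c[2] else 1


vectors = [[0.9, 0.5, 0.1], [0.8, 0.1, 0.2], [0.1, 0.7, 0.9], [0.2, 0.3, 0.6]]
labels = [0, 0, 1, 1]
print(concept_ablation(toy_head, vectors, labels))            # -> [0.5, 0.0, 0.5]
print(concept_ablation(toy_head, vectors, labels, [[0, 2]]))  # -> [0.5]
```

A zero drop (concept 1 here) flags a concept the head does not rely on; a large drop flags one that carries task-relevant information.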

Circularity Check

0 steps flagged

No circularity: empirical case study with no derivations or self-referential fits

full rationale

The paper reports an empirical adaptation of an existing segmentation network by inserting a fixed semantic bottleneck layer whose concepts are chosen by the authors. Performance recovery is demonstrated via end-to-end training and quantitative evaluation on held-out data, not via any equation, prediction, or uniqueness theorem that reduces to the inputs by construction. No load-bearing self-citations, fitted parameters renamed as predictions, or ansatzes smuggled through prior work appear in the provided text. The central claim therefore rests on external experimental outcomes rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The abstract introduces the SB-Layer as a new architectural component; no free parameters, background axioms, or additional invented entities beyond the layer itself are stated.

invented entities (1)
  • Semantic Bottleneck Layer (SB-Layer): no independent evidence
    purpose: To contain task-related semantic concepts as an interpretable intermediate representation
    Newly proposed component that all downstream results are based on

pith-pipeline@v0.9.0 · 5742 in / 1225 out tokens · 31059 ms · 2026-05-24T16:23:25.397553+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Investigating Concept Alignment Using Implausible Category Members

    cs.AI 2026-05 unverdicted novelty 6.0

    AI models misalign with humans on concept boundaries when probed with implausible category members, such as classifying words as vehicles or vegetables as fruit.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · cited by 1 Pith paper · 7 internal anchors

  1. [1] Maruan Al-Shedivat, Avinava Dubey, and Eric P Xing. Contextual explanation networks. arXiv:1705.10301, 2017

  2. [2] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7):e0130140, 2015

  3. [3] David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network dissection: Quantifying interpretability of deep visual representations. In CVPR, 2017

  4. [4] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features. In ECCV, 2006

  5. [5] Sean Bell, Paul Upchurch, Noah Snavely, and Kavita Bala. OpenSurfaces: A richly annotated catalog of surface appearance. ACM Transactions on Graphics (TOG), 32(4):111, 2013

  6. [6] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. TPAMI, 40(4):834–848, 2018

  7. [7] Xianjie Chen, Roozbeh Mottaghi, Xiaobai Liu, Sanja Fidler, Raquel Urtasun, and Alan Yuille. Detect what you can: Detecting and representing objects using holistic models and body parts. In CVPR, 2014

  8. [8] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. In CVPR, 2016

  9. [9] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv:1412.6572, 2014

  10. [10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016

  11. [11] Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963

  12. [12] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In ICML, 2018

  13. [13] Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T Schütt, Sven Dähne, Dumitru Erhan, and Been Kim. The (un)reliability of saliency methods. arXiv:1711.00867, 2017

  14. [14] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv:1607.02533, 2016

  15. [15] Li-Jia Li, Hao Su, Li Fei-Fei, and Eric P Xing. Object bank: A high-level image representation for scene classification & semantic feature sparsification. In NIPS, 2010

  16. [16] Oscar Li, Hao Liu, Chaofan Chen, and Cynthia Rudin. Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions. In AAAI, 2018

  17. [17] Zachary C Lipton. The mythos of model interpretability. Queue, 16(3):30, 2018

  18. [18] David G Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004

  19. [19] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018

  20. [20] David Alvarez Melis and Tommi Jaakkola. Towards robust interpretability with self-explaining neural networks. In NIPS, 2018

  21. [21] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In ICCV, pages 618–626, 2017

  22. [22] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In ICML, 2017

  23. [23] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv:1312.6034, 2013

  24. [24] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In ICML, 2017

  25. [25] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv:1312.6199, 2013

  26. [26] Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31(2):2018, 2017

  27. [27] Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, and Jian Sun. Unified perceptual parsing for scene understanding. In ECCV, 2018

  28. [28] Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson. Understanding neural networks through deep visualization. arXiv:1506.06579, 2015

  29. [29] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014

  30. [30] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. In CVPR, 2017

  31. [31] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. CoRR, 2015

  32. [32] Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Scene parsing through ADE20K dataset. In CVPR, 2017

  33. [33] Luisa M Zintgraf, Taco S Cohen, Tameem Adel, and Max Welling. Visualizing deep neural network decisions: Prediction difference analysis. arXiv:1702.04595, 2017

Supplementary Material. A Intro. This material contains additional information that otherwise would not have fit in the main paper. It is organized in three parts. The selection of concepts from t...