Towards Reasonable Concept Bottleneck Models

Georgi Vitanov; Isabel Valera; Kavya Gupta; Nektarios Kalampalikis

arxiv: 2506.05014 · v2 · submitted 2025-06-05 · 💻 cs.LG · cs.AI· stat.ML

Towards Reasonable Concept Bottleneck Models

Nektarios Kalampalikis , Kavya Gupta , Georgi Vitanov , Isabel Valera This is my paper

Pith reviewed 2026-05-19 11:11 UTC · model grok-4.3

classification 💻 cs.LG cs.AIstat.ML

keywords concept bottleneck modelsinterpretable machine learningconcept reasoningmodel interventionsmissing conceptsconcept leakage

0 comments

The pith

CREAM models let users bake concept relationships directly into the architecture of concept bottleneck models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Concept REAsoning Models (CREAMs) as a flexible way to build concept bottleneck models that explicitly incorporate known relationships among concepts and from concepts to the prediction target. By encoding things like mutual exclusivity, hierarchies, or correlations into the model structure itself, CREAM avoids having to learn these links from data alone and supports direct interventions on concepts. An optional side-channel can fill in for missing concepts without letting the model drift away from concept-based reasoning. Experiments show these models match black-box performance while remaining intervenable and avoiding concept leakage even when the concept set is incomplete.

Core claim

CREAMs extend standard concept bottleneck models by architecturally encoding arbitrary C-C relationships such as mutual exclusivity or hierarchical associations together with potentially sparse C to Y mappings, and they optionally add a regularized side-channel that lets the model reach black-box accuracy under missing concepts while keeping predictions concept-grounded.

What carries the argument

The CREAM architecture that directly wires known concept-concept and concept-task relationships into the model's reasoning layers, plus an optional regularized side-channel for incomplete concept sets.

If this is right

Direct interventions on concepts become more efficient because the model already respects the encoded relationships.
Concept leakage is reduced because the architecture prevents the model from bypassing the concept layer.
Task performance stays competitive with black-box models even when only a subset of concepts is observed.
Predictions remain more interpretable because the side-channel is regularized and the core reasoning stays concept-grounded.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same wiring approach could be applied to other bottleneck-style architectures beyond classification, such as regression or generative tasks.
If the encoded relationships turn out to be only approximately correct, the side-channel might still allow graceful degradation rather than catastrophic failure.
Future work could test whether automatically discovering and then wiring the relationships produces similar gains without requiring manual prior knowledge.

Load-bearing premise

Practitioners already know the correct concept-concept and concept-task relationships and can encode them accurately without adding new errors.

What would settle it

An experiment in which the encoded relationships are deliberately set to incorrect values and the model is then checked for drops in task accuracy, failed interventions, or increased concept leakage compared with a standard CBM.

Figures

Figures reproduced from arXiv: 2506.05014 by Georgi Vitanov, Isabel Valera, Kavya Gupta, Nektarios Kalampalikis.

**Figure 1.** Figure 1: Reasoning graph for FMNIST from [52]. The white nodes represent concepts used in iFMNIST, while the gray nodes introduce additional seasonality-related concepts to form cFMNIST, addressing concept incompleteness. Concepts and classes within the solid boxes are mutually exclusive. To address these limitations, we propose Concept REAsoning Models (CREAM), a novel family of CBMs, which enforce a desired struc… view at source ↗

**Figure 2.** Figure 2: Partial illustration of CUB’s reasoning graph is shown, where concepts are represented as [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Reasoning graph for predicting “Smiling” (Y ) in CelebA dataset [30]. The graph shows the relationships between most correlated concepts (C) within the face region that directly contribute to smiling. Types of C-C Relationships. AC allows us to capture various types of relationships among the concepts. First, it allows us to capture both the hierarchical and bidirected concept relationships that lead to as… view at source ↗

**Figure 4.** Figure 4: Sketch of CREAM, where the backbone output is split into concept ( [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Impact of dropout rates on channel importance. Models with complete concept sets require less side-channel regularization. Interpretability in the presence of a side-channel. Given that our model incorporates a black-box side-channel, it is crucial to verify that predictions primarily rely on the concepts rather than the side-channel. To quantify this, we introduce a new metric called Concept Channel Impo… view at source ↗

**Figure 6.** Figure 6: Impact of individual interventions on task accuracy. The baseline model is [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Mean values ± standard deviation of the permutation feature importance of both channels, in all datasets for S-CREAM, averaged over 5 seeds. Increasing p leads to an increase in PCI and decreases PSI. C Dataset Details A summary of the datasets used in our experiments is provided in [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗

**Figure 8.** Figure 8: Expanded version of the illustration of CREAM found in the main text, providing additional [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗

**Figure 9.** Figure 9: Impact of group interventions compared to individual ones, on task accuracy. Group [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗

**Figure 10.** Figure 10: Correlation matrix for the CREAM models reported in Table 2. For each concept we draw [PITH_FULL_IMAGE:figures/full_fig_p022_10.png] view at source ↗

**Figure 11.** Figure 11: Correlation matrix for the CREAM models reported in Table 2. For each concept we draw [PITH_FULL_IMAGE:figures/full_fig_p022_11.png] view at source ↗

**Figure 12.** Figure 12: Covariance matrix of the outputs of the side-channel for CREAM. On the left, we report [PITH_FULL_IMAGE:figures/full_fig_p023_12.png] view at source ↗

**Figure 13.** Figure 13: Concept and Task accuracies across different dropout rates (p [PITH_FULL_IMAGE:figures/full_fig_p023_13.png] view at source ↗

**Figure 14.** Figure 14: Concept and Task accuracies across different [PITH_FULL_IMAGE:figures/full_fig_p024_14.png] view at source ↗

**Figure 15.** Figure 15: Concept and Task accuracies across different [PITH_FULL_IMAGE:figures/full_fig_p024_15.png] view at source ↗

**Figure 16.** Figure 16: Intervenability comparison of H-CREAM and S-CREAM. The latter slightly outperforms [PITH_FULL_IMAGE:figures/full_fig_p025_16.png] view at source ↗

**Figure 17.** Figure 17: Correlation matrices of the output of the representation splitter ( [PITH_FULL_IMAGE:figures/full_fig_p026_17.png] view at source ↗

read the original abstract

We propose a novel, flexible, and efficient framework for designing Concept Bottleneck Models (CBMs) that enables practitioners to explicitly encode and extend their prior knowledge and beliefs about the concept-concept ($C-C$) and concept-task ($C \to Y$) relationships within the model's reasoning when making predictions. The resulting $\textbf{C}$oncept $\textbf{REA}$soning $\textbf{M}$odels (CREAMs) architecturally encode arbitrary types of $C-C$ relationships such as mutual exclusivity, hierarchical associations, and/or correlations, as well as potentially sparse $C \to Y$ relationships. Moreover, CREAM can optionally incorporate a regularized side-channel to complement the potentially {incomplete concept sets}, achieving competitive task performance while encouraging predictions to be concept-grounded. To evaluate CBMs in such settings, we introduce a $C \to Y$ agnostic metric that quantifies interpretability when predictions partially rely on the side-channel. In our experiments, we show that, without additional computational overhead, CREAM models support efficient interventions, can avoid concept leakage, and achieve black-box-level performance under missing concepts. We further analyze how an optional side-channel affects interpretability and intervenability. Importantly, the side-channel enables CBMs to remain effective even in scenarios where only a limited number of concepts are available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Concept REAsoning Models (CREAMs), a flexible extension of Concept Bottleneck Models that architecturally encodes prior knowledge on arbitrary C-C relationships (mutual exclusivity, hierarchies, correlations) and sparse C→Y links. It introduces an optional regularized side-channel to handle incomplete concept sets while encouraging concept-grounded predictions, along with a new C→Y agnostic metric for interpretability. Experiments claim that CREAM supports efficient interventions, avoids concept leakage, achieves black-box-level task performance under missing concepts, and remains effective with limited concepts, all without added computational overhead.

Significance. If the central claims hold, this work offers a practical way to inject domain knowledge into CBMs via architecture rather than post-hoc constraints, addressing a key limitation when concept sets are incomplete. The side-channel plus regularization and the new metric could improve both performance and evaluable interpretability in real-world settings. The manuscript's strengths include explicit handling of prior relationships and analysis of side-channel effects on intervenability.

major comments (2)

[§3] The central claim that architectural encoding of C-C and C→Y priors plus side-channel regularization forces predictions to remain concept-grounded (even with missing concepts) is load-bearing for the 'avoid concept leakage' and 'black-box-level performance' assertions, yet §3 provides no derivation or constraint showing that the chosen regularization term mathematically prevents the side-channel from learning non-concept shortcuts.
[§4.3] Table 2 and §4.3: the reported competitive performance and intervention results lack quantitative baselines, effect sizes, and cross-dataset behavior of the new C→Y agnostic metric, leaving the claim of black-box-level performance under missing concepts insufficiently verified.

minor comments (2)

[Abstract] The abstract states 'without additional computational overhead' but this is not supported by explicit runtime or parameter counts in the experiments section.
[§3.2] Notation for the side-channel regularization strength (listed as a free parameter) should be defined consistently with the loss term in Eq. (X) of §3.2.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address the major concerns regarding the theoretical justification for the side-channel regularization and the empirical validation of our results and metric.

read point-by-point responses

Referee: [§3] The central claim that architectural encoding of C-C and C→Y priors plus side-channel regularization forces predictions to remain concept-grounded (even with missing concepts) is load-bearing for the 'avoid concept leakage' and 'black-box-level performance' assertions, yet §3 provides no derivation or constraint showing that the chosen regularization term mathematically prevents the side-channel from learning non-concept shortcuts.

Authors: We agree that a formal mathematical derivation would provide stronger theoretical support. The regularization term is designed to minimize the side-channel's influence on the final prediction by penalizing deviations from concept-based reasoning. While we do not claim a hard constraint that completely eliminates shortcuts, the combination with the architectural priors encourages grounded predictions, as evidenced by our intervention experiments and leakage analyses. We will revise §3 to include a more detailed explanation of the regularization's intended effect and discuss its limitations in preventing all possible shortcuts. revision: partial
Referee: [§4.3] Table 2 and §4.3: the reported competitive performance and intervention results lack quantitative baselines, effect sizes, and cross-dataset behavior of the new C→Y agnostic metric, leaving the claim of black-box-level performance under missing concepts insufficiently verified.

Authors: We appreciate this point and will enhance the presentation in §4.3 and Table 2 by including quantitative baselines from standard CBMs and black-box models, along with effect sizes for the performance differences. Additionally, we will report the behavior of the C→Y agnostic metric across multiple datasets to better substantiate the interpretability claims under incomplete concept sets. revision: yes

Circularity Check

0 steps flagged

CREAM proposal introduces explicit architectural priors and a new agnostic metric; performance claims rest on external benchmarks rather than self-referential fits or definitions.

full rationale

The paper's core contribution is a new modeling framework that takes practitioner-provided C-C and C-Y relationships as explicit inputs and encodes them directly into the architecture (mutual exclusivity, hierarchies, sparse links, optional regularized side-channel). These encodings are not derived from the evaluation data; they are user-specified priors. The introduced C→Y agnostic metric is defined to measure reliance on the side-channel and is not fitted to the same performance numbers used to claim black-box-level results. Experiments compare against baselines on standard datasets, with no evidence that any reported prediction or grounding property reduces by construction to a parameter fit on the test set or to a self-citation chain. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The framework rests on the assumption that domain experts can supply accurate relational constraints and that the side-channel regularization can be tuned without destroying concept grounding.

free parameters (1)

side-channel regularization strength
Hyperparameter balancing the contribution of the optional side-channel against the concept bottleneck path.

axioms (1)

domain assumption Practitioners can accurately specify C-C and C-to-Y relationships
The entire encoding mechanism depends on this input being correct and complete.

pith-pipeline@v0.9.0 · 5777 in / 1203 out tokens · 44755 ms · 2026-05-19T11:11:28.238471+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/LogicAsFunctionalEquation.lean SatisfiesLawsOfLogic / Translation Theorem unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

CREAM architecturally embeds (bi)directed concept-concept, and concept to task relationships specified by a human expert, while severing undesired information flows... using StrNN binary masks
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel / Jcost uniqueness unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Regularization of side-channel... dropout based regularization strategy... CCI = ϕc / (ϕc + ϕy)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages

[1]

Towards robust interpretability with self-explaining neural networks

David Alvarez Melis and Tommi Jaakkola. Towards robust interpretability with self-explaining neural networks. Advances in neural information processing systems, 2018

work page 2018
[2]

The description logic handbook: Theory, implementation and applications

Franz Baader. The description logic handbook: Theory, implementation and applications . Cambridge university press, 2003

work page 2003
[3]

Logic tensor networks

Samy Badreddine, Artur d’Avila Garcez, Luciano Serafini, and Michael Spranger. Logic tensor networks. Artificial Intelligence, 2022

work page 2022
[4]

Interpretable neural-symbolic concept reasoning

Pietro Barbiero, Gabriele Ciravegna, Francesco Giannini, Mateo Espinosa Zarlenga, Lucie Char- lotte Magister, Alberto Tonda, Pietro Lió, Frederic Precioso, Mateja Jamnik, and Giuseppe Marra. Interpretable neural-symbolic concept reasoning. In International Conference on Machine Learning. PMLR, 2023

work page 2023
[5]

Relational concept bottleneck models

Pietro Barbiero, Francesco Giannini, Gabriele Ciravegna, Michelangelo Diligenti, and Giuseppe Marra. Relational concept bottleneck models. In Advances in Neural Information Processing Systems, 2024

work page 2024
[6]

Estimating or propagating gradients through stochastic neurons for conditional computation

Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. ArXiv, 2013

work page 2013
[7]

A neuro-symbolic benchmark suite for concept quality and reasoning shortcuts

Samuele Bortolotti, Emanuele Marconato, Tommaso Carraro, Paolo Morettin, Emile van Krieken, Antonio Vergari, Stefano Teso, Andrea Passerini, et al. A neuro-symbolic benchmark suite for concept quality and reasoning shortcuts. Advances in neural information processing systems, 2024

work page 2024
[8]

Shortcuts and identifiability in concept-based models from a neuro-symbolic lens

Samuele Bortolotti, Emanuele Marconato, Paolo Morettin, Andrea Passerini, and Stefano Teso. Shortcuts and identifiability in concept-based models from a neuro-symbolic lens. ArXiv, 2025

work page 2025
[9]

Random forests

Leo Breiman. Random forests. Machine learning, 2001

work page 2001
[10]

Structured neural networks for density estimation and causal inference

Asic Chen, Ruian Ian Shi, Xiang Gao, Ricardo Baptista, and Rahul G Krishnan. Structured neural networks for density estimation and causal inference. Advances in Neural Information Processing Systems, 2024

work page 2024
[11]

Concept whitening for interpretable image recognition

Zhi Chen, Yijie Bei, and Cynthia Rudin. Concept whitening for interpretable image recognition. Nature Machine Intelligence, 2020

work page 2020
[12]

Understanding global feature contributions with additive importance measures

Ian Covert, Scott M Lundberg, and Su-In Lee. Understanding global feature contributions with additive importance measures. Advances in Neural Information Processing Systems, 2020

work page 2020
[13]

Skincon: A skin disease dataset densely annotated by domain experts for fine-grained debugging and analysis

Roxana Daneshjou, Mert Yuksekgonul, Zhuo Ran Cai, Roberto Novoa, and James Y Zou. Skincon: A skin disease dataset densely annotated by domain experts for fine-grained debugging and analysis. In Advances in Neural Information Processing Systems, 2022

work page 2022
[14]

Causally reliable concept bottleneck models

Giovanni De Felice, Arianna Casanova Flores, Francesco De Santis, Silvia Santini, Johannes Schneider, Pietro Barbiero, and Alberto Termine. Causally reliable concept bottleneck models. ArXiv, 2025

work page 2025
[15]

Causal concept graph models: Beyond causal opacity in deep learning

Gabriele Dominici, Pietro Barbiero, Mateo Espinosa Zarlenga, Alberto Termine, Martin Gjoreski, Giuseppe Marra, and Marc Langheinrich. Causal concept graph models: Beyond causal opacity in deep learning. In International Conference on Learning Representations , 2025

work page 2025
[16]

Towards a rigorous science of interpretable machine learning

Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning. ArXiv, 2017. 10

work page 2017
[17]

All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously

Aaron Fisher, Cynthia Rudin, and Francesca Dominici. All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 2019

work page 2019
[18]

Interpretable prognostics with concept bottleneck models

Florent Forest, Katharina Rombach, and Olga Fink. Interpretable prognostics with concept bottleneck models. ArXiv, 2024

work page 2024
[19]

Shortcut learning in deep neural networks

Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, and Felix A Wichmann. Shortcut learning in deep neural networks. Nature Machine Intelligence, 2020

work page 2020
[20]

Taming the sigmoid bottleneck: Provably argmaxable sparse multi-label classification

Andreas Grivas, Antonio Vergari, and Adam Lopez. Taming the sigmoid bottleneck: Provably argmaxable sparse multi-label classification. InProceedings of the AAAI Conference on Artificial Intelligence, 2024

work page 2024
[21]

Addressing leakage in concept bottleneck models

Marton Havasi, Sonali Parbhoo, and Finale Doshi-Velez. Addressing leakage in concept bottleneck models. Advances in Neural Information Processing Systems, 2022

work page 2022
[22]

Upper bound of bayesian generalization error in partial concept bottleneck model (cbm): Partial cbm outperforms naive cbm

Naoki Hayashi and Yoshihide Sawada. Upper bound of bayesian generalization error in partial concept bottleneck model (cbm): Partial cbm outperforms naive cbm. ArXiv, 2024

work page 2024
[23]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition, 2016

work page 2016
[24]

Deep networks with stochastic depth

Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q Weinberger. Deep networks with stochastic depth. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14. Springer, 2016

work page 2016
[25]

Causal normalizing flows: from theory to practice

Adrián Javaloy, Pablo Sánchez-Martín, and Isabel Valera. Causal normalizing flows: from theory to practice. Advances in Neural Information Processing Systems, 2024

work page 2024
[26]

Adam: A method for stochastic optimization

Diederik P Kingma. Adam: A method for stochastic optimization. ArXiv, 2014

work page 2014
[27]

Concept bottleneck models

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International conference on machine learning. PMLR, 2020

work page 2020
[28]

how do i fool you?

Himabindu Lakkaraju and Osbert Bastani. " how do i fool you?" manipulating user trust via misleading black box explanations. In ACM Conference on AI, Ethics, and Society, 2020

work page 2020
[29]

The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery

Zachary C Lipton. The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, 2018

work page 2018
[30]

Deep learning face attributes in the wild

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In International Conference on Computer Vision, 2015

work page 2015
[31]

Towards learning to explain with concept bottleneck models: mitigating information leakage

Joshua Lockhart, Nicolas Marchesotti, Daniele Magazzeni, and Manuela Veloso. Towards learning to explain with concept bottleneck models: mitigating information leakage. ArXiv, 2022

work page 2022
[32]

Promises and pitfalls of black-box concept learning models

Anita Mahinpei, Justin Clark, Isaac Lage, Finale Doshi-Velez, and Weiwei Pan. Promises and pitfalls of black-box concept learning models. ArXiv, 2021

work page 2021
[33]

Measuring leakage in concept-based methods: An information theoretic approach

Mikael Makonnen, Moritz Vandenhirtz, Sonia Laguna, and Julia E V ogt. Measuring leakage in concept-based methods: An information theoretic approach. In ICLR 2025 Workshop: XAI4Science: From Understanding Model Behavior to Discovering New Scientific Knowledge, 2025

work page 2025
[34]

Deepproblog: Neural probabilistic logic programming

Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt. Deepproblog: Neural probabilistic logic programming. Advances in neural informa- tion processing systems, 2018

work page 2018
[35]

Glancenets: Interpretable, leak-proof concept-based models

Emanuele Marconato, Andrea Passerini, and Stefano Teso. Glancenets: Interpretable, leak-proof concept-based models. Advances in Neural Information Processing Systems, 2022

work page 2022
[36]

Interpretability is in the mind of the beholder: A causal framework for human-interpretable representation learning

Emanuele Marconato, Andrea Passerini, and Stefano Teso. Interpretability is in the mind of the beholder: A causal framework for human-interpretable representation learning. Entropy, 2023

work page 2023
[37]

Not all neuro- symbolic concepts are created equal: Analysis and mitigation of reasoning shortcuts

Emanuele Marconato, Stefano Teso, Antonio Vergari, and Andrea Passerini. Not all neuro- symbolic concepts are created equal: Analysis and mitigation of reasoning shortcuts. Advances in Neural Information Processing Systems, 2023. 11

work page 2023
[38]

Do concept bottleneck models learn as intended? ArXiv, 2021

Andrei Margeloiu, Matthew Ashman, Umang Bhatt, Yanzhi Chen, Mateja Jamnik, and Adrian Weller. Do concept bottleneck models learn as intended? ArXiv, 2021

work page 2021
[39]

Interpretable Machine Learning

Christoph Molnar. Interpretable Machine Learning. 3 edition, 2025. ISBN 978-3-911578-03-5. URL https://christophm.github.io/interpretable-ml-book

work page 2025
[40]

Nguyen, and Tsui-Wei Weng

Tuomas Oikarinen, Subhro Das, Lam M. Nguyen, and Tsui-Wei Weng. Label-free concept bottleneck models. In International Conference on Learning Representations, 2023

work page 2023
[41]

Pytorch: An imperative style, high-performance deep learning library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 2019

work page 2019
[42]

J. Pearl. Causality. Causality: Models, Reasoning, and Inference. Cambridge Univer- sity Press, 2009. ISBN 9780521895606. URL https://books.google.de/books?id= f4nuexsNVZIC

work page 2009
[43]

Concept- based explainable artificial intelligence: A survey

Eleonora Poeta, Gabriele Ciravegna, Eliana Pastor, Tania Cerquitelli, and Elena Baralis. Concept- based explainable artificial intelligence: A survey. ArXiv, 2023

work page 2023
[44]

Tree-based leakage inspection and control in concept bottleneck models

Angelos Ragkousis and Sonali Parbhoo. Tree-based leakage inspection and control in concept bottleneck models. ArXiv, 2024

work page 2024
[45]

From causal to concept-based representation learning

Goutham Rajendran, Simon Buchholz, Bryon Aragam, Bernhard Schölkopf, and Pradeep Ravikumar. From causal to concept-based representation learning. Advances in Neural Infor- mation Processing Systems, 37:101250–101296, 2024

work page 2024
[46]

Zuko: Normalizing flows in pytorch, 2022

François Rozet et al. Zuko: Normalizing flows in pytorch, 2022. URL https://pypi.org/ project/zuko

work page 2022
[47]

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature machine intelligence, 2019

work page 2019
[48]

Interpretable machine learning: Fundamental principles and 10 grand challenges

Cynthia Rudin, Chaofan Chen, Zhi Chen, Haiyang Huang, Lesia Semenova, and Chudi Zhong. Interpretable machine learning: Fundamental principles and 10 grand challenges. Statistic Surveys, 2022

work page 2022
[49]

Explainable AI: interpreting, explaining and visualizing deep learning

Wojciech Samek, Grégoire Montavon, Andrea Vedaldi, Lars Kai Hansen, and Klaus-Robert Müller. Explainable AI: interpreting, explaining and visualizing deep learning. Springer Nature, 2019

work page 2019
[50]

Improving the interpretability of gnn predictions through conformal-based graph sparsification

Pablo Sanchez-Martin, Kinaan Aamir Khan, and Isabel Valera. Improving the interpretability of gnn predictions through conformal-based graph sparsification. ArXiv, 2024

work page 2024
[51]

Concept bottleneck model with additional unsuper- vised concepts

Yoshihide Sawada and Keigo Nakamura. Concept bottleneck model with additional unsuper- vised concepts. IEEE Access, 2022

work page 2022
[52]

Hierarchical convolutional neural networks for fashion image classification

Yian Seo and Kyung-shik Shin. Hierarchical convolutional neural networks for fashion image classification. Expert systems with applications, 2019

work page 2019
[53]

A value for n-person games

Lloyd S Shapley et al. A value for n-person games. 1953

work page 1953
[54]

Auxiliary losses for learning generalizable concept- based models

Ivaxi Sheth and Samira Ebrahimi Kahou. Auxiliary losses for learning generalizable concept- based models. Advances in Neural Information Processing Systems, 2023

work page 2023
[55]

A closer look at the intervention procedure of concept bottleneck models

Sungbin Shin, Yohan Jo, Sungsoo Ahn, and Namhoon Lee. A closer look at the intervention procedure of concept bottleneck models. In International Conference on Machine Learning. PMLR, 2023

work page 2023
[56]

Colidr: Concept learning using aggre- gated disentangled representations

Sanchit Sinha, Guangzhi Xiong, and Aidong Zhang. Colidr: Concept learning using aggre- gated disentangled representations. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

work page 2024
[57]

Eliminating information leakage in hard concept bottleneck models with supervised, hierarchical concept learning

Ao Sun, Yuanyuan Yuan, Pingchuan Ma, and Shuai Wang. Eliminating information leakage in hard concept bottleneck models with supervised, hierarchical concept learning. ArXiv, 2024

work page 2024
[58]

Stochastic concept bottleneck models

Moritz Vandenhirtz, Sonia Laguna, Riˇcards Marcinkeviˇcs, and Julia V ogt. Stochastic concept bottleneck models. Advances in Neural Information Processing Systems, 2024

work page 2024
[59]

C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The caltech-ucsd birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011. 12

work page 2011
[60]

Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017

Han Xiao, Kashif Rasul, and Roland V ollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017

work page 2017
[61]

A study of face obfuscation in imagenet

Kaiyu Yang, Jacqueline H Yau, Li Fei-Fei, Jia Deng, and Olga Russakovsky. A study of face obfuscation in imagenet. In International Conference on Machine Learning. PMLR, 2022

work page 2022
[62]

Language in a bottle: Language model guided concept bottlenecks for interpretable image classification

Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, and Mark Yatskar. Language in a bottle: Language model guided concept bottlenecks for interpretable image classification. In Conference on Computer Vision and Pattern Recognition, 2023

work page 2023
[63]

On completeness-aware concept-based explanations in deep neural networks

Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. Advances in neural information processing systems, 2020

work page 2020
[64]

Post-hoc concept bottleneck models

Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In International Conference on Learning Representations, 2023

work page 2023
[65]

Benchmarking and enhancing disentanglement in concept-residual models

Renos Zabounidis, Ini Oguntola, Konghao Zhao, Joseph Campbell, Simon Stepputtis, and Katia Sycara. Benchmarking and enhancing disentanglement in concept-residual models. ArXiv, 2023

work page 2023
[66]

A survey on causal discovery: Theory and practice

Alessio Zanga, Elif Ozkirimli, and Fabio Stella. A survey on causal discovery: Theory and practice. International Journal of Approximate Reasoning, 2022

work page 2022
[67]

Summer”, “Winter

Mateo Espinosa Zarlenga, Pietro Barbiero, Gabriele Ciravegna, Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Frederic Precioso, Stefano Melacci, Adrian Weller, Pietro Lio, et al. Concept embedding models. In Advances in Neural Information Processing Systems, 2022. 13 A Structured Neural Networks for Concept-Concept and Concept-Task Relationsh...

work page 2022

[1] [1]

Towards robust interpretability with self-explaining neural networks

David Alvarez Melis and Tommi Jaakkola. Towards robust interpretability with self-explaining neural networks. Advances in neural information processing systems, 2018

work page 2018

[2] [2]

The description logic handbook: Theory, implementation and applications

Franz Baader. The description logic handbook: Theory, implementation and applications . Cambridge university press, 2003

work page 2003

[3] [3]

Logic tensor networks

Samy Badreddine, Artur d’Avila Garcez, Luciano Serafini, and Michael Spranger. Logic tensor networks. Artificial Intelligence, 2022

work page 2022

[4] [4]

Interpretable neural-symbolic concept reasoning

Pietro Barbiero, Gabriele Ciravegna, Francesco Giannini, Mateo Espinosa Zarlenga, Lucie Char- lotte Magister, Alberto Tonda, Pietro Lió, Frederic Precioso, Mateja Jamnik, and Giuseppe Marra. Interpretable neural-symbolic concept reasoning. In International Conference on Machine Learning. PMLR, 2023

work page 2023

[5] [5]

Relational concept bottleneck models

Pietro Barbiero, Francesco Giannini, Gabriele Ciravegna, Michelangelo Diligenti, and Giuseppe Marra. Relational concept bottleneck models. In Advances in Neural Information Processing Systems, 2024

work page 2024

[6] [6]

Estimating or propagating gradients through stochastic neurons for conditional computation

Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. ArXiv, 2013

work page 2013

[7] [7]

A neuro-symbolic benchmark suite for concept quality and reasoning shortcuts

Samuele Bortolotti, Emanuele Marconato, Tommaso Carraro, Paolo Morettin, Emile van Krieken, Antonio Vergari, Stefano Teso, Andrea Passerini, et al. A neuro-symbolic benchmark suite for concept quality and reasoning shortcuts. Advances in neural information processing systems, 2024

work page 2024

[8] [8]

Shortcuts and identifiability in concept-based models from a neuro-symbolic lens

Samuele Bortolotti, Emanuele Marconato, Paolo Morettin, Andrea Passerini, and Stefano Teso. Shortcuts and identifiability in concept-based models from a neuro-symbolic lens. ArXiv, 2025

work page 2025

[9] [9]

Random forests

Leo Breiman. Random forests. Machine learning, 2001

work page 2001

[10] [10]

Structured neural networks for density estimation and causal inference

Asic Chen, Ruian Ian Shi, Xiang Gao, Ricardo Baptista, and Rahul G Krishnan. Structured neural networks for density estimation and causal inference. Advances in Neural Information Processing Systems, 2024

work page 2024

[11] [11]

Concept whitening for interpretable image recognition

Zhi Chen, Yijie Bei, and Cynthia Rudin. Concept whitening for interpretable image recognition. Nature Machine Intelligence, 2020

work page 2020

[12] [12]

Understanding global feature contributions with additive importance measures

Ian Covert, Scott M Lundberg, and Su-In Lee. Understanding global feature contributions with additive importance measures. Advances in Neural Information Processing Systems, 2020

work page 2020

[13] [13]

Skincon: A skin disease dataset densely annotated by domain experts for fine-grained debugging and analysis

Roxana Daneshjou, Mert Yuksekgonul, Zhuo Ran Cai, Roberto Novoa, and James Y Zou. Skincon: A skin disease dataset densely annotated by domain experts for fine-grained debugging and analysis. In Advances in Neural Information Processing Systems, 2022

work page 2022

[14] [14]

Causally reliable concept bottleneck models

Giovanni De Felice, Arianna Casanova Flores, Francesco De Santis, Silvia Santini, Johannes Schneider, Pietro Barbiero, and Alberto Termine. Causally reliable concept bottleneck models. ArXiv, 2025

work page 2025

[15] [15]

Causal concept graph models: Beyond causal opacity in deep learning

Gabriele Dominici, Pietro Barbiero, Mateo Espinosa Zarlenga, Alberto Termine, Martin Gjoreski, Giuseppe Marra, and Marc Langheinrich. Causal concept graph models: Beyond causal opacity in deep learning. In International Conference on Learning Representations , 2025

work page 2025

[16] [16]

Towards a rigorous science of interpretable machine learning

Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning. ArXiv, 2017. 10

work page 2017

[17] [17]

All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously

Aaron Fisher, Cynthia Rudin, and Francesca Dominici. All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 2019

work page 2019

[18] [18]

Interpretable prognostics with concept bottleneck models

Florent Forest, Katharina Rombach, and Olga Fink. Interpretable prognostics with concept bottleneck models. ArXiv, 2024

work page 2024

[19] [19]

Shortcut learning in deep neural networks

Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, and Felix A Wichmann. Shortcut learning in deep neural networks. Nature Machine Intelligence, 2020

work page 2020

[20] [20]

Taming the sigmoid bottleneck: Provably argmaxable sparse multi-label classification

Andreas Grivas, Antonio Vergari, and Adam Lopez. Taming the sigmoid bottleneck: Provably argmaxable sparse multi-label classification. InProceedings of the AAAI Conference on Artificial Intelligence, 2024

work page 2024

[21] [21]

Addressing leakage in concept bottleneck models

Marton Havasi, Sonali Parbhoo, and Finale Doshi-Velez. Addressing leakage in concept bottleneck models. Advances in Neural Information Processing Systems, 2022

work page 2022

[22] [22]

Upper bound of bayesian generalization error in partial concept bottleneck model (cbm): Partial cbm outperforms naive cbm

Naoki Hayashi and Yoshihide Sawada. Upper bound of bayesian generalization error in partial concept bottleneck model (cbm): Partial cbm outperforms naive cbm. ArXiv, 2024

work page 2024

[23] [23]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition, 2016

work page 2016

[24] [24]

Deep networks with stochastic depth

Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q Weinberger. Deep networks with stochastic depth. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14. Springer, 2016

work page 2016

[25] [25]

Causal normalizing flows: from theory to practice

Adrián Javaloy, Pablo Sánchez-Martín, and Isabel Valera. Causal normalizing flows: from theory to practice. Advances in Neural Information Processing Systems, 2024

work page 2024

[26] [26]

Adam: A method for stochastic optimization

Diederik P Kingma. Adam: A method for stochastic optimization. ArXiv, 2014

work page 2014

[27] [27]

Concept bottleneck models

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International conference on machine learning. PMLR, 2020

work page 2020

[28] [28]

how do i fool you?

Himabindu Lakkaraju and Osbert Bastani. " how do i fool you?" manipulating user trust via misleading black box explanations. In ACM Conference on AI, Ethics, and Society, 2020

work page 2020

[29] [29]

The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery

Zachary C Lipton. The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, 2018

work page 2018

[30] [30]

Deep learning face attributes in the wild

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In International Conference on Computer Vision, 2015

work page 2015

[31] [31]

Towards learning to explain with concept bottleneck models: mitigating information leakage

Joshua Lockhart, Nicolas Marchesotti, Daniele Magazzeni, and Manuela Veloso. Towards learning to explain with concept bottleneck models: mitigating information leakage. ArXiv, 2022

work page 2022

[32] [32]

Promises and pitfalls of black-box concept learning models

Anita Mahinpei, Justin Clark, Isaac Lage, Finale Doshi-Velez, and Weiwei Pan. Promises and pitfalls of black-box concept learning models. ArXiv, 2021

work page 2021

[33] [33]

Measuring leakage in concept-based methods: An information theoretic approach

Mikael Makonnen, Moritz Vandenhirtz, Sonia Laguna, and Julia E V ogt. Measuring leakage in concept-based methods: An information theoretic approach. In ICLR 2025 Workshop: XAI4Science: From Understanding Model Behavior to Discovering New Scientific Knowledge, 2025

work page 2025

[34] [34]

Deepproblog: Neural probabilistic logic programming

Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt. Deepproblog: Neural probabilistic logic programming. Advances in neural informa- tion processing systems, 2018

work page 2018

[35] [35]

Glancenets: Interpretable, leak-proof concept-based models

Emanuele Marconato, Andrea Passerini, and Stefano Teso. Glancenets: Interpretable, leak-proof concept-based models. Advances in Neural Information Processing Systems, 2022

work page 2022

[36] [36]

Interpretability is in the mind of the beholder: A causal framework for human-interpretable representation learning

Emanuele Marconato, Andrea Passerini, and Stefano Teso. Interpretability is in the mind of the beholder: A causal framework for human-interpretable representation learning. Entropy, 2023

work page 2023

[37] [37]

Not all neuro- symbolic concepts are created equal: Analysis and mitigation of reasoning shortcuts

Emanuele Marconato, Stefano Teso, Antonio Vergari, and Andrea Passerini. Not all neuro- symbolic concepts are created equal: Analysis and mitigation of reasoning shortcuts. Advances in Neural Information Processing Systems, 2023. 11

work page 2023

[38] [38]

Do concept bottleneck models learn as intended? ArXiv, 2021

Andrei Margeloiu, Matthew Ashman, Umang Bhatt, Yanzhi Chen, Mateja Jamnik, and Adrian Weller. Do concept bottleneck models learn as intended? ArXiv, 2021

work page 2021

[39] [39]

Interpretable Machine Learning

Christoph Molnar. Interpretable Machine Learning. 3 edition, 2025. ISBN 978-3-911578-03-5. URL https://christophm.github.io/interpretable-ml-book

work page 2025

[40] [40]

Nguyen, and Tsui-Wei Weng

Tuomas Oikarinen, Subhro Das, Lam M. Nguyen, and Tsui-Wei Weng. Label-free concept bottleneck models. In International Conference on Learning Representations, 2023

work page 2023

[41] [41]

Pytorch: An imperative style, high-performance deep learning library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 2019

work page 2019

[42] [42]

J. Pearl. Causality. Causality: Models, Reasoning, and Inference. Cambridge Univer- sity Press, 2009. ISBN 9780521895606. URL https://books.google.de/books?id= f4nuexsNVZIC

work page 2009

[43] [43]

Concept- based explainable artificial intelligence: A survey

Eleonora Poeta, Gabriele Ciravegna, Eliana Pastor, Tania Cerquitelli, and Elena Baralis. Concept- based explainable artificial intelligence: A survey. ArXiv, 2023

work page 2023

[44] [44]

Tree-based leakage inspection and control in concept bottleneck models

Angelos Ragkousis and Sonali Parbhoo. Tree-based leakage inspection and control in concept bottleneck models. ArXiv, 2024

work page 2024

[45] [45]

From causal to concept-based representation learning

Goutham Rajendran, Simon Buchholz, Bryon Aragam, Bernhard Schölkopf, and Pradeep Ravikumar. From causal to concept-based representation learning. Advances in Neural Infor- mation Processing Systems, 37:101250–101296, 2024

work page 2024

[46] [46]

Zuko: Normalizing flows in pytorch, 2022

François Rozet et al. Zuko: Normalizing flows in pytorch, 2022. URL https://pypi.org/ project/zuko

work page 2022

[47] [47]

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature machine intelligence, 2019

work page 2019

[48] [48]

Interpretable machine learning: Fundamental principles and 10 grand challenges

Cynthia Rudin, Chaofan Chen, Zhi Chen, Haiyang Huang, Lesia Semenova, and Chudi Zhong. Interpretable machine learning: Fundamental principles and 10 grand challenges. Statistic Surveys, 2022

work page 2022

[49] [49]

Explainable AI: interpreting, explaining and visualizing deep learning

Wojciech Samek, Grégoire Montavon, Andrea Vedaldi, Lars Kai Hansen, and Klaus-Robert Müller. Explainable AI: interpreting, explaining and visualizing deep learning. Springer Nature, 2019

work page 2019

[50] [50]

Improving the interpretability of gnn predictions through conformal-based graph sparsification

Pablo Sanchez-Martin, Kinaan Aamir Khan, and Isabel Valera. Improving the interpretability of gnn predictions through conformal-based graph sparsification. ArXiv, 2024

work page 2024

[51] [51]

Concept bottleneck model with additional unsuper- vised concepts

Yoshihide Sawada and Keigo Nakamura. Concept bottleneck model with additional unsuper- vised concepts. IEEE Access, 2022

work page 2022

[52] [52]

Hierarchical convolutional neural networks for fashion image classification

Yian Seo and Kyung-shik Shin. Hierarchical convolutional neural networks for fashion image classification. Expert systems with applications, 2019

work page 2019

[53] [53]

A value for n-person games

Lloyd S Shapley et al. A value for n-person games. 1953

work page 1953

[54] [54]

Auxiliary losses for learning generalizable concept- based models

Ivaxi Sheth and Samira Ebrahimi Kahou. Auxiliary losses for learning generalizable concept- based models. Advances in Neural Information Processing Systems, 2023

work page 2023

[55] [55]

A closer look at the intervention procedure of concept bottleneck models

Sungbin Shin, Yohan Jo, Sungsoo Ahn, and Namhoon Lee. A closer look at the intervention procedure of concept bottleneck models. In International Conference on Machine Learning. PMLR, 2023

work page 2023

[56] [56]

Colidr: Concept learning using aggre- gated disentangled representations

Sanchit Sinha, Guangzhi Xiong, and Aidong Zhang. Colidr: Concept learning using aggre- gated disentangled representations. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

work page 2024

[57] [57]

Eliminating information leakage in hard concept bottleneck models with supervised, hierarchical concept learning

Ao Sun, Yuanyuan Yuan, Pingchuan Ma, and Shuai Wang. Eliminating information leakage in hard concept bottleneck models with supervised, hierarchical concept learning. ArXiv, 2024

work page 2024

[58] [58]

Stochastic concept bottleneck models

Moritz Vandenhirtz, Sonia Laguna, Riˇcards Marcinkeviˇcs, and Julia V ogt. Stochastic concept bottleneck models. Advances in Neural Information Processing Systems, 2024

work page 2024

[59] [59]

C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The caltech-ucsd birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011. 12

work page 2011

[60] [60]

Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017

Han Xiao, Kashif Rasul, and Roland V ollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017

work page 2017

[61] [61]

A study of face obfuscation in imagenet

Kaiyu Yang, Jacqueline H Yau, Li Fei-Fei, Jia Deng, and Olga Russakovsky. A study of face obfuscation in imagenet. In International Conference on Machine Learning. PMLR, 2022

work page 2022

[62] [62]

Language in a bottle: Language model guided concept bottlenecks for interpretable image classification

Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, and Mark Yatskar. Language in a bottle: Language model guided concept bottlenecks for interpretable image classification. In Conference on Computer Vision and Pattern Recognition, 2023

work page 2023

[63] [63]

On completeness-aware concept-based explanations in deep neural networks

Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. Advances in neural information processing systems, 2020

work page 2020

[64] [64]

Post-hoc concept bottleneck models

Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In International Conference on Learning Representations, 2023

work page 2023

[65] [65]

Benchmarking and enhancing disentanglement in concept-residual models

Renos Zabounidis, Ini Oguntola, Konghao Zhao, Joseph Campbell, Simon Stepputtis, and Katia Sycara. Benchmarking and enhancing disentanglement in concept-residual models. ArXiv, 2023

work page 2023

[66] [66]

A survey on causal discovery: Theory and practice

Alessio Zanga, Elif Ozkirimli, and Fabio Stella. A survey on causal discovery: Theory and practice. International Journal of Approximate Reasoning, 2022

work page 2022

[67] [67]

Summer”, “Winter

Mateo Espinosa Zarlenga, Pietro Barbiero, Gabriele Ciravegna, Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Frederic Precioso, Stefano Melacci, Adrian Weller, Pietro Lio, et al. Concept embedding models. In Advances in Neural Information Processing Systems, 2022. 13 A Structured Neural Networks for Concept-Concept and Concept-Task Relationsh...

work page 2022