Prototype-Grounded Concept Models for Verifiable Concept Alignment

David Debot; Giuseppe Marra; Pietro Barbiero; Stefano Colamonaco

arXiv: 2604.16076 · v2 · pith:TSAHBHVFnew · submitted 2026-04-17 · cs.LG · cs.AI · cs.NE

Pith reviewed 2026-05-22 10:02 UTC · model grok-4.3

classification: cs.LG, cs.AI, cs.NE

keywords: concept bottleneck models, prototype learning, model interpretability, concept alignment, visual prototypes, verifiable explanations, deep learning interventions

The pith

Grounding concepts in learned visual prototypes enables verification of alignment and targeted interventions in concept bottleneck models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Concept Bottleneck Models try to make deep learning interpretable by routing predictions through human-understandable concepts, yet offer no built-in check that those concepts match the intended meanings. This paper proposes Prototype-Grounded Concept Models that tie each concept to specific visual prototypes extracted from images. The prototypes act as direct, inspectable evidence for what the model has learned about the concept. Because the evidence is explicit, a human can spot misalignments and intervene at the prototype level to correct them. The approach keeps predictive performance close to that of existing concept models while adding transparency and the ability to fix problems directly.

Core claim

By anchoring concepts to learned visual prototypes that serve as explicit evidence, Prototype-Grounded Concept Models let users inspect whether each concept captures the intended human meaning and apply targeted interventions at the prototype level to correct misalignments, all without degrading predictive accuracy relative to state-of-the-art Concept Bottleneck Models.

What carries the argument

Learned visual prototypes that ground each concept and serve as both evidence for inspection and points for direct intervention.
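
To make the grounding idea concrete, here is a minimal sketch of how a prototype lookup could work. It assumes each image part has already been embedded as a vector and that every prototype stores both an image representation and a concept representation, as the paper's "concept alignment table" is described. The names (`Prototype`, `select_prototype`, `predict_concepts`), the toy numbers, and the hard nearest-prototype rule are illustrative assumptions, not the authors' implementation; the paper instead describes a probabilistic model with a latent selection variable per image part.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Prototype:
    """One row of a hypothetical concept alignment table."""
    image_repr: List[float]     # embedding of the image part the prototype depicts
    concepts: Dict[str, float]  # concept name -> probability that the concept holds


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_prototype(part: Sequence[float], table: List[Prototype]) -> int:
    """Return the index of the prototype closest to an image-part embedding.

    Assumption: hard nearest-neighbour selection stands in for the paper's
    probabilistic selection variable.
    """
    return min(range(len(table)),
               key=lambda j: squared_distance(part, table[j].image_repr))


def predict_concepts(parts: List[Sequence[float]],
                     table: List[Prototype]) -> List[Tuple[int, Dict[str, float]]]:
    """For each image part, return (selected prototype index, its concepts)."""
    result = []
    for part in parts:
        j = select_prototype(part, table)
        result.append((j, table[j].concepts))
    return result


# Toy values, for illustration only.
table = [
    Prototype([0.0, 0.0], {"red": 1.0, "cube": 0.0}),
    Prototype([1.0, 1.0], {"red": 0.0, "cube": 1.0}),
]
parts = [[0.1, -0.1], [0.9, 1.2]]
evidence = predict_concepts(parts, table)
```

Because every concept prediction comes with the index of the prototype that produced it, a reviewer can open that prototype's image and judge whether it really depicts the concept.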

If this is right

Users can directly inspect the image parts that justify each concept and confirm alignment with their understanding.
Targeted edits to specific prototypes correct concept misalignments without retraining the entire model.
Predictive performance remains comparable to standard concept bottleneck models while transparency increases.
Interventions become more precise because changes target only the prototypes tied to a misaligned concept.
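
The locality of a prototype-level edit can be sketched as follows. The table layout, the `selected` mapping from images to prototype indices, and the helper names are hypothetical and chosen only for illustration; how the paper performs and propagates an intervention may differ.

```python
from copy import deepcopy
from typing import Dict, List

# Hypothetical alignment table: prototype index -> concept probabilities.
table: List[Dict[str, float]] = [
    {"red": 1.0, "cube": 0.0},
    {"red": 1.0, "cube": 1.0},  # suppose inspection shows this prototype is not red
]

# Hypothetical prototype selections for the parts of three images.
selected: Dict[str, List[int]] = {"img_a": [0], "img_b": [1], "img_c": [0, 1]}


def intervene(table: List[Dict[str, float]], proto_idx: int,
              concept: str, value: float) -> List[Dict[str, float]]:
    """Return a copy of the table with one concept entry of one prototype overwritten."""
    edited = deepcopy(table)
    edited[proto_idx][concept] = value
    return edited


def concepts_for(image: str, table: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Look up the concept evidence of every prototype selected for an image."""
    return [table[j] for j in selected[image]]


fixed = intervene(table, proto_idx=1, concept="red", value=0.0)

# Images whose concept evidence changed after the edit.
affected = [img for img in selected
            if concepts_for(img, table) != concepts_for(img, fixed)]
```

In this toy setup only the images that selected prototype 1 are affected, and no weights are retrained. Whether real edits stay this contained is exactly what the referee report asks the authors to measure.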

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same prototype-grounding idea could be applied to other concept-based architectures to add verifiable evidence without changing their core prediction path.
In domains where concept drift occurs, monitoring prototype stability over time might flag when human semantics have shifted.
Automated tools could compare prototype clusters across models to detect systematic concept misalignment before deployment.
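
The stability-monitoring idea could be prototyped along these lines. This is a sketch of the editorial extension, not of anything in the paper: it assumes prototype embeddings can be exported as plain vectors at two points in time, and the cosine-similarity measure, the `0.9` threshold, and the function names are all arbitrary illustrative choices.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as maximally dissimilar
    return dot / (norm_a * norm_b)


def drifted_prototypes(old: Dict[str, List[float]],
                       new: Dict[str, List[float]],
                       threshold: float = 0.9) -> List[str]:
    """Return ids of prototypes whose similarity across two snapshots is below threshold."""
    shared = old.keys() & new.keys()
    return sorted(pid for pid in shared
                  if cosine_similarity(old[pid], new[pid]) < threshold)


# Toy snapshots, for illustration only.
old_snapshot = {"p0": [1.0, 0.0], "p1": [0.0, 1.0]}
new_snapshot = {"p0": [0.99, 0.05], "p1": [1.0, 0.2]}
flagged = drifted_prototypes(old_snapshot, new_snapshot)
```

A flagged prototype would only be a prompt for human re-inspection; embedding drift alone does not show that the concept's meaning has changed.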

Load-bearing premise

The learned visual prototypes will reliably match the human-intended meaning of each concept so that prototype-level interventions fix misalignments without creating new errors.

What would settle it

A controlled experiment in which humans label prototype evidence for each concept, apply interventions to fix detected misalignments, and measure whether accuracy stays the same or improves compared with unadjusted models.

Figures

Figures reproduced from arXiv: 2604.16076 by David Debot, Giuseppe Marra, Pietro Barbiero, Stefano Colamonaco.

**Figure 2.** Example inference of PGCMs. From the image …

**Figure 3.** Probabilistic Graphical Model of PGCMs. Black arrows are used for generative and discriminative inference. Red arrows are only used for discriminative inference. The generative model is primarily used to inspect and interpret the learned prototypes. For each image part i, we introduce a latent variable S_i, representing the selection of a prototype. Its prior p(S_i) is a categorical distribution with one v…

**Figure 4.** Concept accuracy after intervening on increasingly more concepts on ColorMNIST+ and …

**Figure 5.** Concept and task accuracy on CLEVR-Hans for different numbers of learned prototypes.

**Figure 6.** Visualization of model outputs on CelebA, comparing our PGCM with competing CBM …

**Figure 7.** Examples from the CLEVR-Hans3 validation set together with the prototypes selected by …

**Figure 8.** Generated prototypes on CLEVR-Hans3 together with their predicted concept repre…

Original abstract

Concept Bottleneck Models (CBMs) aim to improve interpretability in Deep Learning by structuring predictions through human-understandable concepts, but they provide no way to verify whether learned concepts align with the human's intended meaning, hurting interpretability. We introduce Prototype-Grounded Concept Models (PGCMs), which ground concepts in learned visual prototypes: image parts that serve as explicit evidence for the concepts. This grounding enables direct inspection of concept semantics and supports targeted human intervention at the prototype level to correct misalignments. Empirically, PGCMs achieve similar predictive performance as state-of-the-art CBMs while substantially improving transparency, interpretability, and intervenability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PGCMs ground CBM concepts in learned visual prototypes for direct inspection and edits, but the intervenability gains rest on untested assumptions about semantic capture and error-free propagation.

The core addition is making prototypes explicit, intervenable evidence rather than implicit activations. This lets a user look at the actual image parts tied to a concept and adjust them to fix misalignment without full retraining. That mechanism is not in the standard CBM papers cited and gives a concrete handle on verification that prior work lacked. The paper keeps accuracy comparable to existing CBMs on the reported tasks, which is a reasonable baseline to clear. The framing of the alignment problem is clear and the prototype idea fits visual data naturally.

The soft spot is the lack of quantitative checks on the intervention step itself. Claims that targeted prototype edits correct misalignments without new errors or accuracy drops are supported mainly by examples, not by controlled measurements of success rate, side effects, or robustness to spurious correlations. If prototypes latch onto unintended visual cues, the transparency benefit shrinks even if raw performance holds.

This is for people already working on concept bottleneck models who want a practical verification layer. Readers in high-stakes vision applications could extract a usable idea, though they would need to fill in the experimental gaps themselves. It deserves peer review so referees can push on the validation of the edit process and see whether the prototype grounding scales beyond the current setups.

Referee Report

2 major / 2 minor

Summary. The paper introduces Prototype-Grounded Concept Models (PGCMs) as an extension of Concept Bottleneck Models (CBMs). Concepts are grounded in learned visual prototypes that act as explicit image-part evidence, enabling direct inspection of semantic alignment with human intent and targeted interventions at the prototype level to correct misalignments. The central empirical claim is that PGCMs achieve predictive performance comparable to state-of-the-art CBMs while substantially improving transparency, interpretability, and intervenability.

Significance. If the performance parity and intervenability claims hold under rigorous testing, the work would meaningfully advance interpretable ML by supplying a concrete mechanism for verifying and editing concept semantics via visual prototypes. This directly tackles the alignment-verification gap in CBMs and could improve practical deployment in domains needing human oversight. The prototype-grounding idea is a clear incremental strength over prior CBM formulations.

major comments (2)

[Experiments] Experiments section: The claim of similar predictive performance to SOTA CBMs is load-bearing for the main result, yet the manuscript supplies no detailed baseline tables, accuracy numbers with variance, statistical tests, or controls. Without these, the parity assertion cannot be assessed and the overall contribution remains unsupported by visible evidence.
[Prototype Interventions] Prototype intervention subsection: The key benefit of improved intervenability rests on the assumption that prototype-level edits correct misalignments without new errors or performance degradation. The results appear to offer only qualitative examples rather than quantitative metrics (e.g., intervention success rate, post-edit accuracy delta, or side-effect analysis across concepts). This is a load-bearing gap for the transparency and intervenability claims.

minor comments (2)

[Abstract] Abstract: Adding one sentence naming the datasets or tasks used for the empirical evaluation would immediately contextualize the performance claims.
[Method] Notation: Ensure consistent definition of how prototype similarity is computed and whether any new hyperparameters are introduced beyond standard CBM training.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and clarify the empirical support in our work while committing to revisions that strengthen the presentation of results.

Referee: [Experiments] Experiments section: The claim of similar predictive performance to SOTA CBMs is load-bearing for the main result, yet the manuscript supplies no detailed baseline tables, accuracy numbers with variance, statistical tests, or controls. Without these, the parity assertion cannot be assessed and the overall contribution remains unsupported by visible evidence.

Authors: We agree that the current results section would benefit from expanded quantitative detail to allow full assessment of the performance parity claim. In the revised manuscript we will add a new table in the Experiments section that reports mean test accuracy (with standard deviation over five random seeds) for PGCMs and the cited SOTA CBM baselines on all datasets. We will also include pairwise statistical significance tests (Wilcoxon signed-rank) and a brief description of hyper-parameter controls. These numbers are already available from our experimental logs and will be incorporated without altering the original experimental protocol.

Revision: yes

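
For readers who want to see what the promised comparison involves, the following sketch computes the per-model mean and standard deviation and an exact paired Wilcoxon signed-rank test in plain Python. The accuracy values are placeholders invented for illustration, not results from the paper, and the function names are hypothetical. One point worth keeping in mind: with five paired runs, an exact two-sided signed-rank test cannot return a p-value below 2/32 = 0.0625, so five seeds alone cannot reach the conventional 0.05 level.

```python
from itertools import product
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def summarize(accs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over seeds."""
    return mean(accs), stdev(accs)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided paired signed-rank test; returns (W_plus, p_value).

    Zero differences are dropped; all 2^n sign patterns are enumerated,
    so this is only practical for a small number of pairs.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = abs(w_plus - total / 2)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            hits += 1
    return w_plus, hits / 2 ** len(ranks)


# Placeholder accuracies over five seeds (illustrative, not from the paper).
pgcm_acc = [0.912, 0.905, 0.921, 0.908, 0.915]
baseline_acc = [0.910, 0.900, 0.913, 0.897, 0.901]

pgcm_mean, pgcm_std = summarize(pgcm_acc)
w_plus, p_value = wilcoxon_exact(pgcm_acc, baseline_acc)
```

In this toy example PGCM wins on every seed, which is the most extreme outcome possible, and the p-value is still 0.0625. A parity claim would therefore be better supported by more seeds or by reporting effect sizes with confidence intervals.
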
Referee: [Prototype Interventions] Prototype intervention subsection: The key benefit of improved intervenability rests on the assumption that prototype-level edits correct misalignments without new errors or performance degradation. The results appear to offer only qualitative examples rather than quantitative metrics (e.g., intervention success rate, post-edit accuracy delta, or side-effect analysis across concepts). This is a load-bearing gap for the transparency and intervenability claims.

Authors: We acknowledge that the intervenability evaluation would be strengthened by quantitative metrics. We will revise the Prototype Interventions subsection to include a new quantitative study: for a random sample of 200 misaligned concept predictions we will report (i) the fraction of cases in which a single prototype-level edit restores correct concept alignment (as verified by two independent human annotators), (ii) the mean change in downstream task accuracy after the edit, and (iii) the average number of unintended side-effect changes to other concepts. These results will be presented in a table alongside the existing qualitative examples.

Revision: yes
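
The three metrics in this response can be computed from a simple log of edit outcomes. The record layout (`EditOutcome`), the field names, and the four sample records below are assumptions made for illustration; the rebuttal does not specify how the study data would be stored, and the numbers are not results.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class EditOutcome:
    """Hypothetical record of one prototype-level edit."""
    restored: bool          # annotators agree the concept is aligned after the edit
    task_acc_before: float  # downstream task accuracy before the edit
    task_acc_after: float   # downstream task accuracy after the edit
    side_effects: int       # number of other concepts that changed unintentionally


def intervention_report(outcomes: List[EditOutcome]) -> Dict[str, float]:
    """Aggregate (i) success rate, (ii) mean accuracy change, (iii) mean side effects."""
    return {
        "success_rate": mean([1.0 if o.restored else 0.0 for o in outcomes]),
        "mean_task_acc_delta": mean([o.task_acc_after - o.task_acc_before
                                     for o in outcomes]),
        "mean_side_effects": mean([o.side_effects for o in outcomes]),
    }


# Placeholder records (illustrative, not from the paper).
outcomes = [
    EditOutcome(True, 0.90, 0.92, 0),
    EditOutcome(True, 0.90, 0.90, 1),
    EditOutcome(False, 0.90, 0.89, 2),
    EditOutcome(True, 0.90, 0.93, 1),
]
report = intervention_report(outcomes)
```

Reporting the spread of each metric alongside its mean would show whether a few bad edits account for most of the side effects.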

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper introduces Prototype-Grounded Concept Models as an empirical architectural extension to Concept Bottleneck Models, grounding concepts via learned visual prototypes to enable inspection and intervention. Claims of comparable predictive performance with improved transparency rest on experimental results rather than any closed mathematical derivation or self-referential definition. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided abstract or description that would reduce the central claims to tautological inputs by construction. The approach is presented as a verifiable design choice validated externally through performance metrics and qualitative examples, remaining self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the domain assumption that visual prototypes can faithfully represent concept semantics and on the invented entity of 'visual prototypes' introduced to enable inspection and intervention. No free parameters are mentioned in the abstract.

axioms (1)

Domain assumption: Visual prototypes extracted from images can serve as faithful and intervenable representations of human-intended concepts.
This premise is required for the grounding and intervention claims to hold.

invented entities (1)

Visual prototypes (no independent evidence)
Purpose: To provide explicit, inspectable evidence for each concept and to support targeted human corrections.
New postulated mechanism introduced by the paper; no independent evidence outside the model is described in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "PGCMs ground concepts in learned visual prototypes... concept alignment table maps every prototype index j to a dual representation: image representation... and concept representation."

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "This design mirrors a classical notion of semantics from logic (Tarski, 1944), where meaning is defined by an explicit correspondence between symbols and elements of a domain."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 1 internal anchor

[1] Andolfi, L. and Giunchiglia, E. Right for the right reasons: Avoiding reasoning shortcuts via prototypical neurosymbolic AI. arXiv preprint arXiv:2510.25497, 2025.
[2] Barbiero, P., Ciravegna, G., Giannini, F., Zarlenga, M. E., Magister, L. C., Tonda, A., Liò, P., Precioso, F., Jamnik, M., and Marra, G. Interpretable neural-symbolic concept reasoning. In ICML, 2023.
[3] Burgess, C. P., Matthey, L., Watters, N., Kabra, R., Higgins, I., Botvinick, M., and Lerchner, A. MONet: Unsupervised Scene Decomposition and Representation. CoRR, 2019.
[4] Chen, C., Li, O., Tao, D., Barnett, A., Rudin, C., and Su, J. K. This looks like that: deep learning for interpretable image recognition. Advances in Neural Information Processing Systems, 32, 2019.
[5] Colamonaco, S., Debot, D., and Marra, G. Neurosymbolic object-centric learning with distant supervision. arXiv preprint arXiv:2506.16129, 2025.
[6] Debot, D. and Marra, G. Quantifying the accuracy-interpretability trade-off in concept-based sidechannel models. arXiv preprint arXiv:2510.05670, 2025.
[7] Debot, D., Barbiero, P., Giannini, F., Ciravegna, G., Diligenti, M., and Marra, G. Interpretable concept-based memory reasoning. Advances in Neural Information Processing Systems, 37 (NeurIPS 2024), 2024.
[8] Dominici, G., Barbiero, P., Zarlenga, M. E., Termine, A., Gjoreski, M., Marra, G., and Langheinrich, M. Causal concept graph models: Beyond causal opacity in deep learning. arXiv preprint arXiv:2405.16507, 2024.
[9] Donnelly, J., Barnett, A. J., and Chen, C. Deformable ProtoPNet: An interpretable image classifier using deformable prototypes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10265–10275, 2022.
[10] Espinosa Zarlenga, M., Barbiero, P., Ciravegna, G., Marra, G., Giannini, F., Diligenti, M., Shams, Z., Precioso, F., Melacci, S., Weller, A., Liò, P., and Jamnik, M. Concept embedding models: Beyond the accuracy-explainability trade-off. Advances in Neural Information Processing Systems, 35, 2022.
[11] Havasi, M., Parbhoo, S., and Doshi-Velez, F. Addressing leakage in concept bottleneck models. Advances in Neural Information Processing Systems, 35:23386–23397, 2022.
[12] Huang, Q., Song, J., Hu, J., Zhang, H., Wang, Y., and Song, M. On the concept trustworthiness in concept bottleneck models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp. 21161–21168, 2024.
[13] Jeon, S., Lee, H., Kim, E., Lee, S., Zhang, B.-T., and Hwang, I. Locality-aware concept bottleneck model. arXiv preprint arXiv:2508.14562, 2025.
[14] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4015–4026, 2023.
[15] Koh, P. W., Nguyen, T., Tang, Y. S., Mussmann, S., Pierson, E., Kim, B., and Liang, P. Concept bottleneck models. In International Conference on Machine Learning, pp. 5338–5348. PMLR, 2020.
[16] Li, O., Liu, H., Chen, C., and Rudin, C. Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[17] Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In ICCV, 2015.
[18] Locatello, F., Weissenborn, D., Unterthiner, T., Mahendran, A., Heigold, G., Uszkoreit, J., Dosovitskiy, A., and Kipf, T. Object-Centric Learning with Slot Attention. In NeurIPS, 2020.
[19] Ma, C., Zhao, B., Chen, C., and Rudin, C. This looks like those: Illuminating prototypical concepts using multiple visualizations. Advances in Neural Information Processing Systems, 36:39212–39235, 2023.
[20] Mahinpei, A., Clark, J., Lage, I., Doshi-Velez, F., and Pan, W. Promises and pitfalls of black-box concept learning models. arXiv preprint arXiv:2106.13314, 2021.
[21] Manhaeve, R., Dumancic, S., Kimmig, A., Demeester, T., and Raedt, L. D. DeepProbLog: Neural Probabilistic Logic Programming. In NeurIPS, pp. 3753–3763, 2018.
[22] Marconato, E., Passerini, A., and Teso, S. GlanceNets: Interpretable, leak-proof concept-based models. Advances in Neural Information Processing Systems, 35:21212–21227, 2022.
[23] Mazzia, V., Pedrani, A., Caciolai, A., Rottmann, K., and Bernardi, D. A survey on knowledge editing of neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2024.
[24] Oikarinen, T., Das, S., Nguyen, L. M., and Weng, T.-W. Label-free concept bottleneck models, 2023.
[25] Sawada, Y. and Nakamura, K. Concept bottleneck model with additional unsupervised concepts. IEEE Access, 10:41758–41765, 2022.
[26] Snell, J., Swersky, K., and Zemel, R. Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems, 30, 2017.
[27] Stammer, W., Schramowski, P., and Kersting, K. Right for the right concept: Revising neuro-symbolic concepts by interacting with their explanations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3619–3629, 2021.
[28] Steinmann, D., Stammer, W., Wüst, A., and Kersting, K. Object centric concept bottlenecks. arXiv preprint arXiv:2505.24492, 2025.
[29] Tarski, A. The semantic conception of truth: and the foundations of semantics. Philosophy and Phenomenological Research, 4(3):341–376, 1944.
[30] Vandenhirtz, M., Laguna, S., Marcinkevičs, R., and Vogt, J. Stochastic concept bottleneck models. Advances in Neural Information Processing Systems, 37:51787–51810, 2024.
[31] Yuksekgonul, M., Wang, M., and Zou, J. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR^2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. URL https://openreview.net/forum?id=HAMeOIRD_g9
