pith. machine review for the scientific record.

arxiv: 2605.06440 · v2 · submitted 2026-05-07 · 💻 cs.LG · cs.CV

Recognition: 2 theorem links

· Lean Theorem

Hyperbolic Concept Bottleneck Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 06:28 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords concept bottleneck models · hyperbolic geometry · model interpretability · hierarchical concepts · semantic hierarchies · post-hoc explanations · machine learning

The pith

Embedding concepts in hyperbolic space lets bottleneck models match Euclidean performance with far less data while respecting concept hierarchies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Concept bottleneck models make neural networks interpretable by forcing decisions through human-understandable concepts. Most current versions place these concepts in flat Euclidean space and treat them as independent dimensions. This paper instead embeds concepts in hyperbolic space so that one concept can contain another through geometric containment inside entailment cones. The distance from an activation to the cone boundary then serves as the activation strength, producing sparse and hierarchy-respecting signals without extra supervision or learned modules. An adaptive scaling rule further lets a user correction at one concept level propagate consistently to related concepts higher or lower in the tree. If the approach works, interpretable models could reach high accuracy in the low-data regimes needed for human oversight and remain more stable when inputs are corrupted.
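
A minimal numerical sketch of the activation rule, assuming the Poincaré-ball entailment-cone formulas of Ganea et al. [11] and the decision rule shown in Figure 2a. The constant K = 0.1 and the clipping of the arcsin argument follow Figure 12; the zeroing of negative margins and all function names are illustrative assumptions, not the paper's released code.

    import numpy as np

    K = 0.1  # cone-tightness constant; Figure 12 reports K = 0.1 as the default

    def half_aperture(c):
        # Cone half-aperture omega(c) = arcsin(K * (1 - |c|^2) / |c|) at a
        # concept embedding c in the Poincare ball (Ganea et al. [11]).
        # Clipping the argument avoids the saturation noted in Figure 12.
        n = np.linalg.norm(c)
        return np.arcsin(np.clip(K * (1 - n ** 2) / n, -1.0, 1.0))

    def exterior_angle(z, c):
        # Exterior angle phi(z, c) at the cone apex c between the cone axis
        # and the geodesic toward the image embedding z (Ganea et al. [11]).
        zc = np.dot(z, c)
        nz2, nc2 = np.dot(z, z), np.dot(c, c)
        num = zc * (1 + nc2) - nc2 * (1 + nz2)
        den = np.linalg.norm(c) * np.linalg.norm(z - c) * np.sqrt(1 + nc2 * nz2 - 2 * zc)
        return np.arccos(np.clip(num / den, -1.0, 1.0))

    def activation(z, c, eta_img=1.0):
        # Margin of inclusion: positive iff z lies inside the cone scaled by
        # the strictness eta_img (Figure 2a); zeroing negative margins is an
        # assumed reading of the paper's "sparse activations".
        return max(0.0, eta_img * half_aperture(c) - exterior_angle(z, c))

An image is then summarized by the sparse vector of these margins over the whole concept bank, which is what the linear classification head of a standard post-hoc CBM consumes.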

Core claim

Hyperbolic Concept Bottleneck Models reformulate concept activation as asymmetric geometric containment in hyperbolic space. The margin of inclusion inside a concept's entailment cone supplies a sparse, hierarchy-aware activation signal at test time without additional supervision or learned modules. An adaptive scaling law then converts user interventions into hierarchically faithful updates that propagate coherently through the concept tree. Empirically the resulting models match the accuracy of post-hoc Euclidean concept models trained on twenty times more data while showing stronger hierarchical consistency and greater robustness to input corruptions.

What carries the argument

Entailment cones in hyperbolic space whose inclusion margin supplies the concept activation value.

If this is right

  • HypCBM reaches accuracy comparable to Euclidean models trained on twenty times more concept-labeled data in the sparse regimes needed for human interpretability.
  • Concept activations exhibit stronger hierarchical consistency across levels of the concept tree.
  • The models show improved robustness to input corruptions relative to flat Euclidean embeddings.
  • User corrections applied at one concept level propagate coherently to related concepts via the adaptive scaling law (a sketch of this propagation follows the list).
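
A minimal sketch of how such propagation could work, reusing half_aperture and exterior_angle from the sketch above. The linear dependence of strictness on the parent's norm is taken from Figure 2b; the slope and intercept a and b, and the choice that entailed children simply inherit the edited value, are hypothetical placeholders rather than the paper's stated law.

    import numpy as np

    def eta_for(parent, a=1.0, b=0.0):
        # Adaptive strictness: Figure 2b reports that the strictness needed
        # to capture true descendants scales roughly linearly with the
        # parent concept's norm (r = 0.729). The slope a and intercept b are
        # hypothetical placeholders, not fitted values from the paper.
        return a * np.linalg.norm(parent) + b

    def intervene(activations, concept_embs, target, new_value):
        # Set the target concept's activation, then propagate the edit to
        # every concept whose embedding falls inside the target's scaled
        # entailment cone (Figure 14: editing 'electrical equipment' also
        # edits the entailed children 'circuit breaker' and 'technical
        # equipment').
        parent = concept_embs[target]
        threshold = eta_for(parent) * half_aperture(parent)
        updated = dict(activations)
        updated[target] = new_value
        for name, emb in concept_embs.items():
            if name != target and exterior_angle(emb, parent) <= threshold:
                updated[name] = new_value  # assumption: children inherit the edit
        return updated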

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The geometric containment signal could be tested on taxonomies deeper than those used in the original experiments, such as fine-grained biological or medical hierarchies.
  • The same cone-margin idea might be tried in other post-hoc explanation methods that currently assume flat concept spaces.
  • If the scaling law generalizes, it would allow concept-level editing interfaces that automatically maintain logical consistency across large concept graphs.

Load-bearing premise

The margin of inclusion inside a concept's entailment cone produces sparse and hierarchy-aware activations without extra supervision or learned modules.

What would settle it

Running HypCBM on a dataset whose concept hierarchy is independently verified and checking whether the activation sparsity and hierarchical consistency metrics remain above those of Euclidean baselines when the amount of concept-labeled data is increased.
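
A minimal reading of part of that check as code, using the stability metric of Figure 7 (Jaccard similarity between clean and corrupted active-concept sets). Treating the active set as the concepts with positive margin carries over the assumption from the activation sketch above.

    def active_set(activations):
        # Concepts with positive margin; the cone rule is sparse by
        # construction, so a positivity test defines the active set.
        return {name for name, a in activations.items() if a > 0}

    def jaccard(a, b):
        # Jaccard similarity of two concept sets; returns 0.0 when both
        # sets are empty (the max() guards the division).
        return len(a & b) / max(1, len(a | b))

    def semantic_stability(clean_acts, corrupted_acts_by_severity):
        # One Jaccard value per corruption severity, comparing each
        # corrupted active set against the clean one (the curves in
        # Figure 7 average this over 15 corruption types).
        clean = active_set(clean_acts)
        return [jaccard(clean, active_set(acts))
                for acts in corrupted_acts_by_severity]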

Figures

Figures reproduced from arXiv: 2605.06440 by Daniel Uyterlinde, Pascal Mettes, Swasti Shreya Mishra.

Figure 1
Figure 1: Method overview. (1) Generated concepts and the target image are encoded with a hyperbolic VLM onto the hyperbolic manifold, where they are hierarchically organized. (2) The activation of a concept is measured as the margin of inclusion of the image embedding in the entailment cone of the concept. (3) An intervention on a parent concept (c_parent) is propagated to all entailed children (c_child). training in… view at source ↗
Figure 2
Figure 2: Geometry and Scaling of Hyperbolic Entailment. (a) An image z activates concept c_i if the exterior angle ϕ(z, c_i) falls within the scaled cone half-aperture η·ω(c_i). Here, η denotes the strictness parameter η_img (Eq. 7). (b) Empirical scaling law derived from WordNet. The entailment strictness η_text required to geometrically capture true descendants scales linearly with the parent concept's norm (r = 0.729)… view at source ↗
Figure 3
Figure 3: We validate the interpretability of HypCBM through… view at source ↗
Figure 4
Figure 4: (a) shows that HypCBM exhibits a steeper confidence decay than the Euclidean baseline, indicating stronger responsiveness to corrective edits. Under manual intervention, HypCBM successfully flips incorrect predictions to the correct class for 19% more samples than LF-CBM, confirming the practical utility of hierarchically propagated interventions. Random interventions cause negligible confidence changes (Ap… view at source ↗
Figure 5
Figure 5: Complete Intervention Analysis. We compare the probability response of HypCBM (Red) and LF-CBM (Blue) across three distinct strategies. Left (Manual): We intervene on the concept with the highest contribution that is ground-truth absent (a false positive). HypCBM displays the sharpest confidence decay, indicating it is highly responsive to valid human corrections. Center (Top-Contributing): We intervene on… view at source ↗
Figure 6
Figure 6: Data efficiency on CIFAR100. HypCBM outperforms LF-CBM (CLIP-20M) for any data budget. view at source ↗
Figure 7
Figure 7: Semantic stability across five severities of input corruption. The plot shows the average Jaccard similarity across all 15 corruption types as a function of severity. The shaded regions represent one standard deviation. The baseline (LF-CBM) shows a rapid decrease in concept stability as severity increases, while HypCBM maintains high stability (J > 0.7) even at Severity 5. The stability gap remains consis… view at source ↗
Figure 8
Figure 8: Ablation on Hyperbolic Norm Filtering, SUN397. τ = 0.27. substantial gains. The Euclidean LF-CBM, in contrast, relies on angular similarity; pruning general concepts does not lead to a comparable reduction in trivial activations. B Concept Bank: As the concept banks for CIFAR100, ImageNet and CUB-200 were already created and made public by Oikarinen et al. [32], we only apply the concept bank creation proce… view at source ↗
Figure 9
Figure 9: LLM Prompts for Concept Generation. We use three distinct prompt templates (Important Features, Superclass, Context) to generate diverse visual attributes. These few-shot examples are fed to GPT-3 to produce the raw concept bank. After this initial set of candidate concepts is created, a few processing steps are applied. First, concepts that are too long (longer than 30 tokens) are removed. Then, we calcul… view at source ↗
Figure 10
Figure 10: Image-text entailment distributions. The plot shows the entailment ratio… view at source ↗
Figure 11
Figure 11: Accuracy vs. number of active concepts, SUN397. We sweep η_img on a validation set to determine the optimal value on datasets where the distribution shift between proxy class labels and the concept bank is large. C.2 Finding the Optimal Intra-Modal η_text: Experimental Setup. Due to the exponential expansion of volume in hyperbolic space, the aperture required to capture a semantic subtree varies drastically… view at source ↗
Figure 12
Figure 12: Geometric Properties and Calibration. (a) We calibrate the cone scaling factor K to ensure that the distribution of concept apertures is well-posed (i.e. all arguments to arcsin are smaller than 1), avoiding numerical saturation limits (dashed line). For the default value of K = 0.1, we observe that all possible text embeddings are clipped to 1, leading to a constant half-aperture of ω(c_i) = ½π, ∀i. (b… view at source ↗
Figure 13
Figure 13: Global Explanations. We visualize the top contributing concepts (weight × activation) for the classes ’pantry’ and ’ocean’, along with two sample images from SUN397. More examples comparing LF-CBM and HypCBM are in the supplementary material. … view at source ↗
Figure 14
Figure 14: Intervention Propagation. An example that shows our intervention propagation mechanism on an image of a locker room that is misclassified as ’server room’. When intervening on ’electrical equipment’, HypCBM automatically intervenes on entailed children ’circuit breaker’ and ’technical equipment’ too. Without this propagation, the prediction is still wrong, whereas with propagation the prediction flips to… view at source ↗
read the original abstract

Concept Bottleneck Models (CBMs) have become a popular approach to enable interpretability in neural networks by constraining classifier inputs to a set of human-understandable concepts. While effective, current models embed concepts in flat Euclidean space, treating them as independent, orthogonal dimensions. Concepts, however, are highly structured and organized in semantic hierarchies. To resolve this mismatch, we propose Hyperbolic Concept Bottleneck Models (HypCBM), a post-hoc framework that grounds the bottleneck in this structure by reformulating concept activation as asymmetric geometric containment in hyperbolic space. Rather than treating entailment cones as a pre-training penalty, we show they encode a natural test-time activation signal: the margin of inclusion within a concept's entailment cone yields sparse, hierarchy-aware activations without any additional supervision or learned modules. We further introduce an adaptive scaling law for hierarchically faithful interventions, propagating user corrections coherently through the concept tree. Empirically, HypCBM rivals post-hoc Euclidean models trained on 20× more data in sparse regimes required for human interpretability, with stronger hierarchical consistency and improved robustness to input corruptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Hyperbolic Concept Bottleneck Models (HypCBM), a post-hoc framework that embeds concepts in hyperbolic space and reformulates activations as the margin of inclusion within entailment cones. This is claimed to produce sparse, hierarchy-aware signals without additional supervision or learned modules. An adaptive scaling law is introduced for propagating user interventions coherently through the concept tree. Empirically, HypCBM is said to rival post-hoc Euclidean CBMs trained on 20× more data in sparse regimes, while showing stronger hierarchical consistency and robustness to input corruptions.

Significance. If the no-additional-supervision property and empirical gains hold, the work would meaningfully advance interpretable ML by leveraging hyperbolic geometry to capture semantic hierarchies in CBMs, potentially lowering data requirements and enhancing robustness in human-interpretable settings. The post-hoc framing and geometric activation signal represent clear strengths if the derivations are parameter-light and reproducible.

major comments (3)
  1. [Abstract and §3] Abstract and §3 (method): the claim that entailment-cone margins yield activations 'without any additional supervision or learned modules' is load-bearing yet unsupported by the given description; constructing the hyperbolic embedding of the concept taxonomy appears to presuppose a hierarchy that may be derived from the same labeled data used in standard CBMs, risking circularity with the 'no additional supervision' assertion.
  2. [§4] §4 (adaptive scaling): the adaptive scaling law for hierarchically faithful interventions is introduced without an explicit equation or proof that it introduces no new learned parameters beyond the single 'adaptive scaling parameter' listed in the axiom ledger; this must be shown to confirm the parameter-free character of the intervention mechanism.
  3. [Experiments] Experiments section: the claim of rivaling Euclidean models trained on 20× more data lacks reported details on exact datasets, concept counts, sparsity regimes, baseline implementations, and statistical tests; without these, the performance, hierarchical consistency, and robustness advantages cannot be verified as load-bearing results.
minor comments (2)
  1. [Abstract] Abstract: the metric for 'hierarchical consistency' is not defined, making the comparative claim difficult to interpret.
  2. [Notation] Notation: the precise definition of the entailment-cone margin (e.g., how it is computed from hyperbolic coordinates) should be stated early to distinguish it from standard hyperbolic distances.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which has helped us identify areas for clarification in the manuscript. We address each major comment below and will revise the paper accordingly to strengthen the presentation of our contributions.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (method): the claim that entailment-cone margins yield activations 'without any additional supervision or learned modules' is load-bearing yet unsupported by the given description; constructing the hyperbolic embedding of the concept taxonomy appears to presuppose a hierarchy that may be derived from the same labeled data used in standard CBMs, risking circularity with the 'no additional supervision' assertion.

    Authors: We appreciate the referee raising this point of potential circularity. The concept taxonomy is supplied as a fixed, external input (analogous to the predefined concept set in standard CBMs) and is not derived from the task-specific labeled data. Hyperbolic embeddings are then constructed deterministically from this given hierarchy using a standard tree-embedding procedure with no trainable parameters or additional supervision. We will revise §3 to explicitly state the source of the taxonomy and the deterministic nature of the embedding step, thereby removing any ambiguity around the 'no additional supervision' claim. revision: partial

  2. Referee: [§4] §4 (adaptive scaling): the adaptive scaling law for hierarchically faithful interventions is introduced without an explicit equation or proof that it introduces no new learned parameters beyond the single 'adaptive scaling parameter' listed in the axiom ledger; this must be shown to confirm the parameter-free character of the intervention mechanism.

    Authors: We agree that §4 would benefit from greater formality. In the revision we will insert the explicit equation for the adaptive scaling law together with a short derivation demonstrating that the mechanism depends only on the single listed adaptive scaling parameter and introduces no additional learned parameters. This will confirm the parameter-light character of the intervention procedure. revision: yes

  3. Referee: [Experiments] Experiments section: the claim of rivaling Euclidean models trained on 20× more data lacks reported details on exact datasets, concept counts, sparsity regimes, baseline implementations, and statistical tests; without these, the performance, hierarchical consistency, and robustness advantages cannot be verified as load-bearing results.

    Authors: We acknowledge that the current experimental description is insufficiently detailed for independent verification. The revised manuscript will expand the Experiments section to report the precise datasets, concept counts, sparsity levels, baseline implementations (including any hyper-parameter choices), and the results of statistical significance tests (e.g., paired t-tests with p-values). These additions will allow readers to fully assess the reported performance, hierarchical consistency, and robustness gains. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's core step reformulates concept activations as the margin of inclusion inside hyperbolic entailment cones, presented as a direct geometric consequence rather than a fitted parameter or self-referential definition. No equations reduce by construction to their own inputs, no self-citation chains are load-bearing for the central claim, and the 'no additional supervision' property follows from the post-hoc framing itself. The derivation remains self-contained against Euclidean baselines without reducing to renamed fits or imported uniqueness theorems.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the assumption that semantic hierarchies can be faithfully represented by hyperbolic geometry and that entailment cones provide an unsupervised activation signal. One adaptive scaling parameter is introduced for interventions.

free parameters (1)
  • adaptive scaling parameter
    Used to propagate user corrections coherently through the concept tree; its exact fitting procedure is not detailed in the abstract.
axioms (1)
  • domain assumption: Concepts are organized in semantic hierarchies that hyperbolic space can represent via entailment cones.
    Invoked to justify moving from Euclidean to hyperbolic embeddings.
invented entities (1)
  • entailment cone margin as activation signal (no independent evidence)
    purpose: To generate sparse, hierarchy-aware concept activations at test time without supervision.
    New use of geometric containment for activation; no independent evidence supplied in the abstract.

pith-pipeline@v0.9.0 · 5495 in / 1283 out tokens · 36751 ms · 2026-05-13T06:28:03.268382+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 2 internal anchors

  1. [1]

    Emergent visual-semantic hierarchies in image-text representations

    Morris Alper and Hadar Averbuch-Elor. Emergent visual-semantic hierarchies in image-text representations. In Proceedings of the European Conference on Computer Vision (ECCV), 2024

  2. [2]

    Hyperbolic Image Segmentation

    Mina Ghadimi Atigh, Julian Schoep, Erman Acar, Nanne Van Noord, and Pascal Mettes. Hyperbolic Image Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4453–4462, 2022

  3. [3]

    Relational concept bottleneck models

    Pietro Barbiero, Francesco Giannini, Gabriele Ciravegna, Michelangelo Diligenti, and Giuseppe Marra. Relational concept bottleneck models. Advances in Neural Information Processing Systems, 37:77663–77685, 2024

  4. [4]

    Hyperbolic geometry

    James W Cannon, William J Floyd, Richard Kenyon, and Walter R Parry. Hyperbolic geometry. In Flavors of Geometry, pages 59–115. Cambridge University Press, 1997

  5. [5]

    Interpretable Hierarchical Concept Reasoning through Attention-Guided Graph Learning

    David Debot, Pietro Barbiero, Gabriele Dominici, and Giuseppe Marra. Interpretable Hierarchical Concept Reasoning through Attention-Guided Graph Learning, 2025. URL https://arxiv.org/abs/2506.21102

  6. [6]

    ImageNet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009. doi: 10.1109/CVPR.2009.5206848

  7. [7]

    Hyperbolic Image-Text Representations

    Karan Desai, Maximilian Nickel, Tanmay Rajpurohit, Justin Johnson, and Shanmukha Ramakrishna Vedantam. Hyperbolic Image-Text Representations. In International Conference on Machine Learning, pages 7694–7731. PMLR, 2023

  8. [8]

    Hierarchical image classification using entailment cone embeddings

    Ankit Dhall, Anastasia Makarova, Octavian Ganea, Dario Pavllo, Michael Greeff, and Andreas Krause. Hierarchical image classification using entailment cone embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2020

  9. [9]

    Causally Reliable Concept Bottleneck Models

    Giovanni De Felice, Arianna Casanova Flores, Francesco De Santis, Silvia Santini, Johannes Schneider, Pietro Barbiero, and Alberto Termine. Causally Reliable Concept Bottleneck Models, 2025. URL https://arxiv.org/abs/2503.04363

  11. [11]

    Hyperbolic Entailment Cones for Learning Hierarchical Embeddings

    Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings. In Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80, pages 1646–1655. PMLR, 2018

  12. [12]

    Benchmarking neural network robustness to common corruptions and perturbations

    Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. Proceedings of the International Conference on Learning Representations, 2019

  13. [13]

    Improving interpretation faithfulness for vision transformers

    Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, and Di Wang. Improving interpretation faithfulness for vision transformers. In Forty-first International Conference on Machine Learning, 2023

  14. [14]

    SEAT: Stable and Explainable Attention

    Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, and Di Wang. SEAT: Stable and Explainable Attention. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 12907–12915, 2023

  15. [15]

    Open-finllms: Open multimodal large language models for financial applications

    Jimin Huang, Mengxi Xiao, Dong Li, Zihao Jiang, Yuzhe Yang, Yifei Zhang, Lingfei Qian, Yan Wang, Xueqing Peng, Yang Ren, et al. Open-finllms: Open multimodal large language models for financial applications. arXiv preprint arXiv:2408.11878, 2024

  16. [16]

    Argent: Adaptive hierarchical image-text representations

    Chuong Huynh, Hossein Souri, Abhinav Kumar, Vitali Petsiuk, Deen Dayal Mohan, and Suren Kumar. Argent: Adaptive hierarchical image-text representations, 2026. URL https://arxiv.org/abs/2603.23311

  17. [17]

    Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

    Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139, pages 4904–4916. PMLR, 2021

  18. [18]

    Hyperbolic Image Embeddings

    Valentin Khrulkov, Leyla Mirvakhabova, Evgeniya Ustinova, Ivan Oseledets, and Victor Lempitsky. Hyperbolic Image Embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6418–6428, 2020

  19. [19]

    Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

    Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2668–2677. PMLR, 2018. URL https://proceedings.mlr.press/v80/kim18d.html

  21. [21]

    Concept bottleneck models

    Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International Conference on Machine Learning, pages 5338–5348. PMLR, 2020

  22. [22]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009

  23. [23]

    Inferring concept hierarchies from text corpora via hyperbolic embeddings

    Matthew Le, Stephen Roller, Laetitia Papaxanthos, Douwe Kiela, and Maximilian Nickel. Inferring concept hierarchies from text corpora via hyperbolic embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3231–3241. Association for Computational Linguistics, July 2019. doi: 10.18653/v1/P19-1313. URL htt...

  24. [24]

    LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day

    Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, and Jianfeng Gao. LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day. Advances in Neural Information Processing Systems, 36:28541–28564, 2023

  25. [25]

    BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation

    Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning, pages 12888–12900. PMLR, 2022

  26. [26]

    Visual instruction tuning

    Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY, USA, 2023. Curran Associates Inc

  27. [27]

    Searching for Actions on the Hyperbole

    Teng Long, Pascal Mettes, Heng Tao Shen, and Cees G. M. Snoek. Searching for Actions on the Hyperbole. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1138–1147, 2020. doi: 10.1109/CVPR42600.2020.00122

  28. [28]

    A unified approach to interpreting model predictions

    Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pages 4768–4777, Red Hook, NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964

  29. [29]

    WordNet: a lexical database for English

    George A. Miller. WordNet: a lexical database for English. Commun. ACM, 38(11):39–41, November 1995. ISSN 0001-0782. doi: 10.1145/219717.219748. URL https://doi.org/10.1145/219717.219748

  30. [30]

    The Numerical Stability of Hyperbolic Representation Learning

    Gal Mishne, Zhengchao Wan, Yusu Wang, and Sheng Yang. The Numerical Stability of Hyperbolic Representation Learning. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, ...

  31. [31]

    Med-flamingo: a multimodal medical few-shot learner

    Michael Moor, Qian Huang, Shirley Wu, Michihiro Yasunaga, Yash Dalmia, Jure Leskovec, Cyril Zakka, Eduardo Pontes Reis, and Pranav Rajpurkar. Med-flamingo: a multimodal medical few-shot learner. In Machine Learning for Health (ML4H), pages 353–367. PMLR, 2023

  32. [32]

    Poincaré embeddings for learning hierarchical representations

    Maximilian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems (NeurIPS), volume 30, 2017

  33. [33]

    Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry

    Maximilian Nickel and Douwe Kiela. Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry. In Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80, pages 3779–3788. PMLR, 2018

  34. [34]

    Label-Free Concept Bottleneck Models

    Tuomas Oikarinen, Subhro Das, Lam M Nguyen, and Tsui-Wei Weng. Label-Free Concept Bottleneck Models. In International Conference on Learning Representations (ICLR), 2023

  35. [35]

    Compositional entailment learning for hyperbolic vision-language models

    Avik Pal, Max van Spengler, Guido Maria D’Amely di Melendugno, Alessandro Flaborea, Fabio Galasso, and Pascal Mettes. Compositional entailment learning for hyperbolic vision-language models. arXiv preprint arXiv:2410.06912, 2024

  36. [36]

    Coarse-to-fine concept bottleneck models

    Konstantinos P. Panousis, Dino Ienco, and Diego Marcos. Coarse-to-fine concept bottleneck models. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS ’24, Red Hook, NY, USA, 2024. Curran Associates Inc. ISBN 9798331314385

  37. [37]

    Grounding multimodal large language models to the world

    Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Qixiang Ye, and Furu Wei. Grounding multimodal large language models to the world. In The Twelfth International Conference on Learning Representations, 2024

  38. [38]

    Hierarchical concept bottleneck models for vision and their application to explainable fine classification and tracking

    Federico Pittino, Vesna Dimitrievska, and Rudolf Heer. Hierarchical concept bottleneck models for vision and their application to explainable fine classification and tracking. Engineering Applications of Artificial Intelligence, 118:105674, 2023

  39. [39]

    Hyperbolic Safety-Aware Vision-Language Models

    Tobia Poppi, Tejaswi Kasarla, Pascal Mettes, Lorenzo Baraldi, and Rita Cucchiara. Hyperbolic Safety-Aware Vision-Language Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  40. [40]

    Learning Transferable Visual Models From Natural Language Supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139, pages 8748–8763. PMLR, 2021

  41. [41]

    Accept the modality gap: An exploration in the hyperbolic space

    Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham, and Ajanthan Thalaiyasingam. Accept the modality gap: An exploration in the hyperbolic space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27263–27272, June 2024

  42. [42]

    Hierarchical Text-Conditional Image Generation with CLIP Latents

    Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical Text-Conditional Image Generation with CLIP Latents, 2022. URL https://arxiv.org/abs/2204.06125

  43. [43]

    Discover-then-name: Task-agnostic concept bottlenecks via automated concept discovery

    Sukrut Rao, Sweta Mahajan, Moritz Böhle, and Bernt Schiele. Discover-then-name: Task-agnostic concept bottlenecks via automated concept discovery. In European Conference on Computer Vision, 2024

  44. [44]

    "Why Should

    Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?": Explain- ing the Predictions of Any Classifier. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, page 1135–1144. Association for Computing Machinery, 2016. ISBN 9781450342322. doi: 10.1145/2939672.2939778. URL ht...

  45. [45]

    High-Resolution Image Synthesis With Latent Diffusion Models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-Resolution Image Synthesis With Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, June 2022

  46. [46]

    Low Distortion Delaunay Embedding of Trees in Hyperbolic Plane

    Rik Sarkar. Low Distortion Delaunay Embedding of Trees in Hyperbolic Plane. In Marc van Kreveld and Bettina Speckmann, editors, Graph Drawing, pages 355–366, Berlin, Heidelberg. Springer Berlin Heidelberg. ISBN 978-3-642-25878-7

  48. [48]

    A closer look at the intervention procedure of concept bottleneck models

    Sungbin Shin, Yohan Jo, Sungsoo Ahn, and Namhoon Lee. A closer look at the intervention procedure of concept bottleneck models. In International Conference on Machine Learning, pages 31504–31520. PMLR, 2023

  49. [49]

    VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance

    Divyansh Srivastava, Ge Yan, and Tsui-Wei Weng. VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 79057–79094. Curran Associates, Inc., 2024. doi: 10.52202/079017-2510

  50. [50]

    Learning to intervene on concept bottlenecks

    David Steinmann, Wolfgang Stammer, Felix Friedrich, and Kristian Kersting. Learning to intervene on concept bottlenecks. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. JMLR.org, 2024

  51. [51]

    DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

    Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, Xianpeng Lang, and Hang Zhao. DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models, 2024. URL https://arxiv.org/abs/2402.12289

  52. [52]

    LogicCBMs: Logic-Enhanced Concept-Based Learning

    Deepika SN Vemuri, Gautham Bellamkonda, Aditya Pola, and Vineeth N Balasubramanian. LogicCBMs: Logic-Enhanced Concept-Based Learning, 2025. URL https://arxiv.org/abs/2512.07383

  53. [53]

    CUB200-2011 Dataset

    C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. CUB200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011

  54. [54]

    SUN database: Large-scale scene recognition from abbey to zoo

    Jianxiong Xiao, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3485–3492, 2010. doi: 10.1109/CVPR.2010.5539970

  55. [55]

    Graph concept bottleneck models

    Haotian Xu, Tsui-Wei Weng, Lam M. Nguyen, and Tengfei Ma. Graph concept bottleneck models. Transactions on Machine Learning Research, 2026. ISSN 2835-8856. URL https://openreview.net/forum?id=a4azUYjRhU

  56. [56]

    Human-ai interactions in the communication era: Autophagy makes large models achieving local optima

    Shu Yang, Lijie Hu, Lu Yu, Muhammad Asif Ali, and Di Wang. Human-ai interactions in the communication era: Autophagy makes large models achieving local optima. CoRR, 2024

  57. [57]

    Language in a bottle: Language model guided concept bottlenecks for interpretable image classification

    Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, and Mark Yatskar. Language in a bottle: Language model guided concept bottlenecks for interpretable image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19187–19197, 2023

  58. [58]

    Post-hoc concept bottleneck models

    Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=nA5AZ8CEyow

  59. [59]

    Concept embedding models: Beyond the accuracy-explainability trade-off

    Mateo Zarlenga, Pietro Barbiero, Gabriele Ciravegna, Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Zohreh Shams, Frederic Precioso, Stefano Melacci, Adrian Weller, Pietro Lio, and Mateja Jamnik. Concept embedding models: Beyond the accuracy-explainability trade-off. Advances in Neural Information Processing Systems, 35, 2022
