pith. machine review for the scientific record.

arxiv: 2605.06440 · v2 · submitted 2026-05-07 · 💻 cs.LG · cs.CV

Recognition: 2 theorem links

· Lean Theorem

Hyperbolic Concept Bottleneck Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 06:28 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords concept bottleneck models · hyperbolic geometry · model interpretability · hierarchical concepts · semantic hierarchies · post-hoc explanations · machine learning

The pith

Embedding concepts in hyperbolic space lets bottleneck models match Euclidean performance with far less data while respecting concept hierarchies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Concept bottleneck models make neural networks interpretable by forcing decisions through human-understandable concepts. Most current versions place these concepts in flat Euclidean space and treat them as independent dimensions. This paper instead embeds concepts in hyperbolic space so that one concept can contain another through geometric containment inside entailment cones. The distance from an activation to the cone boundary then serves as the activation strength, producing sparse and hierarchy-respecting signals without extra supervision or learned modules. An adaptive scaling rule further lets a user correction at one concept level propagate consistently to related concepts higher or lower in the tree. If the approach works, interpretable models could reach high accuracy in the low-data regimes needed for human oversight and remain more stable when inputs are corrupted.
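
A minimal numerical sketch of the activation rule, assuming the Poincaré-ball entailment-cone formulas of Ganea et al. [11] and the decision rule shown in Figure 2a. The constant K = 0.1 and the clipping of the arcsin argument follow Figure 12; the zeroing of negative margins and all function names are illustrative assumptions, not the paper's released code.

    import numpy as np

    K = 0.1  # cone-tightness constant; Figure 12 reports K = 0.1 as the default

    def half_aperture(c):
        # Cone half-aperture omega(c) = arcsin(K * (1 - |c|^2) / |c|) at a
        # concept embedding c in the Poincare ball (Ganea et al. [11]).
        # Clipping the argument avoids the saturation noted in Figure 12.
        n = np.linalg.norm(c)
        return np.arcsin(np.clip(K * (1 - n ** 2) / n, -1.0, 1.0))

    def exterior_angle(z, c):
        # Exterior angle phi(z, c) at the cone apex c between the cone axis
        # and the geodesic toward the image embedding z (Ganea et al. [11]).
        zc = np.dot(z, c)
        nz2, nc2 = np.dot(z, z), np.dot(c, c)
        num = zc * (1 + nc2) - nc2 * (1 + nz2)
        den = np.linalg.norm(c) * np.linalg.norm(z - c) * np.sqrt(1 + nc2 * nz2 - 2 * zc)
        return np.arccos(np.clip(num / den, -1.0, 1.0))

    def activation(z, c, eta_img=1.0):
        # Margin of inclusion: positive iff z lies inside the cone scaled by
        # the strictness eta_img (Figure 2a); zeroing negative margins is an
        # assumed reading of the paper's "sparse activations".
        return max(0.0, eta_img * half_aperture(c) - exterior_angle(z, c))

An image is then summarized by the sparse vector of these margins over the whole concept bank, which is what the linear classification head of a standard post-hoc CBM consumes.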

Core claim

Hyperbolic Concept Bottleneck Models reformulate concept activation as asymmetric geometric containment in hyperbolic space. The margin of inclusion inside a concept's entailment cone supplies a sparse, hierarchy-aware activation signal at test time without additional supervision or learned modules. An adaptive scaling law then converts user interventions into hierarchically faithful updates that propagate coherently through the concept tree. Empirically the resulting models match the accuracy of post-hoc Euclidean concept models trained on twenty times more data while showing stronger hierarchical consistency and greater robustness to input corruptions.

What carries the argument

Entailment cones in hyperbolic space whose inclusion margin supplies the concept activation value.

If this is right

  • HypCBM reaches accuracy comparable to Euclidean models trained on twenty times more concept-labeled data in the sparse regimes needed for human interpretability.
  • Concept activations exhibit stronger hierarchical consistency across levels of the concept tree.
  • The models show improved robustness to input corruptions relative to flat Euclidean embeddings.
  • User corrections applied at one concept level propagate coherently to related concepts via the adaptive scaling law (a sketch of this propagation follows the list).
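
A minimal sketch of how such propagation could work, reusing half_aperture and exterior_angle from the sketch above. The linear dependence of strictness on the parent's norm is taken from Figure 2b; the slope and intercept a and b, and the choice that entailed children simply inherit the edited value, are hypothetical placeholders rather than the paper's stated law.

    import numpy as np

    def eta_for(parent, a=1.0, b=0.0):
        # Adaptive strictness: Figure 2b reports that the strictness needed
        # to capture true descendants scales roughly linearly with the
        # parent concept's norm (r = 0.729). The slope a and intercept b are
        # hypothetical placeholders, not fitted values from the paper.
        return a * np.linalg.norm(parent) + b

    def intervene(activations, concept_embs, target, new_value):
        # Set the target concept's activation, then propagate the edit to
        # every concept whose embedding falls inside the target's scaled
        # entailment cone (Figure 14: editing 'electrical equipment' also
        # edits the entailed children 'circuit breaker' and 'technical
        # equipment').
        parent = concept_embs[target]
        threshold = eta_for(parent) * half_aperture(parent)
        updated = dict(activations)
        updated[target] = new_value
        for name, emb in concept_embs.items():
            if name != target and exterior_angle(emb, parent) <= threshold:
                updated[name] = new_value  # assumption: children inherit the edit
        return updated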

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The geometric containment signal could be tested on taxonomies deeper than those used in the original experiments, such as fine-grained biological or medical hierarchies.
  • The same cone-margin idea might be tried in other post-hoc explanation methods that currently assume flat concept spaces.
  • If the scaling law generalizes, it would allow concept-level editing interfaces that automatically maintain logical consistency across large concept graphs.

Load-bearing premise

The margin of inclusion inside a concept's entailment cone produces sparse and hierarchy-aware activations without extra supervision or learned modules.

What would settle it

Running HypCBM on a dataset whose concept hierarchy is independently verified and checking whether the activation sparsity and hierarchical consistency metrics remain above those of Euclidean baselines when the amount of concept-labeled data is increased.
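
A minimal reading of part of that check as code, using the stability metric of Figure 7 (Jaccard similarity between clean and corrupted active-concept sets). Treating the active set as the concepts with positive margin carries over the assumption from the activation sketch above.

    def active_set(activations):
        # Concepts with positive margin; the cone rule is sparse by
        # construction, so a positivity test defines the active set.
        return {name for name, a in activations.items() if a > 0}

    def jaccard(a, b):
        # Jaccard similarity of two concept sets; returns 0.0 when both
        # sets are empty (the max() guards the division).
        return len(a & b) / max(1, len(a | b))

    def semantic_stability(clean_acts, corrupted_acts_by_severity):
        # One Jaccard value per corruption severity, comparing each
        # corrupted active set against the clean one (the curves in
        # Figure 7 average this over 15 corruption types).
        clean = active_set(clean_acts)
        return [jaccard(clean, active_set(acts))
                for acts in corrupted_acts_by_severity]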

Figures

Figures reproduced from arXiv: 2605.06440 by Daniel Uyterlinde, Pascal Mettes, Swasti Shreya Mishra.

Figure 1
Figure 1: Method overview. (1) Generated concepts and the target image are encoded with a hyperbolic VLM onto the hyperbolic manifold, where they are hierarchically organized. (2) The activation of a concept is measured as the margin of inclusion of the image embedding in the entailment cone of the concept. (3) An intervention on a parent concept (c_parent) is propagated to all entailed children (c_child). training in… view at source ↗
Figure 2
Figure 2: Geometry and Scaling of Hyperbolic Entailment. (a) An image z activates concept c_i if the exterior angle ϕ(z, c_i) falls within the scaled cone half-aperture η·ω(c_i). Here, η denotes the strictness parameter η_img (Eq. 7). (b) Empirical scaling law derived from WordNet. The entailment strictness η_text required to geometrically capture true descendants scales linearly with the parent concept's norm (r = 0.729)… view at source ↗
Figure 3
Figure 3: We validate the interpretability of HypCBM through… view at source ↗
Figure 4
Figure 4: (a) shows that HypCBM exhibits a steeper confidence decay than the Euclidean baseline, indicating stronger responsiveness to corrective edits. Under manual intervention, HypCBM successfully flips incorrect predictions to the correct class for 19% more samples than LF-CBM, confirming the practical utility of hierarchically propagated interventions. Random interventions cause negligible confidence changes (Ap… view at source ↗
Figure 5
Figure 5: Complete Intervention Analysis. We compare the probability response of HypCBM (Red) and LF-CBM (Blue) across three distinct strategies. Left (Manual): We intervene on the concept with the highest contribution that is ground-truth absent (a false positive). HypCBM displays the sharpest confidence decay, indicating it is highly responsive to valid human corrections. Center (Top-Contributing): We intervene on… view at source ↗
Figure 6
Figure 6: Data efficiency on CIFAR100. HypCBM outperforms LF-CBM (CLIP-20M) for any data budget. view at source ↗
Figure 7
Figure 7: Semantic stability across five severities of input corruption. The plot shows the average Jaccard similarity across all 15 corruption types as a function of severity. The shaded regions represent one standard deviation. The baseline (LF-CBM) shows a rapid decrease in concept stability as severity increases, while HypCBM maintains high stability (J > 0.7) even at Severity 5. The stability gap remains consis… view at source ↗
Figure 8
Figure 8: Ablation on Hyperbolic Norm Filtering, SUN397. τ = 0.27. substantial gains. The Euclidean LF-CBM, in contrast, relies on angular similarity; pruning general concepts does not lead to a comparable reduction in trivial activations. B Concept Bank: As the concept banks for CIFAR100, ImageNet and CUB-200 were already created and made public by Oikarinen et al. [32], we only apply the concept bank creation proce… view at source ↗
Figure 9
Figure 9: LLM Prompts for Concept Generation. We use three distinct prompt templates (Important Features, Superclass, Context) to generate diverse visual attributes. These few-shot examples are fed to GPT-3 to produce the raw concept bank. After this initial set of candidate concepts is created, a few processing steps are applied. First, concepts that are too long (longer than 30 tokens) are removed. Then, we calcul… view at source ↗
Figure 10
Figure 10: Image-text entailment distributions. The plot shows the entailment ratio… view at source ↗
Figure 11
Figure 11: Accuracy vs. number of active concepts, SUN397. We sweep η_img on a validation set to determine the optimal value on datasets where the distribution shift between proxy class labels and the concept bank is large. C.2 Finding the Optimal Intra-Modal η_text: Experimental Setup. Due to the exponential expansion of volume in hyperbolic space, the aperture required to capture a semantic subtree varies drastically… view at source ↗
Figure 12
Figure 12: Geometric Properties and Calibration. (a) We calibrate the cone scaling factor K to ensure that the distribution of concept apertures is well-posed (i.e. all arguments to arcsin are smaller than 1), avoiding numerical saturation limits (dashed line). For the default value of K = 0.1, we observe that all possible text embeddings are clipped to 1, leading to a constant half-aperture of ω(c_i) = ½π, ∀i. (b… view at source ↗
Figure 13
Figure 13: Global Explanations. We visualize the top contributing concepts (weight × activation) for the classes ’pantry’ and ’ocean’, along with two sample images from SUN397. More examples comparing LF-CBM and HypCBM are in the supplementary material. … view at source ↗
Figure 14
Figure 14: Intervention Propagation. An example that shows our intervention propagation mechanism on an image of a locker room that is misclassified as ’server room’. When intervening on ’electrical equipment’, HypCBM automatically intervenes on entailed children ’circuit breaker’ and ’technical equipment’ too. Without this propagation, the prediction is still wrong, whereas with propagation the prediction flips to… view at source ↗
read the original abstract

Concept Bottleneck Models (CBMs) have become a popular approach to enable interpretability in neural networks by constraining classifier inputs to a set of human-understandable concepts. While effective, current models embed concepts in flat Euclidean space, treating them as independent, orthogonal dimensions. Concepts, however, are highly structured and organized in semantic hierarchies. To resolve this mismatch, we propose Hyperbolic Concept Bottleneck Models (HypCBM), a post-hoc framework that grounds the bottleneck in this structure by reformulating concept activation as asymmetric geometric containment in hyperbolic space. Rather than treating entailment cones as a pre-training penalty, we show they encode a natural test-time activation signal: the margin of inclusion within a concept's entailment cone yields sparse, hierarchy-aware activations without any additional supervision or learned modules. We further introduce an adaptive scaling law for hierarchically faithful interventions, propagating user corrections coherently through the concept tree. Empirically, HypCBM rivals post-hoc Euclidean models trained on 20× more data in sparse regimes required for human interpretability, with stronger hierarchical consistency and improved robustness to input corruptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Hyperbolic Concept Bottleneck Models (HypCBM), a post-hoc framework that embeds concepts in hyperbolic space and reformulates activations as the margin of inclusion within entailment cones. This is claimed to produce sparse, hierarchy-aware signals without additional supervision or learned modules. An adaptive scaling law is introduced for propagating user interventions coherently through the concept tree. Empirically, HypCBM is said to rival post-hoc Euclidean CBMs trained on 20× more data in sparse regimes, while showing stronger hierarchical consistency and robustness to input corruptions.

Significance. If the no-additional-supervision property and empirical gains hold, the work would meaningfully advance interpretable ML by leveraging hyperbolic geometry to capture semantic hierarchies in CBMs, potentially lowering data requirements and enhancing robustness in human-interpretable settings. The post-hoc framing and geometric activation signal represent clear strengths if the derivations are parameter-light and reproducible.

major comments (3)
  1. [Abstract and §3] Abstract and §3 (method): the claim that entailment-cone margins yield activations 'without any additional supervision or learned modules' is load-bearing yet unsupported by the given description; constructing the hyperbolic embedding of the concept taxonomy appears to presuppose a hierarchy that may be derived from the same labeled data used in standard CBMs, risking circularity with the 'no additional supervision' assertion.
  2. [§4] §4 (adaptive scaling): the adaptive scaling law for hierarchically faithful interventions is introduced without an explicit equation or proof that it introduces no new learned parameters beyond the single 'adaptive scaling parameter' listed in the axiom ledger; this must be shown to confirm the parameter-free character of the intervention mechanism.
  3. [Experiments] Experiments section: the claim of rivaling Euclidean models trained on 20× more data lacks reported details on exact datasets, concept counts, sparsity regimes, baseline implementations, and statistical tests; without these, the performance, hierarchical consistency, and robustness advantages cannot be verified as load-bearing results.
minor comments (2)
  1. [Abstract] Abstract: the metric for 'hierarchical consistency' is not defined, making the comparative claim difficult to interpret.
  2. [Notation] Notation: the precise definition of the entailment-cone margin (e.g., how it is computed from hyperbolic coordinates) should be stated early to distinguish it from standard hyperbolic distances.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which has helped us identify areas for clarification in the manuscript. We address each major comment below and will revise the paper accordingly to strengthen the presentation of our contributions.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (method): the claim that entailment-cone margins yield activations 'without any additional supervision or learned modules' is load-bearing yet unsupported by the given description; constructing the hyperbolic embedding of the concept taxonomy appears to presuppose a hierarchy that may be derived from the same labeled data used in standard CBMs, risking circularity with the 'no additional supervision' assertion.

    Authors: We appreciate the referee raising this point of potential circularity. The concept taxonomy is supplied as a fixed, external input (analogous to the predefined concept set in standard CBMs) and is not derived from the task-specific labeled data. Hyperbolic embeddings are then constructed deterministically from this given hierarchy using a standard tree-embedding procedure with no trainable parameters or additional supervision. We will revise §3 to explicitly state the source of the taxonomy and the deterministic nature of the embedding step, thereby removing any ambiguity around the 'no additional supervision' claim. revision: partial

  2. Referee: [§4] §4 (adaptive scaling): the adaptive scaling law for hierarchically faithful interventions is introduced without an explicit equation or proof that it introduces no new learned parameters beyond the single 'adaptive scaling parameter' listed in the axiom ledger; this must be shown to confirm the parameter-free character of the intervention mechanism.

    Authors: We agree that §4 would benefit from greater formality. In the revision we will insert the explicit equation for the adaptive scaling law together with a short derivation demonstrating that the mechanism depends only on the single listed adaptive scaling parameter and introduces no additional learned parameters. This will confirm the parameter-light character of the intervention procedure. revision: yes

  3. Referee: [Experiments] Experiments section: the claim of rivaling Euclidean models trained on 20× more data lacks reported details on exact datasets, concept counts, sparsity regimes, baseline implementations, and statistical tests; without these, the performance, hierarchical consistency, and robustness advantages cannot be verified as load-bearing results.

    Authors: We acknowledge that the current experimental description is insufficiently detailed for independent verification. The revised manuscript will expand the Experiments section to report the precise datasets, concept counts, sparsity levels, baseline implementations (including any hyper-parameter choices), and the results of statistical significance tests (e.g., paired t-tests with p-values). These additions will allow readers to fully assess the reported performance, hierarchical consistency, and robustness gains. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's core step reformulates concept activations as the margin of inclusion inside hyperbolic entailment cones, presented as a direct geometric consequence rather than a fitted parameter or self-referential definition. No equations reduce by construction to their own inputs, no self-citation chains are load-bearing for the central claim, and the 'no additional supervision' property follows from the post-hoc framing itself. The derivation remains self-contained against Euclidean baselines without reducing to renamed fits or imported uniqueness theorems.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the assumption that semantic hierarchies can be faithfully represented by hyperbolic geometry and that entailment cones provide an unsupervised activation signal. One adaptive scaling parameter is introduced for interventions.

free parameters (1)
  • adaptive scaling parameter
    Used to propagate user corrections coherently through the concept tree; its exact fitting procedure is not detailed in the abstract.
axioms (1)
  • domain assumption: Concepts are organized in semantic hierarchies that hyperbolic space can represent via entailment cones.
    Invoked to justify moving from Euclidean to hyperbolic embeddings.
invented entities (1)
  • entailment cone margin as activation signal (no independent evidence)
    purpose: To generate sparse, hierarchy-aware concept activations at test time without supervision.
    New use of geometric containment for activation; no independent evidence supplied in the abstract.

pith-pipeline@v0.9.0 · 5495 in / 1283 out tokens · 36751 ms · 2026-05-13T06:28:03.268382+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 2 internal anchors

  1. [1]

    Emergent visual-semantic hierarchies in image-text representations

    Morris Alper and Hadar Averbuch-Elor. Emergent visual-semantic hierarchies in image-text representations. In Proceedings of the European Conference on Computer Vision (ECCV), 2024

  2. [2]

    Hyperbolic Image Segmentation

    Mina Ghadimi Atigh, Julian Schoep, Erman Acar, Nanne Van Noord, and Pascal Mettes. Hyperbolic Image Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4453–4462, 2022

  3. [3]

    Relational concept bottleneck models

    Pietro Barbiero, Francesco Giannini, Gabriele Ciravegna, Michelangelo Diligenti, and Giuseppe Marra. Relational concept bottleneck models. Advances in Neural Information Processing Systems, 37:77663–77685, 2024

  4. [4]

    Hyperbolic geometry

    James W Cannon, William J Floyd, Richard Kenyon, and Walter R Parry. Hyperbolic geometry. In Flavors of Geometry, pages 59–115. Cambridge University Press, 1997

  5. [5]

    Interpretable Hierarchical Concept Reasoning through Attention-Guided Graph Learning

    David Debot, Pietro Barbiero, Gabriele Dominici, and Giuseppe Marra. Interpretable Hierarchical Concept Reasoning through Attention-Guided Graph Learning, 2025. URL https://arxiv.org/abs/2506.21102

  6. [6]

    ImageNet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009. doi: 10.1109/CVPR.2009.5206848

  7. [7]

    Hyperbolic Image-Text Representations

    Karan Desai, Maximilian Nickel, Tanmay Rajpurohit, Justin Johnson, and Shanmukha Ramakrishna Vedantam. Hyperbolic Image-Text Representations. In International Conference on Machine Learning, pages 7694–7731. PMLR, 2023

  8. [8]

    Hierarchical image classification using entailment cone embeddings

    Ankit Dhall, Anastasia Makarova, Octavian Ganea, Dario Pavllo, Michael Greeff, and Andreas Krause. Hierarchical image classification using entailment cone embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2020

  9. [9]

    Causally Reliable Concept Bottleneck Models

    Giovanni De Felice, Arianna Casanova Flores, Francesco De Santis, Silvia Santini, Johannes Schneider, Pietro Barbiero, and Alberto Termine. Causally Reliable Concept Bottleneck Models, 2025. URL https://arxiv.org/abs/2503.04363

  11. [11]

    Hyperbolic Entailment Cones for Learning Hierarchical Embeddings

    Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings. In Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80, pages 1646–1655. PMLR, 2018

  12. [12]

    Benchmarking neural network robustness to common corruptions and perturbations

    Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. Proceedings of the International Conference on Learning Representations, 2019

  13. [13]

    Improving interpretation faithfulness for vision transformers

    Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, and Di Wang. Improving interpretation faithfulness for vision transformers. In Forty-first International Conference on Machine Learning, 2023

  14. [14]

    SEAT: Stable and Explainable Attention

    Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, and Di Wang. SEAT: Stable and Explainable Attention. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 12907–12915, 2023

  15. [15]

    Open-finllms: Open multimodal large language models for financial applications

    Jimin Huang, Mengxi Xiao, Dong Li, Zihao Jiang, Yuzhe Yang, Yifei Zhang, Lingfei Qian, Yan Wang, Xueqing Peng, Yang Ren, et al. Open-finllms: Open multimodal large language models for financial applications. arXiv preprint arXiv:2408.11878, 2024

  16. [16]

    Argent: Adaptive hierarchical image-text representations

    Chuong Huynh, Hossein Souri, Abhinav Kumar, Vitali Petsiuk, Deen Dayal Mohan, and Suren Kumar. Argent: Adaptive hierarchical image-text representations, 2026. URL https://arxiv.org/abs/2603.23311

  17. [17]

    Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

    Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139, pages 4904–4916. PMLR, 2021

  18. [18]

    Hyperbolic Image Embeddings

    Valentin Khrulkov, Leyla Mirvakhabova, Evgeniya Ustinova, Ivan Oseledets, and Victor Lempitsky. Hyperbolic Image Embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6418–6428, 2020

  19. [19]

    Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

    Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2668–2677. PMLR, 2018. URL https://proceedings.mlr.press/v80/kim18d.html

  21. [21]

    Concept bottleneck models

    Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International Conference on Machine Learning, pages 5338–5348. PMLR, 2020

  22. [22]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009

  23. [23]

    Inferring concept hierarchies from text corpora via hyperbolic embeddings

    Matthew Le, Stephen Roller, Laetitia Papaxanthos, Douwe Kiela, and Maximilian Nickel. Inferring concept hierarchies from text corpora via hyperbolic embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3231–3241. Association for Computational Linguistics, July 2019. doi: 10.18653/v1/P19-1313. URL htt...

  24. [24]

    LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day

    Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, and Jianfeng Gao. LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day. Advances in Neural Information Processing Systems, 36:28541–28564, 2023

  25. [25]

    BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation

    Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning, pages 12888–12900. PMLR, 2022

  26. [26]

    Visual instruction tuning

    Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY, USA, 2023. Curran Associates Inc

  27. [27]

    Searching for Actions on the Hyperbole

    Teng Long, Pascal Mettes, Heng Tao Shen, and Cees G. M. Snoek. Searching for Actions on the Hyperbole. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1138–1147, 2020. doi: 10.1109/CVPR42600.2020.00122

  28. [28]

    A unified approach to interpreting model predictions

    Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pages 4768–4777, Red Hook, NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964

  29. [29]

    WordNet: a lexical database for English

    George A. Miller. WordNet: a lexical database for English. Commun. ACM, 38(11):39–41, November 1995. ISSN 0001-0782. doi: 10.1145/219717.219748. URL https://doi.org/10.1145/219717.219748

  30. [30]

    The Numerical Stability of Hyperbolic Representation Learning

    Gal Mishne, Zhengchao Wan, Yusu Wang, and Sheng Yang. The Numerical Stability of Hyperbolic Representation Learning. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, ...

  31. [31]

    Med-flamingo: a multimodal medical few-shot learner

    Michael Moor, Qian Huang, Shirley Wu, Michihiro Yasunaga, Yash Dalmia, Jure Leskovec, Cyril Zakka, Eduardo Pontes Reis, and Pranav Rajpurkar. Med-flamingo: a multimodal medical few-shot learner. In Machine Learning for Health (ML4H), pages 353–367. PMLR, 2023

  32. [32]

    Poincaré embeddings for learning hierarchical representations

    Maximilian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems (NeurIPS), volume 30, 2017

  33. [33]

    Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry

    Maximilian Nickel and Douwe Kiela. Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry. In Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80, pages 3779–3788. PMLR, 2018

  34. [34]

    Label-Free Concept Bottleneck Models

    Tuomas Oikarinen, Subhro Das, Lam M Nguyen, and Tsui-Wei Weng. Label-Free Concept Bottleneck Models. In International Conference on Learning Representations (ICLR), 2023

  35. [35]

    Compositional entailment learning for hyperbolic vision-language models

    Avik Pal, Max van Spengler, Guido Maria D’Amely di Melendugno, Alessandro Flaborea, Fabio Galasso, and Pascal Mettes. Compositional entailment learning for hyperbolic vision-language models. arXiv preprint arXiv:2410.06912, 2024

  36. [36]

    Coarse-to-fine concept bottleneck models

    Konstantinos P. Panousis, Dino Ienco, and Diego Marcos. Coarse-to-fine concept bottleneck models. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS ’24, Red Hook, NY, USA, 2024. Curran Associates Inc. ISBN 9798331314385

  37. [37]

    Grounding multimodal large language models to the world

    Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Qixiang Ye, and Furu Wei. Grounding multimodal large language models to the world. In The Twelfth International Conference on Learning Representations, 2024

  38. [38]

    Hierarchical concept bottleneck models for vision and their application to explainable fine classification and tracking

    Federico Pittino, Vesna Dimitrievska, and Rudolf Heer. Hierarchical concept bottleneck models for vision and their application to explainable fine classification and tracking. Engineering Applications of Artificial Intelligence, 118:105674, 2023

  39. [39]

    Hyperbolic Safety-Aware Vision-Language Models

    Tobia Poppi, Tejaswi Kasarla, Pascal Mettes, Lorenzo Baraldi, and Rita Cucchiara. Hyperbolic Safety-Aware Vision-Language Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  40. [40]

    Learning Transferable Visual Models From Natural Language Supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139, pages 8748–8763. PMLR, 2021

  41. [41]

    Accept the modality gap: An exploration in the hyperbolic space

    Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham, and Ajanthan Thalaiyasingam. Accept the modality gap: An exploration in the hyperbolic space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27263–27272, June 2024

  42. [42]

    Hierarchical Text-Conditional Image Generation with CLIP Latents

    Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical Text-Conditional Image Generation with CLIP Latents, 2022. URL https://arxiv.org/abs/2204.06125

  43. [43]

    Discover-then-name: Task-agnostic concept bottlenecks via automated concept discovery

    Sukrut Rao, Sweta Mahajan, Moritz Böhle, and Bernt Schiele. Discover-then-name: Task-agnostic concept bottlenecks via automated concept discovery. In European Conference on Computer Vision, 2024

  44. [44]

    "Why Should

    Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?": Explain- ing the Predictions of Any Classifier. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, page 1135–1144. Association for Computing Machinery, 2016. ISBN 9781450342322. doi: 10.1145/2939672.2939778. URL ht...

  45. [45]

    High-Resolution Image Synthesis With Latent Diffusion Models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-Resolution Image Synthesis With Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, June 2022

  46. [46]

    Low Distortion Delaunay Embedding of Trees in Hyperbolic Plane

    Rik Sarkar. Low Distortion Delaunay Embedding of Trees in Hyperbolic Plane. In Marc van Kreveld and Bettina Speckmann, editors, Graph Drawing, pages 355–366, Berlin, Heidelberg. Springer Berlin Heidelberg. ISBN 978-3-642-25878-7

  48. [48]

    A closer look at the intervention procedure of concept bottleneck models

    Sungbin Shin, Yohan Jo, Sungsoo Ahn, and Namhoon Lee. A closer look at the intervention procedure of concept bottleneck models. In International Conference on Machine Learning, pages 31504–31520. PMLR, 2023

  49. [49]

    VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance

    Divyansh Srivastava, Ge Yan, and Tsui-Wei Weng. VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 79057–79094. Curran Associates, Inc., 2024. doi: 10.52202/079017-2510

  50. [50]

    Learning to intervene on concept bottlenecks

    David Steinmann, Wolfgang Stammer, Felix Friedrich, and Kristian Kersting. Learning to intervene on concept bottlenecks. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. JMLR.org, 2024

  51. [51]

    DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

    Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, Xianpeng Lang, and Hang Zhao. DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models, 2024. URL https://arxiv.org/abs/2402.12289

  52. [52]

    LogicCBMs: Logic-Enhanced Concept-Based Learning

    Deepika SN Vemuri, Gautham Bellamkonda, Aditya Pola, and Vineeth N Balasubramanian. LogicCBMs: Logic-Enhanced Concept-Based Learning, 2025. URL https://arxiv.org/abs/2512.07383

  53. [53]

    CUB200-2011 Dataset

    C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. CUB200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011

  54. [54]

    SUN database: Large-scale scene recognition from abbey to zoo

    Jianxiong Xiao, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3485–3492, 2010. doi: 10.1109/CVPR.2010.5539970

  55. [55]

    Graph concept bottleneck models

    Haotian Xu, Tsui-Wei Weng, Lam M. Nguyen, and Tengfei Ma. Graph concept bottleneck models. Transactions on Machine Learning Research, 2026. ISSN 2835-8856. URL https://openreview.net/forum?id=a4azUYjRhU

  56. [56]

    Human-ai interactions in the communication era: Autophagy makes large models achieving local optima

    Shu Yang, Lijie Hu, Lu Yu, Muhammad Asif Ali, and Di Wang. Human-ai interactions in the communication era: Autophagy makes large models achieving local optima. CoRR, 2024

  57. [57]

    Language in a bottle: Language model guided concept bottlenecks for interpretable image classification

    Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, and Mark Yatskar. Language in a bottle: Language model guided concept bottlenecks for interpretable image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19187–19197, 2023

  58. [58]

    Post-hoc concept bottleneck models

    Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=nA5AZ8CEyow

  59. [59]

    Concept embedding models: Beyond the accuracy-explainability trade-off

    Mateo Zarlenga, Pietro Barbiero, Gabriele Ciravegna, Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Zohreh Shams, Frederic Precioso, Stefano Melacci, Adrian Weller, Pietro Lio, and Mateja Jamnik. Concept embedding models: Beyond the accuracy-explainability trade-off. Advances in Neural Information Processing Systems, 35, 2022
