
arxiv: 2606.22994 · v1 · pith:2XGECBLDnew · submitted 2026-06-22 · 💻 cs.LG

Do Sparse Autoencoders Learn Meaningful Concept Hierarchies?

Pith reviewed 2026-06-26 08:44 UTC · model grok-4.3

classification 💻 cs.LG
keywords sparse autoencoders · concept hierarchies · feature absorption · unsupervised concept discovery · visual data · taxonomy · generalization specialization

The pith

Feature absorption in sparse autoencoders systematically compromises the quality of learned concept hierarchies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a set of requirements for meaningful generalization and specialization hierarchies in unsupervised concept discovery from semantic net and taxonomy research, then turns them into a concrete evaluation protocol. When this protocol is applied to current sparse autoencoder approaches trained on visual data, feature spaces generally support sensible hierarchies but good structure proves hard to establish. Feature absorption in both its hard form and a continuous soft form is found to systematically reduce hierarchy quality. The work identifies a fundamental tension that future methods must navigate.

Core claim

While SAE feature spaces generally provide a basis for sensible hierarchies, establishing good hierarchical structure remains challenging. In particular, feature absorption, both in its well-known hard form and in a continuous, soft form, systematically compromises hierarchy quality, pointing to a fundamental tension that future approaches will need to navigate.
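
To make the two forms of absorption concrete, here is a minimal Python sketch. The definitions are illustrative assumptions, not the paper's metrics, which this page does not give. Hard absorption is read as the parent feature staying silent on samples where a child fires. Soft absorption is read as the parent firing more weakly on those samples than on samples where it fires without the child. The function names, the `eps` threshold, and the toy activations are all hypothetical.

```python
from statistics import mean


def hard_absorption_rate(parent_acts, child_acts, eps=0.0):
    """Fraction of child-active samples on which the parent is silent."""
    child_active = [i for i, c in enumerate(child_acts) if c > eps]
    if not child_active:
        return 0.0
    silent = sum(1 for i in child_active if parent_acts[i] <= eps)
    return silent / len(child_active)


def soft_absorption_gap(parent_acts, child_acts, eps=0.0):
    """Relative drop of the parent's mean activation on child-active samples,
    compared with samples where the parent fires without the child."""
    pairs = list(zip(parent_acts, child_acts))
    with_child = [p for p, c in pairs if c > eps]
    parent_only = [p for p, c in pairs if c <= eps and p > eps]
    if not with_child or not parent_only:
        return 0.0
    baseline = mean(parent_only)
    return max(0.0, 1.0 - mean(with_child) / baseline)


# Toy activations over six samples (made-up numbers, for illustration only).
toy_parent = [1.0, 1.0, 0.0, 0.5, 0.0, 0.0]
toy_child = [0.0, 0.0, 0.8, 0.9, 0.0, 0.7]

hard = hard_absorption_rate(toy_parent, toy_child)
soft = soft_absorption_gap(toy_parent, toy_child)
```

Under these assumed definitions, a well-behaved parent-child pair would score near zero on both quantities, since the parent would fire at full strength whenever the child does.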

What carries the argument

The evaluation protocol derived from key requirements for generalization/specialization hierarchies drawn from semantic net and taxonomy research.
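
One structural property that generalization/specialization (is-a) hierarchies are normally expected to have is acyclicity: no concept may be its own ancestor. The Python sketch below checks that property and computes node depth for a hierarchy stored as a child-to-parent mapping. Whether the paper's protocol includes exactly this check is not stated on this page, so treat it as an assumed example. The function names are hypothetical, and the node names only echo the bird example from Figure 5.

```python
def is_acyclic(parent_of):
    """Return True if following parent links never revisits a node.

    `parent_of` maps each child to its single parent; roots are omitted.
    """
    for start in parent_of:
        seen = {start}
        node = start
        while node in parent_of:
            node = parent_of[node]
            if node in seen:
                return False
            seen.add(node)
    return True


def depth(node, parent_of):
    """Number of ancestors of `node`; raises if the mapping contains a cycle."""
    steps = 0
    while node in parent_of:
        node = parent_of[node]
        steps += 1
        if steps > len(parent_of):
            raise ValueError("cycle detected in hierarchy")
    return steps


# Toy hierarchy (illustrative only).
toy_hierarchy = {"rooster": "chicken", "hen": "chicken", "chicken": "bird"}
toy_is_valid = is_acyclic(toy_hierarchy)
rooster_depth = depth("rooster", toy_hierarchy)
```

Storing the hierarchy as a child-to-parent mapping makes the single-parent constraint hold by construction, which is a design choice of this sketch rather than something the source requires.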

If this is right

  • SAE feature spaces can provide a basis for sensible hierarchies.
  • Hard feature absorption reduces hierarchy quality.
  • Soft feature absorption reduces hierarchy quality.
  • Imposing hierarchical structure on SAEs creates a tension with the need to avoid absorption.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Training objectives that explicitly penalize both hard and soft absorption may be required to produce better hierarchies.
  • The same absorption problem could limit hierarchy quality in other unsupervised feature-learning methods beyond SAEs.
  • The derived protocol could be validated by comparing its scores against human judgments of feature hierarchies.
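
The last point, validation against human judgments, could be sketched as a rank correlation between protocol scores and human ratings of the same hierarchies. The code below is a plain-Python Spearman correlation with average ranks for ties. It is an assumption about how such a comparison might be run, not something the paper reports, and all function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


def spearman(protocol_scores, human_ratings):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(protocol_scores) != len(human_ratings) or len(protocol_scores) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(protocol_scores), average_ranks(human_ratings))
```

A high positive correlation would suggest that the protocol tracks what human raters consider a good hierarchy; a weak one would leave the suitability of the criteria open.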

Load-bearing premise

The requirements for generalization/specialization hierarchies derived from semantic net and taxonomy research are the appropriate criteria for judging meaningful feature hierarchies in unsupervised SAE concept discovery.

What would settle it

Training an SAE that exhibits feature absorption yet achieves high scores on the hierarchy quality protocol, or one that avoids absorption but scores low, would test whether absorption systematically compromises quality.
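
The two cases named above can be searched for mechanically once each trained SAE has an absorption score and a hierarchy quality score. The Python sketch below sorts results into those two groups. The `SaeResult` fields, the cutoffs, and the group names are hypothetical, and what counts as "high" or "low" is an assumption the user must supply.

```python
from dataclasses import dataclass


@dataclass
class SaeResult:
    name: str
    absorption: float  # higher means more absorption (assumed scale)
    quality: float     # higher means better hierarchy (assumed scale)


def find_test_cases(results, absorption_cut, quality_cut):
    """Group SAEs into the two cases that would test the absorption claim."""
    absorbing_but_good = [
        r.name for r in results
        if r.absorption >= absorption_cut and r.quality >= quality_cut
    ]
    clean_but_poor = [
        r.name for r in results
        if r.absorption < absorption_cut and r.quality < quality_cut
    ]
    return {
        "absorbing_but_good": absorbing_but_good,
        "clean_but_poor": clean_but_poor,
    }


# Placeholder entries with made-up scores, for illustration only.
toy_results = [
    SaeResult("sae_a", absorption=0.8, quality=0.9),
    SaeResult("sae_b", absorption=0.1, quality=0.2),
    SaeResult("sae_c", absorption=0.7, quality=0.3),
]
toy_cases = find_test_cases(toy_results, absorption_cut=0.5, quality_cut=0.5)
```

If both groups stay empty across many SAEs, that is consistent with the paper's claim; entries in either group would call for a closer look.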

Figures

Figures reproduced from arXiv: 2606.22994 by David Steinmann, Felix Friedrich, Kristian Kersting, Nils Grandien.

Figure 1: Hierarchical concept representations explicitly model relations between concepts and allow for consistent model steering. Typical unsupervised concept discovery approaches (left) provide large and unordered concept spaces. These lack clear connections between individual concepts, making interpretation difficult and cumbersome. Similarly, model steering requires identifying and updating many concepts at t…
Figure 2: Requirements for concept hierarchies operate on different graph aspects. Concept-level assumptions concern individual features, structure-level requirements address the static graph structure, and co-activation-level criteria evaluate the hierarchy given concrete sample-based activations. In this section, we derive requirements that capture the essential properties of concept hierarchies, divided into c…
Figure 3: Graph metrics for all SAEs trained on CLIP and DINOv2 encodings. Average and standard deviation over all graph elements (nodes, edges, or parent-child groups) are shown. Random baseline illustrates the performance of a random selection of respective samples/nodes/edges where applicable. Full data distributions in …
Figure 4: Hierarchy quality of ActMSAE across depths. While the hierarchy abstraction diminishes at higher depth, semantic coherence among children and refinement frequency increase. Values are aggregated at the depth level of the node (sm) or of the parent for edge and group metrics. reveal a clear difference in feature specificity between levels 1 and 2, with deeper levels remaining rather similar. Refinement freq…
Figure 5: Learned hierarchies capture meaningful semantic abstractions. Left: various bird types are grouped under a common birds concept, with chickens further refined into roosters and hens. Right: CLIP’s vision-language pretraining shapes the learned hierarchies, causing swimming pools and billiards tables to be grouped under the shared concept pool. 5 Discussion Our evaluations show that SAE feature spaces gener…
Figure 6: Failure modes of learned hierarchies. Left: children are grouped under a common parent based on spurious co-occurrences rather than clear semantic connections. Right: several child features are themselves noisy, for example, activating on placeholder sprites, which could be mitigated by feature-interpretability filtering prior to graph construction.
Figure 7: Qualitative example of H-SAE’s failure mode. The parent concept represents a coherent concept about the environment and its pollution, while the children do not have clear semantic connections to the parent. Even more, the top-activating images of the leftmost child themselves do not show any coherent concept. semantics. Specifically, the hierarchical abstractness score sha reveals that parent features are…
Figure 8: Further qualitative example of H-SAE. In this example, the parent concept represents a living room or a kitchen, while the children again do not have clear semantic connections to the parent. Similarly, the top-activating images of the leftmost child do not show a coherent concept.
Figure 9: Distribution of the metrics per graph element for SAEs trained on CLIP. While the distributions of nodes, edges, or parent-child groups are mostly well-behaved, there is a clear two-modal distribution for ActMSAE visible, most likely due to the big differences between hierarchy depths. (sha takes only discrete values, meaning that the raincloud plots for this metric do not provide more information than the…
Figure 10: Distribution of the metrics per graph element for SAEs trained on DINOv2.
read the original abstract

Sparse autoencoders (SAEs) have become an important tool for unsupervised concept discovery in large models. To make the resulting feature spaces more interpretable and manageable, recent approaches have begun imposing hierarchical structure, either explicitly or as an implicit effect of training constraints, yet rigorous comparison remains difficult. There are no agreed-upon requirements for what a meaningful feature hierarchy should satisfy, and evaluation has largely relied on qualitative illustrations with fragmented quantitative protocols. To address this, we derive a set of key requirements for generalization/specialization hierarchies in unsupervised concept discovery, drawing on semantic net and taxonomy research alongside recent SAE work, and use them to derive a concrete evaluation protocol. Applying this protocol to current SAE approaches trained on visual data, we find that while feature spaces generally provide a basis for sensible hierarchies, establishing good hierarchical structure remains challenging. In particular, feature absorption, both in its well-known hard form and in a continuous, soft form, systematically compromises hierarchy quality, pointing to a fundamental tension that future approaches will need to navigate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper derives a set of key requirements for generalization/specialization hierarchies from semantic net and taxonomy research, develops a concrete evaluation protocol from them, and applies the protocol to current SAE approaches trained on visual data. It concludes that while SAE feature spaces can support sensible hierarchies, feature absorption (both hard and soft forms) systematically compromises hierarchy quality, identifying a fundamental tension for future work.

Significance. If the evaluation criteria prove appropriate for the SAE setting, the work supplies a quantitative protocol that moves beyond qualitative illustrations and isolates feature absorption as a load-bearing obstacle to hierarchical structure. This could usefully inform SAE training objectives that aim for both sparsity and hierarchy.

major comments (1)
  1. [Abstract] The central claim that feature absorption compromises 'meaningful' hierarchy quality rests on the untested modeling choice that requirements imported from semantic-net and taxonomy literature are the right standard for unsupervised activation-based features. The manuscript shows deviation from these criteria but provides no evidence (e.g., downstream task performance, human interpretability ratings, or ablation on utility) that hierarchies violating the criteria are less useful in the SAE context; this directly determines both the quantitative protocol and the absorption conclusion.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and the positive assessment of the paper's contribution in proposing a quantitative protocol. We address the single major comment below.

read point-by-point responses
  1. Referee: [Abstract] The central claim that feature absorption compromises 'meaningful' hierarchy quality rests on the untested modeling choice that requirements imported from semantic-net and taxonomy literature are the right standard for unsupervised activation-based features. The manuscript shows deviation from these criteria but provides no evidence (e.g., downstream task performance, human interpretability ratings, or ablation on utility) that hierarchies violating the criteria are less useful in the SAE context; this directly determines both the quantitative protocol and the absorption conclusion.

    Authors: We agree that the choice of criteria constitutes a modeling decision and that the manuscript does not include direct empirical validation (such as downstream task performance, human ratings, or utility ablations) showing that violations reduce usefulness specifically within SAE applications. The requirements were selected because they formalize core properties of generalization/specialization hierarchies from established semantic-net and taxonomy research, providing a principled basis for evaluating unsupervised concept discovery rather than relying solely on qualitative illustrations. The paper's primary finding concerns the systematic impact of feature absorption under this protocol. We will revise the abstract and add a dedicated limitations paragraph in the discussion to explicitly frame the criteria as a proposed standard drawn from the literature, acknowledge the absence of direct utility evidence as an open question, and note that alternative criteria could yield different conclusions. This revision clarifies the scope without changing the reported results on absorption.

    revision: partial

Circularity Check

0 steps flagged

Requirements derived externally from semantic net/taxonomy research; no reduction to self-defined quantities or load-bearing self-citations

full rationale

The paper states it derives key requirements for hierarchies 'drawing on semantic net and taxonomy research alongside recent SAE work' and then applies the resulting evaluation protocol to existing SAE methods. No equations or claims reduce a prediction or central result to a fitted parameter or self-citation chain defined within the paper. The evaluation criteria are imported from external literature rather than constructed from the authors' own prior outputs. This is self-contained against external benchmarks and receives the default low circularity score.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review yields minimal ledger entries; the central claim rests on the suitability of externally derived hierarchy criteria and the existence of feature absorption effects in the evaluated models.

axioms (1)
  • domain assumption: Requirements drawn from semantic net and taxonomy research are suitable criteria for evaluating meaningful generalization/specialization hierarchies in SAE feature spaces.
    The paper explicitly draws on this body of work to derive the evaluation protocol.

pith-pipeline@v0.9.1-grok · 5707 in / 1174 out tokens · 13958 ms · 2026-06-26T08:44:58.072993+00:00 · methodology

