OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models
Pith reviewed 2026-05-20 11:06 UTC · model grok-4.3
The pith
OCCAM estimates the causal contribution of each visual concept to a black-box vision model's prediction by removing the concept from images, then induces a global ontology from the measured effects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OCCAM discovers visual concepts in an open-set manner and localizes them via text-guided segmentation. It then performs object-level interventions, removing each concept and measuring the change in class confidence. Aggregating this interventional evidence across a dataset induces a structured concept ontology that captures how a classifier globally organizes visual concepts, revealing consistent dependencies, latent causal relations, and systematic model biases.
What carries the argument
Object-level interventions that remove text-guided localized concepts from images to quantify their causal effect on model class confidence, followed by aggregation of those effects into an induced ontology.
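A minimal sketch of such an intervention, under loud assumptions: the image is a small grayscale grid, the concept mask is given rather than produced by text-guided segmentation, removal is a constant fill rather than inpainting, and `toy_classifier`, `remove_concept`, and `concept_effect` are hypothetical names that do not come from the paper.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel values
Mask = List[List[bool]]    # True where the concept was localized


def remove_concept(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Return a copy of `image` with masked pixels replaced by `fill`.

    Constant fill is a stand-in (assumption); an inpainting model could be
    substituted without changing the rest of the sketch.
    """
    return [
        [fill if m else px for px, m in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]


def concept_effect(
    classifier: Callable[[Image], Sequence[float]],
    image: Image,
    mask: Mask,
    target_class: int,
    fill: float = 0.0,
) -> float:
    """Drop in target-class confidence after removing the masked concept."""
    before = classifier(image)[target_class]
    after = classifier(remove_concept(image, mask, fill))[target_class]
    return before - after


def toy_classifier(image: Image) -> List[float]:
    """Hypothetical black box: class-1 confidence is the mean brightness."""
    pixels = [px for row in image for px in row]
    p1 = sum(pixels) / len(pixels)
    return [1.0 - p1, p1]


image = [[1.0, 1.0], [0.0, 0.0]]
mask = [[True, True], [False, False]]
effect = concept_effect(toy_classifier, image, mask, target_class=1)
print(effect)
```

Only the classifier's output confidences are queried, which is what keeps the procedure compatible with a black-box setting. A positive value means the model's confidence fell when the concept was removed.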
If this is right
- Classifiers' decisions decompose into measurable causal contributions from the discovered concepts.
- Aggregated interventions expose consistent dependencies between concepts in the model's reasoning.
- Latent causal relations and systematic biases become visible at the global level.
- Explanation quality rises in open-set black-box settings relative to per-image attribution methods.
- The induced ontology supplies richer global insight into how the model organizes visual information.
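The aggregation behind these global claims can be illustrated with a deliberately simplified sketch. It assumes each intervention is logged as a (concept, class, confidence drop) record and reduces the "ontology" to a flat list of concept-to-class edges; the records, the function names, and the 0.1 threshold are all hypothetical, and the paper's structured ontology is presumably richer than this.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# One record per intervention: (concept, class label, confidence drop).
Record = Tuple[str, str, float]
Pair = Tuple[str, str]


def aggregate_effects(records: Iterable[Record]) -> Dict[Pair, float]:
    """Average the confidence drop for every (concept, class) pair."""
    grouped: Dict[Pair, List[float]] = defaultdict(list)
    for concept, label, drop in records:
        grouped[(concept, label)].append(drop)
    return {pair: mean(drops) for pair, drops in grouped.items()}


def dependency_edges(mean_effects: Dict[Pair, float], threshold: float = 0.1) -> List[Pair]:
    """Keep pairs whose mean drop reaches `threshold` (hypothetical cutoff)."""
    return sorted(pair for pair, eff in mean_effects.items() if eff >= threshold)


# Made-up illustrative records, not results from the paper.
records = [
    ("wheel", "car", 0.40),
    ("wheel", "car", 0.20),
    ("road", "car", 0.05),
    ("water", "boat", 0.30),
]
effects = aggregate_effects(records)
edges = dependency_edges(effects)
print(edges)
```

An edge such as ("water", "boat") surviving the threshold is the kind of signal that could point to a context bias, since it concerns the background rather than the object itself.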
Where Pith is reading between the lines
- The same intervention-plus-ontology pipeline could be adapted to non-vision modalities by replacing segmentation with equivalent localization methods.
- The resulting ontology offers a concrete starting point for auditing whether a model relies on spurious correlations that would break under distribution shift.
- One could test whether the ontology correctly predicts the effect of removing multiple concepts at once, which the paper does not examine.
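The multi-concept test in the last point could be set up as below. This sketch assumes additivity as the prediction rule (the joint drop equals the sum of the single-concept drops), which is only one candidate; the function names, the tolerance, and the numbers are hypothetical.

```python
def interaction_gap(effect_a: float, effect_b: float, effect_joint: float) -> float:
    """Observed joint drop minus the additive prediction from single removals."""
    return effect_joint - (effect_a + effect_b)


def is_additive(effect_a: float, effect_b: float, effect_joint: float, tol: float = 0.05) -> bool:
    """True when the joint effect is within `tol` of the additive prediction."""
    return abs(interaction_gap(effect_a, effect_b, effect_joint)) <= tol


# Made-up drops: removing A alone, B alone, then both together.
gap = interaction_gap(0.30, 0.10, 0.55)
print(gap, is_additive(0.30, 0.10, 0.55), is_additive(0.30, 0.10, 0.42))
```

A gap well above the tolerance would indicate that the two concepts interact in the model, something single-concept effects alone cannot express.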
Load-bearing premise
Performing object-level interventions by removing localized concepts via text-guided segmentation validly estimates causal contributions without introducing artifacts or confounding factors.
What would settle it
A set of images in which removing a concept according to the segmentation produces confidence changes that contradict the dependencies predicted by the induced ontology, or where the ontology fails to generalize to new images containing the same concepts.
Original abstract
Interpreting the decisions of deep image classifiers remains challenging, particularly in black-box settings where model internals are inaccessible. We introduce OCCAM, a framework for open-set causal concept explanation and ontology induction in vision models. OCCAM discovers visual concepts in an open-set manner, localizes them via text-guided segmentation, and performs object-level interventions by removing concepts to measure changes in class confidence, estimating each concept's causal contribution. Beyond local explanations, OCCAM aggregates interventional evidence across a dataset to induce a structured concept ontology that captures how classifiers globally organize visual concepts. Reasoning over this ontology reveals consistent dependencies between concepts, exposes latent causal relations, and uncovers systematic model biases. Experiments on Broden and ImageNet-S across multiple classifiers show that OCCAM improves explanation quality in open-set black-box settings while providing richer global insight than per-image attribution methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces OCCAM, a framework for open-set causal concept explanation and ontology induction for black-box vision models. It discovers visual concepts without predefined labels, localizes them via text-guided segmentation, performs object-level interventions by removing localized concepts to measure changes in class confidence, and aggregates interventional evidence across datasets to induce a structured concept ontology. This ontology is used to reveal concept dependencies, latent causal relations, and model biases. Experiments on Broden and ImageNet-S across multiple classifiers claim improved explanation quality over per-image attribution methods and richer global insights.
Significance. If the intervention-based estimates prove robust, OCCAM would offer a meaningful advance in explainable AI by moving beyond local attributions to structured, causal global ontologies in open-set regimes. The aggregation step for ontology induction is a clear strength that could enable falsifiable predictions about model behavior.
major comments (2)
- [§3.2] §3.2 (Intervention procedure): The central claim that object-level interventions (text-guided segmentation followed by removal) yield valid causal contributions is load-bearing for both local explanations and the induced ontology. No quantitative validation is provided that segmentation isolates the target concept without boundary leakage, partial occlusion of correlated features, or that the removal (masking/inpainting) alters output solely via the intended concept rather than global statistics or new artifacts.
- [§4] §4 (Experiments): The reported improvements in explanation quality and global insight are stated without specific metrics, effect sizes, baseline comparisons, or statistical tests for the open-set setting. This makes it impossible to evaluate whether the data support the claim that OCCAM outperforms per-image methods while producing a reliable ontology.
minor comments (2)
- [§3] The notation for causal contribution (Δ confidence) should be formalized with an explicit equation early in the method section to improve clarity.
- [Figures] Figure captions for ontology visualizations could include quantitative measures of dependency strength to aid interpretation.
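One possible form of the equation requested in the first minor comment is given here purely as an illustration. All symbols are assumed notation rather than the paper's: $f_y(x)$ is the classifier's confidence for class $y$ on image $x$, $m_c$ is the segmentation mask for concept $c$, $\mathrm{rm}(x, m_c)$ is the image with the masked region removed, and $\mathcal{D}_c$ is the set of images in which $c$ was found.

```latex
\Delta_c(x, y) = f_y(x) - f_y\bigl(\mathrm{rm}(x, m_c)\bigr),
\qquad
\bar{\Delta}_c(y) = \frac{1}{|\mathcal{D}_c|} \sum_{x \in \mathcal{D}_c} \Delta_c(x, y)
```

Under this convention a positive $\Delta_c(x, y)$ means that removing the concept lowered confidence in $y$, and $\bar{\Delta}_c(y)$ is the dataset-level average that an aggregation step could consume.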
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. The comments identify key areas where additional rigor can strengthen the presentation of the intervention procedure and experimental results. We respond to each major comment below and commit to revisions that directly address the concerns raised.
- Referee: [§3.2] §3.2 (Intervention procedure): The central claim that object-level interventions (text-guided segmentation followed by removal) yield valid causal contributions is load-bearing for both local explanations and the induced ontology. No quantitative validation is provided that segmentation isolates the target concept without boundary leakage, partial occlusion of correlated features, or that the removal (masking/inpainting) alters output solely via the intended concept rather than global statistics or new artifacts.
  Authors: We agree that quantitative validation of the intervention is essential to support the causal claims. The original manuscript relies primarily on qualitative visualizations of segmentations and the downstream consistency of the induced ontology to justify the procedure. In revision we will add a dedicated quantitative validation subsection to §3.2. This will include (i) IoU comparisons between text-guided masks and available ground-truth annotations on a sampled subset of the Broden dataset, (ii) controlled ablations that measure output change when removing the target concept versus removing spatially adjacent but semantically uncorrelated regions, and (iii) a brief discussion of inpainting artifacts with mitigation steps. These additions will provide direct evidence that the interventions act primarily through the intended concept.
  Revision: yes
- Referee: [§4] §4 (Experiments): The reported improvements in explanation quality and global insight are stated without specific metrics, effect sizes, baseline comparisons, or statistical tests for the open-set setting. This makes it impossible to evaluate whether the data support the claim that OCCAM outperforms per-image methods while producing a reliable ontology.
  Authors: We acknowledge that the experimental reporting requires greater specificity to allow readers to assess the strength of the claims. Although the manuscript already contains comparative results on Broden and ImageNet-S, we will substantially expand §4. The revision will report concrete metrics (e.g., concept localization precision and ontology consistency scores), effect sizes for performance differences, explicit numerical comparisons against per-image baselines such as Grad-CAM and Integrated Gradients, and statistical tests (paired t-tests or Wilcoxon signed-rank tests with p-values) focused on the open-set regime. These changes will make the empirical support for OCCAM’s advantages transparent and reproducible.
  Revision: yes
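Two of the checks named in these responses, mask IoU against ground truth and a paired comparison of confidence drops, can be sketched with the standard library alone. The masks and drop values below are made up for illustration, the function names are hypothetical, and the t statistic is shown without a p-value because that would require a t-distribution CDF from a statistics package.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence

Mask = List[List[bool]]


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += p and t
            union += p or t
    # Convention chosen here (assumption): two empty masks count as a match.
    return inter / union if union else 1.0


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic for paired samples; assumes the differences are not all equal."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


pred = [[True, True], [False, False]]
truth = [[True, False], [False, False]]
iou = mask_iou(pred, truth)

# Confidence drops on the same images: target concept vs. adjacent control region.
target_drops = [0.30, 0.25, 0.40, 0.35]
control_drops = [0.05, 0.02, 0.10, 0.03]
t_stat = paired_t_statistic(target_drops, control_drops)
print(iou, t_stat)
```

Pairing the target and control removals on the same images is what makes the comparison a controlled ablation: image-level variation cancels in the differences.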
Circularity Check
No significant circularity: derivation relies on external interventions and dataset aggregation
The OCCAM framework discovers concepts in open-set fashion, localizes them with text-guided segmentation, performs object-level removals to compute Δ class confidence as causal estimates, and aggregates those interventional results across Broden and ImageNet-S to induce an ontology. None of these steps reduce by construction to fitted parameters or self-citations; the causal estimates are produced by applying an external segmentation-and-masking procedure to the black-box model outputs, and the ontology is a post-hoc aggregation of those independent measurements. No equations or self-citation chains are shown that would make the final ontology or explanation quality claims tautological with the input concept discovery step. The derivation therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Visual concepts can be discovered and localized in an open-set manner using text guidance
invented entities (1)
- Structured concept ontology: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "OCCAM discovers visual concepts in an open-set manner, localizes them via text-guided segmentation, and performs object-level interventions by removing concepts to measure changes in class confidence"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "aggregates interventional evidence across a dataset to induce a structured concept ontology"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.