arxiv: 2605.18481 · v1 · pith:BCWTZJOCnew · submitted 2026-05-18 · 💻 cs.AI

OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models

Pith reviewed 2026-05-20 11:06 UTC · model grok-4.3

classification 💻 cs.AI
keywords open-set concept explanation · causal interventions · ontology induction · black-box interpretability · vision models · concept localization · model biases

The pith

OCCAM estimates causal contributions of visual concepts by removing them from images and induces a global ontology from those effects in black-box vision models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

OCCAM discovers visual concepts without a preset vocabulary, localizes each one with text-guided segmentation, and removes it from an image to measure the resulting drop in a classifier's confidence for a target class. These per-concept interventions are then aggregated across many images to build a structured ontology that maps how the model organizes concepts and which ones depend on others. A sympathetic reader cares because this supplies both local causal attributions for individual decisions and a higher-level view of the model's global conceptual structure, including hidden biases, rather than stopping at per-image heatmaps.

Core claim

OCCAM discovers visual concepts in an open-set manner, localizes them via text-guided segmentation, performs object-level interventions by removing concepts to measure changes in class confidence, and aggregates the interventional evidence across a dataset to induce a structured concept ontology that captures how classifiers globally organize visual concepts, revealing consistent dependencies, latent causal relations, and systematic model biases.

What carries the argument

Object-level interventions that remove text-guided localized concepts from images to quantify their causal effect on model class confidence, followed by aggregation of those effects into an induced ontology.
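
The intervention step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the image is a toy grid of floats, the mask is assumed to come from some text-guided segmenter that is not shown, removal is a constant fill standing in for whatever masking or inpainting operator the authors use, and `toy_classifier`, `remove_concept`, and `causal_effect` are hypothetical names.

```python
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image, values in [0, 1]
Mask = List[List[bool]]    # True where the concept was localized


def remove_concept(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Return a copy of `image` with masked pixels replaced by `fill`."""
    return [
        [fill if hit else pixel for pixel, hit in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def causal_effect(
    classifier: Callable[[Image], List[float]],
    image: Image,
    mask: Mask,
    target_class: int,
) -> float:
    """Confidence for `target_class` before removal minus confidence after."""
    before = classifier(image)[target_class]
    after = classifier(remove_concept(image, mask))[target_class]
    return before - after


def toy_classifier(image: Image) -> List[float]:
    """Hypothetical black box: class 0 confidence is the mean brightness."""
    pixels = [p for row in image for p in row]
    bright = sum(pixels) / len(pixels)
    return [bright, 1.0 - bright]


image = [[1.0, 1.0], [0.0, 0.0]]
mask = [[True, True], [False, False]]  # pretend the concept fills the top row
effect = causal_effect(toy_classifier, image, mask, target_class=0)
print(effect)  # 0.5: removing the bright row halves class-0 confidence to zero
```

Only the classifier's outputs are queried, which is what makes the procedure applicable to a black box. The choice of fill value is exactly where the artifacts discussed under the load-bearing premise could enter.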

If this is right

  • Classifiers' decisions decompose into measurable causal contributions from the discovered concepts.
  • Aggregated interventions expose consistent dependencies between concepts in the model's reasoning.
  • Latent causal relations and systematic biases become visible at the global level.
  • Explanation quality rises in open-set black-box settings relative to per-image attribution methods.
  • The induced ontology supplies richer global insight into how the model organizes visual information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same intervention-plus-ontology pipeline could be adapted to non-vision modalities by replacing segmentation with equivalent localization methods.
  • The resulting ontology offers a concrete starting point for auditing whether a model relies on spurious correlations that would break under distribution shift.
  • One could test whether the ontology correctly predicts the effect of removing multiple concepts at once, which the paper does not examine.
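
The multi-concept test in the last bullet could be set up roughly as below. This is a hedged sketch: the additive prediction (joint effect equals the sum of single-concept effects) is an assumption chosen for illustration, not something the paper or its ontology is known to state, and `saturating_confidence`, `mask_out`, `union`, and `interaction` are hypothetical names.

```python
from typing import Callable, List

Image = List[List[float]]
Mask = List[List[bool]]


def mask_out(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Replace masked pixels with a constant (stand-in for real removal)."""
    return [
        [fill if hit else p for p, hit in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def union(a: Mask, b: Mask) -> Mask:
    """Pixel-wise OR of two masks."""
    return [[x or y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def interaction(
    confidence: Callable[[Image], float], image: Image, mask_a: Mask, mask_b: Mask
) -> float:
    """Joint removal effect minus the sum of the two single-concept effects."""
    base = confidence(image)
    effect_a = base - confidence(mask_out(image, mask_a))
    effect_b = base - confidence(mask_out(image, mask_b))
    effect_ab = base - confidence(mask_out(image, union(mask_a, mask_b)))
    return effect_ab - (effect_a + effect_b)


def saturating_confidence(image: Image) -> float:
    """Hypothetical model in which either of two cues alone is sufficient."""
    return min(1.0, sum(p for row in image for p in row))


image = [[1.0, 1.0]]
mask_a = [[True, False]]
mask_b = [[False, True]]
gap = interaction(saturating_confidence, image, mask_a, mask_b)
print(gap)  # 1.0: each single removal has no effect, the joint removal has full effect
```

A large gap like this one marks redundant cues: single-concept interventions would report no effect for either concept even though the model depends on the pair.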

Load-bearing premise

Performing object-level interventions by removing localized concepts via text-guided segmentation validly estimates causal contributions without introducing artifacts or confounding factors.

What would settle it

A set of images in which removing a concept according to the segmentation produces confidence changes that contradict the dependencies predicted by the induced ontology, or where the ontology fails to generalize to new images containing the same concepts.

Figures

Figures reproduced from arXiv: 2605.18481 by Chiara Maria Russo, Concetto Spampinato, Daniela Giordano, Matteo Pennisi, Simone Carnemolla, Simone Palazzo.

Figure 1: OCCAM overview. For an input image, open-set concepts are proposed, grounded spatially, and individually removed
Figure 2: Progressive causal pruning on the volleyball class for different multimodal classifiers. For each model, the top-3 influential concepts identified by OCCAM are removed sequentially (𝑘 = 1, 2, 3). Prediction confidence decreases as explanatory factors are eliminated, revealing how different architectures distribute reliance over object and contextual cues.
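
The progressive pruning described in the Figure 2 caption can be sketched as a loop that ranks concepts by their single-removal effect and then removes the top k cumulatively. This is an illustration under assumptions: the concept names, the additive `total_evidence` scorer, and the constant-fill removal are all hypothetical stand-ins, and the ranking rule is a guess at how "top-3 influential concepts" is determined.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]
Mask = List[List[bool]]


def mask_out(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Replace masked pixels with a constant (stand-in for real removal)."""
    return [
        [fill if hit else p for p, hit in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def progressive_pruning(
    confidence: Callable[[Image], float],
    image: Image,
    masks: Dict[str, Mask],
    k: int = 3,
) -> Tuple[List[str], List[float]]:
    """Rank concepts by single-removal effect, then remove the top k in order.

    Returns the ranked concept names and the confidence after 0..k removals.
    """
    base = confidence(image)
    effects = {name: base - confidence(mask_out(image, m)) for name, m in masks.items()}
    ranked = sorted(effects, key=effects.get, reverse=True)[:k]
    curve = [base]
    current = image
    for name in ranked:
        current = mask_out(current, masks[name])  # removals accumulate
        curve.append(confidence(current))
    return ranked, curve


def total_evidence(image: Image) -> float:
    """Hypothetical confidence: the sum of pixel values."""
    return sum(p for row in image for p in row)


image = [[0.5, 0.25, 0.125, 0.0625]]
masks = {  # hypothetical concepts, one pixel each
    "ball": [[True, False, False, False]],
    "net": [[False, True, False, False]],
    "court": [[False, False, True, False]],
    "sky": [[False, False, False, True]],
}
ranked, curve = progressive_pruning(total_evidence, image, masks, k=3)
print(ranked)  # ['ball', 'net', 'court']
print(curve)   # [0.9375, 0.4375, 0.1875, 0.0625]
```

The shape of `curve` is what the figure compares across models: a steep first drop suggests reliance on one dominant cue, a gradual decline suggests reliance spread over several.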

Original abstract

Interpreting the decisions of deep image classifiers remains challenging, particularly in black-box settings where model internals are inaccessible. We introduce OCCAM, a framework for open-set causal concept explanation and ontology induction in vision models. OCCAM discovers visual concepts in an open-set manner, localizes them via text-guided segmentation, and performs object-level interventions by removing concepts to measure changes in class confidence, estimating each concept's causal contribution. Beyond local explanations, OCCAM aggregates interventional evidence across a dataset to induce a structured concept ontology that captures how classifiers globally organize visual concepts. Reasoning over this ontology reveals consistent dependencies between concepts, exposes latent causal relations, and uncovers systematic model biases. Experiments on Broden and ImageNet-S across multiple classifiers show that OCCAM improves explanation quality in open-set black-box settings while providing richer global insight than per-image attribution methods.
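
The aggregation step can be pictured with a small sketch. The abstract does not specify the ontology's formalism, so the flat list of weighted class-to-concept edges below is a deliberate simplification; the mean as aggregator, the threshold value, and the record layout are all assumptions, and the numbers are toy values rather than results from the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, float]  # (class name, concept name, confidence drop)


def aggregate_effects(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Average the per-image confidence drops for each (class, concept) pair."""
    grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for cls, concept, effect in records:
        grouped[(cls, concept)].append(effect)
    return {key: mean(values) for key, values in grouped.items()}


def induce_edges(
    mean_effects: Dict[Tuple[str, str], float], threshold: float = 0.1
) -> List[Record]:
    """Keep pairs whose mean effect reaches the threshold, as weighted edges."""
    return sorted(
        (cls, concept, eff)
        for (cls, concept), eff in mean_effects.items()
        if eff >= threshold
    )


# Toy per-image interventions (hypothetical values).
records = [
    ("volleyball", "ball", 0.5),
    ("volleyball", "ball", 0.25),
    ("volleyball", "sand", 0.0625),
    ("volleyball", "sand", 0.0),
]
edges = induce_edges(aggregate_effects(records))
print(edges)  # [('volleyball', 'ball', 0.375)]
```

A contextual concept that survived the threshold for a class it has no semantic link to would be one concrete way a systematic bias could show up in such a structure.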

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces OCCAM, a framework for open-set causal concept explanation and ontology induction for black-box vision models. It discovers visual concepts without predefined labels, localizes them via text-guided segmentation, performs object-level interventions by removing localized concepts to measure changes in class confidence, and aggregates interventional evidence across datasets to induce a structured concept ontology. This ontology is used to reveal concept dependencies, latent causal relations, and model biases. Experiments on Broden and ImageNet-S across multiple classifiers claim improved explanation quality over per-image attribution methods and richer global insights.

Significance. If the intervention-based estimates prove robust, OCCAM would offer a meaningful advance in explainable AI by moving beyond local attributions to structured, causal global ontologies in open-set regimes. The aggregation step for ontology induction is a clear strength that could enable falsifiable predictions about model behavior.

major comments (2)
  1. [§3.2] §3.2 (Intervention procedure): The central claim that object-level interventions (text-guided segmentation followed by removal) yield valid causal contributions is load-bearing for both local explanations and the induced ontology. No quantitative validation is provided that segmentation isolates the target concept without boundary leakage or partial occlusion of correlated features, nor that the removal (masking/inpainting) alters the output solely via the intended concept rather than via global statistics or new artifacts.
  2. [§4] §4 (Experiments): The reported improvements in explanation quality and global insight are stated without specific metrics, effect sizes, baseline comparisons, or statistical tests for the open-set setting. This makes it impossible to evaluate whether the data support the claim that OCCAM outperforms per-image methods while producing a reliable ontology.
minor comments (2)
  1. [§3] The notation for causal contribution (Δ confidence) should be formalized with an explicit equation early in the method section to improve clarity.
  2. [Figures] Figure captions for ontology visualizations could include quantitative measures of dependency strength to aid interpretation.
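
As an illustration of the first minor comment, one way such an equation could read is given below. The notation is an assumption made here, not taken from the paper: $f_y(x)$ is the classifier's confidence for class $y$ on image $x$, $x_{\setminus c}$ is the image with concept $c$ removed, and $\mathcal{D}_c$ is the set of images in which $c$ was localized.

```latex
\Delta_c(x, y) = f_y(x) - f_y\bigl(x_{\setminus c}\bigr),
\qquad
\bar{\Delta}_c(y) = \frac{1}{|\mathcal{D}_c|} \sum_{x \in \mathcal{D}_c} \Delta_c(x, y)
```

A positive $\Delta_c(x, y)$ means that removing $c$ lowered the confidence for $y$. The dataset-level average $\bar{\Delta}_c(y)$ is only one plausible aggregation; the paper may define it differently.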

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments identify key areas where additional rigor can strengthen the presentation of the intervention procedure and experimental results. We respond to each major comment below and commit to revisions that directly address the concerns raised.

  1. Referee: [§3.2] §3.2 (Intervention procedure): The central claim that object-level interventions (text-guided segmentation followed by removal) yield valid causal contributions is load-bearing for both local explanations and the induced ontology. No quantitative validation is provided that segmentation isolates the target concept without boundary leakage or partial occlusion of correlated features, nor that the removal (masking/inpainting) alters the output solely via the intended concept rather than via global statistics or new artifacts.

    Authors: We agree that quantitative validation of the intervention is essential to support the causal claims. The original manuscript relies primarily on qualitative visualizations of segmentations and the downstream consistency of the induced ontology to justify the procedure. In revision we will add a dedicated quantitative validation subsection to §3.2. This will include (i) IoU comparisons between text-guided masks and available ground-truth annotations on a sampled subset of the Broden dataset, (ii) controlled ablations that measure output change when removing the target concept versus removing spatially adjacent but semantically uncorrelated regions, and (iii) a brief discussion of inpainting artifacts with mitigation steps. These additions will provide direct evidence that the interventions act primarily through the intended concept. revision: yes

  2. Referee: [§4] §4 (Experiments): The reported improvements in explanation quality and global insight are stated without specific metrics, effect sizes, baseline comparisons, or statistical tests for the open-set setting. This makes it impossible to evaluate whether the data support the claim that OCCAM outperforms per-image methods while producing a reliable ontology.

    Authors: We acknowledge that the experimental reporting requires greater specificity to allow readers to assess the strength of the claims. Although the manuscript already contains comparative results on Broden and ImageNet-S, we will substantially expand §4. The revision will report concrete metrics (e.g., concept localization precision and ontology consistency scores), effect sizes for performance differences, explicit numerical comparisons against per-image baselines such as Grad-CAM and Integrated Gradients, and statistical tests (paired t-tests or Wilcoxon signed-rank tests with p-values) focused on the open-set regime. These changes will make the empirical support for OCCAM’s advantages transparent and reproducible. revision: yes
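
Two of the checks promised in the responses, the mask IoU comparison and a paired comparison against a baseline, can be sketched with the standard library alone. This is a minimal illustration, not the authors' evaluation code: `iou` and `paired_summary` are hypothetical helpers, the scores are toy numbers rather than results from the paper, and the p-value (which would come from a t distribution with n - 1 degrees of freedom) is left out.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple

Mask = List[List[bool]]


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    inter = sum(p and t for p, t in pairs)
    union = sum(p or t for p, t in pairs)
    return inter / union if union else 1.0  # two empty masks count as a match


def paired_summary(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Paired t statistic and effect size d_z for per-image scores a versus b.

    Raises ZeroDivisionError if every paired difference is identical.
    """
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    t_stat = mean(diffs) / (sd / sqrt(len(diffs)))
    d_z = mean(diffs) / sd
    return t_stat, d_z


pred = [[True, True], [False, False]]    # hypothetical text-guided mask
truth = [[True, False], [False, False]]  # hypothetical ground-truth mask
overlap = iou(pred, truth)
print(overlap)  # 0.5

method_scores = [0.75, 0.5, 1.0, 0.75]   # toy per-image scores
baseline_scores = [0.5, 0.5, 0.5, 0.25]  # toy per-image baseline scores
t_stat, d_z = paired_summary(method_scores, baseline_scores)
print(round(t_stat, 2), round(d_z, 2))
```

Pairing by image matters because both methods are scored on the same inputs; an unpaired test would ignore that shared per-image variation.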

Circularity Check

0 steps flagged

No significant circularity: derivation relies on external interventions and dataset aggregation

The OCCAM framework discovers concepts in open-set fashion, localizes them with text-guided segmentation, performs object-level removals to compute Δ class confidence as causal estimates, and aggregates those interventional results across Broden and ImageNet-S to induce an ontology. None of these steps reduce by construction to fitted parameters or self-citations; the causal estimates are produced by applying an external segmentation-and-masking procedure to the black-box model outputs, and the ontology is a post-hoc aggregation of those independent measurements. No equations or self-citation chains are shown that would make the final ontology or explanation quality claims tautological with the input concept discovery step. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework depends on two assumptions: that concept removal constitutes a valid causal intervention, and that the induced ontology accurately reflects model behavior.

axioms (1)
  • domain assumption: Visual concepts can be discovered and localized in an open-set manner using text guidance
    Fundamental to enabling interventions on arbitrary concepts.
invented entities (1)
  • Structured concept ontology (no independent evidence)
    purpose: To represent global organization of visual concepts by the classifier
    Induced from aggregated interventional evidence across the dataset.

pith-pipeline@v0.9.0 · 5688 in / 1290 out tokens · 58255 ms · 2026-05-20T11:06:07.729921+00:00 · methodology
