pith. machine review for the scientific record.

arxiv: 2605.06640 · v1 · submitted 2026-05-07 · 💻 cs.LG · cs.AI

Recognition: unknown

Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 12:02 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords concept-based explanations · abductive explanations · contrastive explanations · causal explanations · vision models · deep neural networks · model interpretability

The pith

Minimal sets of high-level concepts causally determine vision model predictions and behaviors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper merges concept-based explanations, which use human-understandable high-level concepts, with formal abductive and contrastive explanations that identify minimal causal features. Existing approaches either fail to prove causality or restrict analysis to single concepts or low-level pixels. It defines concept-based abductive and contrastive explanations as the smallest sets of high-level concepts with proven causal impact on model outputs. Algorithms enumerate every minimal set by applying concept erasure to test whether removing a concept changes the prediction. This supports explanations for single images as well as aggregated behaviors across collections of images that share a user-specified pattern.
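
As a rough illustration of the abductive variant (not the paper's algorithm), the sketch below enumerates minimal sufficient concept sets by exhaustive search: a set explains the prediction if erasing every concept outside it leaves the output unchanged, and no proper subset does the same. The `model_predict` and `erase_concepts` callables are hypothetical stand-ins for a classifier and a concept-erasure procedure; the paper's enumeration algorithms are more efficient than this exponential search.

```python
from itertools import combinations

def minimal_abductive_explanations(image, concepts, model_predict, erase_concepts):
    """Enumerate minimal concept sets that suffice for the prediction:
    erasing every concept *outside* the set leaves the output unchanged,
    and no proper subset has that property. Exhaustive sketch only."""
    original = model_predict(image)
    explanations = []
    # Visit candidate sets smallest-first so minimality falls out for free.
    for k in range(len(concepts) + 1):
        for subset in map(set, combinations(concepts, k)):
            if any(found <= subset for found in explanations):
                continue  # a smaller sufficient set exists, so not minimal
            rest = [c for c in concepts if c not in subset]
            if model_predict(erase_concepts(image, rest)) == original:
                explanations.append(subset)
    return explanations
```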

Core claim

We propose the notion of concept-based abductive and contrastive explanations that capture the minimal sets of high-level concepts causally relevant for model outcomes. We then present a family of algorithms that enumerate all minimal explanations while using concept erasure procedures to establish causal relationships. By appropriately aggregating such explanations, we are not only able to understand model predictions on individual images but also on collections of images where the model exhibits a user-specified, common behavior.

What carries the argument

Concept-based abductive and contrastive explanations, which are minimal sets of high-level concepts verified as causally relevant through enumeration algorithms and concept erasure procedures.

If this is right

  • Explanations become available for both individual image predictions and shared behaviors across groups of images.
  • All minimal causal concept sets can be enumerated rather than only single-concept accounts.
  • High-level concepts replace low-level pixel features, improving user interpretability.
  • The same framework applies across different models, datasets, and user-defined behaviors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • These explanations could highlight when a model depends on spurious high-level concepts that are not semantically relevant.
  • Aggregated behavior explanations might surface systematic biases that affect entire classes of inputs.
  • The approach could be adapted to other data types if suitable concept erasure techniques are developed.

Load-bearing premise

Concept erasure procedures reliably establish causal relationships between high-level concepts and model predictions without introducing artifacts or missing interactions.

What would settle it

An experiment where erasing the concepts in a reported minimal explanation leaves the model's prediction unchanged, or where a proper subset of those concepts already alters the prediction; either outcome would falsify the reported explanation's sufficiency or minimality.
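
A sketch of that falsification test for the contrastive case, using the same hypothetical `model_predict` and `erase_concepts` stand-ins as above; the function returns True if either failure mode occurs.

```python
from itertools import combinations

def refutes_contrastive_explanation(image, explanation, model_predict, erase_concepts):
    """Return True if a reported minimal contrastive explanation fails
    either test: its erasure does not change the prediction (sufficiency),
    or a proper subset's erasure already does (minimality)."""
    original = model_predict(image)
    # Sufficiency: erasing the full reported set must change the prediction.
    if model_predict(erase_concepts(image, list(explanation))) == original:
        return True
    # Minimality: no proper subset may already change the prediction.
    for k in range(1, len(explanation)):
        for subset in combinations(explanation, k):
            if model_predict(erase_concepts(image, list(subset))) != original:
                return True
    return False
```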

Figures

Figures reproduced from arXiv:2605.06640 by Corina Păsăreanu, Divya Gopinath, Ravi Mangal, Ronaldo Canizales.

Figure 1: Examples of concept-based explanations of model behaviors. Concept polarity is determined by the…
Figure 2: Concept-based abductive and contrastive explanations for correctly classified images of…
Figure 3: Overview of pipeline for finding concept-based explanations. The process begins with a vision model…
Figure 4: Generalizability@K for the model behaviors we analyze. See Section 5.1 and Definition 3.7 for details.
Figure 5: Maximum Coverage@K for the model behaviors we analyze. More results in Appendix G.2.
Figure 6: Individual coverages for the model behaviors we analyze. While the top-ranked explanations show…
Figure 7: Individual coverage on Mixed Behavior Sets for the models we analyze.
Figure 8: Explanation Size vs. Individual Coverage for the model behaviors we analyze.
Figure 9: Fraction of the total number of explanations that are fully plausible (green), partially plausible (light…
Figure 10: Compute time per image, measured in seconds, for the model behaviors we analyze. See Figure 17…
Figure 11: Selecting concepts based on (a) average absolute activation strength leads to optimal…
Figure 12: Each behavior is represented as a 2×3 matrix of subplots, where the columns correspond to erasure algorithms and the rows correspond to ConCXps and ConAXps, respectively. Inside each subplot, results for all three explanation enumeration algorithms are presented, when available. The data consists of five columns, each one with a Generalizability score at K ∈ [1,5]; higher is better, following the logic th…
Figure 12: Supplementary experimental results for RQ1. (Metric: Generalizability@K)
Figure 13: Supplementary experimental results for RQ2. (Metric: MaximumCoverage@K)
Figure 14: Supplementary experimental results for RQ2. (Metric: Individual Coverage)
Figure 15: Supplementary experimental results for RQ3. (Metric: |Xp| vs. IndCov)
Figure 16: Supplementary experimental results for RQ4. Plots show the fraction of fully plausible (green)…
Figure 17: Compute time per image, measured in seconds, for additional behaviors.
Figure 18: Relative Cumulative Frequency at Length K for all model behaviors we analyze. …to note that this model shows a greater tendency to reuse the same concepts across different erasure algorithms than other models. For instance, in behavior BM1 (Deer), the most frequent ConAXp and ConCXp are both {Hunting(+)} via Ortho and LEACE, with relative frequencies of 74% and 58%, respectively. Similarly, in the case of…
Figure 19: Pixel-space transformations performed using a diffusion model to remove concepts from images…
Figure 20: Supplementary examples for pixel-space transformations. Prompt used: ‘Remove the {…
Figure 21: Supplementary examples for pixel-space transformations. Prompt used: ‘Remove the {…
Original abstract

*Concept-based explanations* offer a promising approach for explaining the predictions of deep neural networks in terms of high-level, human-understandable concepts. However, existing methods either do not establish a causal connection between the concepts and model predictions or are limited in expressivity and only able to infer causal explanations involving single concepts. At the same time, the parallel line of work on *formal abductive and contrastive explanations* computes the minimal set of input features causally relevant for model outcomes but only considers low-level features such as pixels. Merging these two threads, in this work, we propose the notion of *concept-based abductive and contrastive explanations* that capture the minimal sets of high-level concepts causally relevant for model outcomes. We then present a family of algorithms that enumerate all minimal explanations while using *concept erasure* procedures to establish causal relationships. By appropriately aggregating such explanations, we are not only able to understand model predictions on individual images but also on collections of images where the model exhibits a user-specified, common *behavior*. We evaluate our approach on multiple models, datasets, and behaviors, and demonstrate its effectiveness in computing helpful, user-friendly explanations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces the notion of concept-based abductive and contrastive explanations, defined as the minimal sets of high-level concepts that are causally relevant to a vision model's predictions or to user-specified behaviors across collections of images. It presents a family of algorithms that enumerate all such minimal explanations by applying concept erasure procedures to test causal effects, and demonstrates aggregation of these explanations to characterize model behaviors. The approach is evaluated empirically on multiple models, datasets, and behaviors.

Significance. If the erasure procedures reliably isolate causal influences without residual effects or compensatory changes, the work would usefully bridge formal abductive explanation methods (which guarantee minimality) with concept-based interpretability, extending beyond single-concept or pixel-level explanations. The multi-model and multi-behavior evaluation is a strength, as is the focus on aggregated behaviors rather than isolated predictions. The contribution is incremental but potentially impactful for debugging vision models if the causality assumption holds.

major comments (2)
  1. [§3] §3 (Definition of concept-based explanations): The claim that erasure establishes causal relevance for minimality rests on the assumption that erasure removes only the target concept's influence. The manuscript describes standard erasure techniques but provides no formal argument or diagnostic that entanglement or downstream interactions are avoided, which is load-bearing for the soundness of the enumerated minimal sets.
  2. [§5] §5 (Experimental evaluation): While results are shown across models and datasets, there are no control experiments (e.g., synthetic data with known concept correlations or measurements of non-target concept activations post-erasure) to verify that erasure produces clean counterfactuals. This directly affects the validity of the causal claims underlying both individual and aggregated behavior explanations.
minor comments (3)
  1. [§3] The notation distinguishing abductive from contrastive explanations could be made more explicit with a small example in the method section.
  2. [Figure 4] Figure captions for behavior aggregation visualizations would benefit from additional detail on how common behaviors are operationalized across images.
  3. [§2] A few recent papers on causal interventions in concept spaces are missing from the related work.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback. We address each major comment below, clarifying our position and outlining the revisions we will incorporate to strengthen the manuscript's treatment of causal assumptions.

Point-by-point responses
  1. Referee: [§3] §3 (Definition of concept-based explanations): The claim that erasure establishes causal relevance for minimality rests on the assumption that erasure removes only the target concept's influence. The manuscript describes standard erasure techniques but provides no formal argument or diagnostic that entanglement or downstream interactions are avoided, which is load-bearing for the soundness of the enumerated minimal sets.

    Authors: We agree that the validity of the minimal explanations hinges on the erasure procedure isolating the target concept's influence. Our work relies on standard erasure techniques established in the concept interpretability literature, which have been empirically validated in prior studies for approximating interventions. However, we acknowledge that the current manuscript does not provide an explicit formal argument or built-in diagnostics for ruling out entanglement or compensatory effects. In the revised version, we will expand the discussion in §3 to explicitly state the assumptions underlying these erasure methods, reference relevant literature on concept entanglement, and introduce basic post-erasure diagnostics (e.g., activation monitoring of non-target concepts) as part of the algorithm description. These additions will clarify the scope of our causal claims without claiming a full formal guarantee. revision: partial

  2. Referee: [§5] §5 (Experimental evaluation): While results are shown across models and datasets, there are no control experiments (e.g., synthetic data with known concept correlations or measurements of non-target concept activations post-erasure) to verify that erasure produces clean counterfactuals. This directly affects the validity of the causal claims underlying both individual and aggregated behavior explanations.

    Authors: This observation is correct and highlights a gap in the empirical validation. While our evaluations demonstrate the approach across diverse models, datasets, and behaviors, we did not include dedicated controls for verifying counterfactual cleanliness. In the revised manuscript, we will add such controls to §5: specifically, experiments on synthetic data with known ground-truth concept correlations, along with quantitative measurements of non-target concept activations before and after erasure. These will be presented in new tables or figures to directly support the causal interpretations for both individual and aggregated explanations. revision: yes
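
One plausible shape for the post-erasure diagnostic proposed in the responses above, not the authors' implementation: `concept_scores` maps each concept name to a hypothetical activation-scoring function over representations, and `erase` stands in for an erasure procedure. Near-zero drift on non-target concepts supports the claim that the erasure is a clean counterfactual; large drift flags entanglement.

```python
import numpy as np

def erasure_drift(reps, concept_scores, erase, target):
    """Measure how much each *non-target* concept's activation moves
    when `target` is erased from the representations `reps`."""
    erased = erase(reps, target)
    drift = {}
    for name, score in concept_scores.items():
        if name == target:
            continue  # only side effects on other concepts matter here
        drift[name] = float(np.mean(np.abs(score(erased) - score(reps))))
    return drift
```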

Circularity Check

0 steps flagged

No significant circularity; algorithmic proposal builds on external erasure methods without self-referential reduction

Full rationale

The paper introduces a new notion of concept-based abductive/contrastive explanations and algorithms to enumerate minimal concept sets, relying on concept erasure procedures to establish causality. No equations, fitted parameters, or derivations are presented that reduce by construction to the inputs. The central definitions and algorithms are self-contained algorithmic contributions that invoke external (non-self-cited in a load-bearing way) erasure techniques rather than deriving causality internally or renaming known results. No self-citation chains, uniqueness theorems from prior author work, or ansatzes smuggled via citation appear in the provided text. This is the common case of an honest non-finding for an algorithmic paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that concept erasure can isolate causal effects of high-level concepts. No free parameters or new invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Concept erasure procedures can be used to establish causal relationships between high-level concepts and model predictions
    The algorithms rely on this to link concepts to outcomes; it is invoked when describing how explanations are computed.
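
For intuition about what the axiom assumes, here is a toy linear erasure, a bare orthogonal projection that removes a concept's linearly readable component. The erasure algorithms the paper actually evaluates (e.g., Ortho and LEACE, named in the Figure 18 caption) add whitening and formal guarantees that this sketch lacks.

```python
import numpy as np

def linear_erase(reps, concept_direction):
    """Project (n, d) representations onto the hyperplane orthogonal
    to a concept direction, removing its linear component."""
    v = np.asarray(concept_direction, dtype=float)
    v = v / np.linalg.norm(v)
    # Subtract each row's component along v: reps - (reps @ v) v^T
    return reps - np.outer(reps @ v, v)
```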

pith-pipeline@v0.9.0 · 5521 in / 1164 out tokens · 48131 ms · 2026-05-08T12:02:08.921646+00:00 · methodology

