Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models
Pith reviewed 2026-05-08 12:02 UTC · model grok-4.3
The pith
Minimal sets of high-level concepts causally determine vision model predictions and behaviors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose the notion of concept-based abductive and contrastive explanations that capture the minimal sets of high-level concepts causally relevant for model outcomes. We then present a family of algorithms that enumerate all minimal explanations while using concept erasure procedures to establish causal relationships. By appropriately aggregating such explanations, we are not only able to understand model predictions on individual images but also on collections of images where the model exhibits a user-specified, common behavior.
What carries the argument
Concept-based abductive and contrastive explanations, which are minimal sets of high-level concepts verified as causally relevant through enumeration algorithms and concept erasure procedures.
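The core mechanism can be sketched as a deletion-based minimization loop. This is a minimal illustration, not the paper's algorithm: `predict`, `erase`, and the concept names below are invented stand-ins for a real vision model and a real concept-erasure procedure.

```python
# Toy sketch of finding one subset-minimal abductive explanation by
# deletion-based minimization. `predict` and `erase` are illustrative
# stand-ins for a real model and a real concept-erasure procedure.

def erase(activations, concepts):
    """Zero out the given concepts (toy stand-in for concept erasure)."""
    return {c: (0.0 if c in concepts else v) for c, v in activations.items()}

def predict(activations):
    """Toy classifier: 'zebra' iff stripes and four_legs are both active."""
    if activations["stripes"] > 0.5 and activations["four_legs"] > 0.5:
        return "zebra"
    return "other"

def abductive_explanation(activations, all_concepts):
    """Drop a concept whenever the prediction survives with everything
    outside the remaining kept set erased; the result is subset-minimal."""
    target = predict(activations)
    keep = list(all_concepts)
    for c in list(keep):
        trial = [x for x in keep if x != c]
        erased = erase(activations, [x for x in all_concepts if x not in trial])
        if predict(erased) == target:  # c was not needed for this outcome
            keep = trial
    return keep

acts = {"stripes": 0.9, "four_legs": 0.8, "grass": 0.7, "sky": 0.6}
print(abductive_explanation(acts, list(acts)))  # → ['stripes', 'four_legs']
```

Enumerating *all* minimal explanations, as the paper claims to do, would wrap a pass like this in a hitting-set or blocking-clause search; the single pass above yields one minimal set.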
If this is right
- Explanations become available for both individual image predictions and shared behaviors across groups of images.
- All minimal causal concept sets can be enumerated rather than only single-concept accounts.
- High-level concepts replace low-level pixel features, improving user interpretability.
- The same framework applies across different models, datasets, and user-defined behaviors.
Where Pith is reading between the lines
- These explanations could highlight when a model depends on spurious high-level concepts that are not semantically relevant.
- Aggregated behavior explanations might surface systematic biases that affect entire classes of inputs.
- The approach could be adapted to other data types if suitable concept erasure techniques are developed.
Load-bearing premise
Concept erasure procedures reliably establish causal relationships between high-level concepts and model predictions without introducing artifacts or missing interactions.
What would settle it
An experiment that refutes a reported minimal explanation: erasing all of its concepts leaves the model's prediction unchanged, or erasing a proper subset of those concepts already alters the prediction.
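That settling experiment can be written down directly. A hedged sketch, reusing invented toy stand-ins for the model and the erasure procedure (none of these names come from the paper):

```python
# Falsification check for a reported minimal contrastive explanation:
# it fails if (a) erasing all of its concepts leaves the prediction
# unchanged, or (b) erasing some proper subset already flips it.
from itertools import combinations

def erase(activations, concepts):
    return {c: (0.0 if c in concepts else v) for c, v in activations.items()}

def predict(activations):  # toy model: 'zebra' iff both cues are active
    ok = activations["stripes"] > 0.5 and activations["four_legs"] > 0.5
    return "zebra" if ok else "other"

def falsifies(activations, explanation):
    """Return a reason string if the explanation is refuted, else None."""
    base = predict(activations)
    if predict(erase(activations, explanation)) == base:
        return "erasing the full explanation did not change the prediction"
    for r in range(1, len(explanation)):
        for subset in combinations(explanation, r):
            if predict(erase(activations, list(subset))) != base:
                return f"proper subset {subset} already flips the prediction"
    return None

acts = {"stripes": 0.9, "four_legs": 0.8, "grass": 0.7}
print(falsifies(acts, ["stripes"]))               # → None (survives both checks)
print(falsifies(acts, ["stripes", "four_legs"]))  # refuted: a subset already flips
```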
Original abstract
*Concept-based explanations* offer a promising approach for explaining the predictions of deep neural networks in terms of high-level, human-understandable concepts. However, existing methods either do not establish a causal connection between the concepts and model predictions or are limited in expressivity and only able to infer causal explanations involving single concepts. At the same time, the parallel line of work on *formal abductive and contrastive explanations* computes the minimal set of input features causally relevant for model outcomes but only considers low-level features such as pixels. Merging these two threads, in this work, we propose the notion of *concept-based abductive and contrastive explanations* that capture the minimal sets of high-level concepts causally relevant for model outcomes. We then present a family of algorithms that enumerate all minimal explanations while using *concept erasure* procedures to establish causal relationships. By appropriately aggregating such explanations, we are not only able to understand model predictions on individual images but also on collections of images where the model exhibits a user-specified, common *behavior*. We evaluate our approach on multiple models, datasets, and behaviors, and demonstrate its effectiveness in computing helpful, user-friendly explanations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the notion of concept-based abductive and contrastive explanations, defined as the minimal sets of high-level concepts that are causally relevant to a vision model's predictions or to user-specified behaviors across collections of images. It presents a family of algorithms that enumerate all such minimal explanations by applying concept erasure procedures to test causal effects, and demonstrates aggregation of these explanations to characterize model behaviors. The approach is evaluated empirically on multiple models, datasets, and behaviors.
Significance. If the erasure procedures reliably isolate causal influences without residual effects or compensatory changes, the work would usefully bridge formal abductive explanation methods (which guarantee minimality) with concept-based interpretability, extending beyond single-concept or pixel-level explanations. The multi-model and multi-behavior evaluation is a strength, as is the focus on aggregated behaviors rather than isolated predictions. The contribution is incremental but potentially impactful for debugging vision models if the causality assumption holds.
Major comments (2)
- [§3] §3 (Definition of concept-based explanations): The claim that erasure establishes causal relevance for minimality rests on the assumption that erasure removes only the target concept's influence. The manuscript describes standard erasure techniques but provides no formal argument or diagnostic that entanglement or downstream interactions are avoided, which is load-bearing for the soundness of the enumerated minimal sets.
- [§5] §5 (Experimental evaluation): While results are shown across models and datasets, there are no control experiments (e.g., synthetic data with known concept correlations or measurements of non-target concept activations post-erasure) to verify that erasure produces clean counterfactuals. This directly affects the validity of the causal claims underlying both individual and aggregated behavior explanations.
Minor comments (3)
- [§3] The notation distinguishing abductive from contrastive explanations could be made more explicit with a small example in the method section.
- [Figure 4] Figure captions for behavior aggregation visualizations would benefit from additional detail on how common behaviors are operationalized across images.
- [§2] A few recent papers on causal interventions in concept spaces are missing from the related work.
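A small example of the abductive/contrastive distinction that the first minor comment asks for, with invented concept names and a toy model (not the paper's notation):

```python
# Concepts {stripes, four_legs, grass}; the toy model predicts 'zebra'
# iff stripes AND four_legs are both present.
# Abductive explanation  {stripes, four_legs}: keeping only these concepts
#   (erasing everything else) is sufficient for the prediction 'zebra'.
# Contrastive explanation {stripes}: erasing this one concept is already
#   enough to change the prediction to 'other'.

def predict(concepts):
    return "zebra" if {"stripes", "four_legs"} <= concepts else "other"

assert predict({"stripes", "four_legs"}) == "zebra"  # sufficiency (abductive)
assert predict({"four_legs", "grass"}) == "other"    # removal flips (contrastive)
```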
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback. We address each major comment below, clarifying our position and outlining the revisions we will incorporate to strengthen the manuscript's treatment of causal assumptions.
Point-by-point responses
Referee: [§3] §3 (Definition of concept-based explanations): The claim that erasure establishes causal relevance for minimality rests on the assumption that erasure removes only the target concept's influence. The manuscript describes standard erasure techniques but provides no formal argument or diagnostic that entanglement or downstream interactions are avoided, which is load-bearing for the soundness of the enumerated minimal sets.
Authors: We agree that the validity of the minimal explanations hinges on the erasure procedure isolating the target concept's influence. Our work relies on standard erasure techniques established in the concept interpretability literature, which have been empirically validated in prior studies for approximating interventions. However, we acknowledge that the current manuscript does not provide an explicit formal argument or built-in diagnostics for ruling out entanglement or compensatory effects. In the revised version, we will expand the discussion in §3 to explicitly state the assumptions underlying these erasure methods, reference relevant literature on concept entanglement, and introduce basic post-erasure diagnostics (e.g., activation monitoring of non-target concepts) as part of the algorithm description. These additions will clarify the scope of our causal claims without claiming a full formal guarantee.
Revision: partial
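The promised activation-monitoring diagnostic could look roughly like the following, assuming a linear-projection erasure in the style of INLP/LEACE and invented concept directions; this is an illustrative sketch, not the authors' implementation:

```python
# Post-erasure diagnostic: erase one concept direction by linear projection,
# then measure how much the remaining (non-target) concept scores drift.
# Large drift suggests entanglement, i.e. the erasure was not a clean
# intervention. Directions and representations here are made up.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def erase_direction(rep, v):
    """Remove the component of rep along concept direction v."""
    norm = dot(v, v) ** 0.5
    u = [x / norm for x in v]
    s = dot(rep, u)
    return [x - s * ux for x, ux in zip(rep, u)]

def drift(reps, directions, target):
    """Mean absolute change in non-target concept scores after erasure."""
    out = {}
    for name, d in directions.items():
        if name == target:
            continue
        changes = [abs(dot(erase_direction(r, directions[target]), d) - dot(r, d))
                   for r in reps]
        out[name] = sum(changes) / len(changes)
    return out

directions = {"stripes": [1.0, 0.0, 0.0],
              "grass":   [0.0, 1.0, 0.0],
              "sky":     [0.6, 0.8, 0.0]}  # deliberately entangled with stripes
reps = [[0.9, 0.2, 0.1], [-0.7, 0.5, 0.3], [0.4, -0.6, 0.2]]
report = drift(reps, directions, "stripes")
print(report)  # 'grass' drift is 0.0; 'sky' drift ≈ 0.4 → entanglement flagged
```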
Referee: [§5] §5 (Experimental evaluation): While results are shown across models and datasets, there are no control experiments (e.g., synthetic data with known concept correlations or measurements of non-target concept activations post-erasure) to verify that erasure produces clean counterfactuals. This directly affects the validity of the causal claims underlying both individual and aggregated behavior explanations.
Authors: This observation is correct and highlights a gap in the empirical validation. While our evaluations demonstrate the approach across diverse models, datasets, and behaviors, we did not include dedicated controls for verifying counterfactual cleanliness. In the revised manuscript, we will add such controls to §5: specifically, experiments on synthetic data with known ground-truth concept correlations, along with quantitative measurements of non-target concept activations before and after erasure. These will be presented in new tables or figures to directly support the causal interpretations for both individual and aggregated explanations.
Revision: yes
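One version of the proposed synthetic control, sketched under invented assumptions (a single causal concept plus a deliberately correlated spurious one; none of this comes from the paper's experiments):

```python
# Synthetic control: generate data where 'grass' co-occurs with the causal
# concept 'stripes' 90% of the time, then measure how often erasing each
# concept flips the toy model's prediction. A clean counterfactual should
# show the spurious concept never flipping despite the correlation.
import random

random.seed(0)

def sample():
    stripes = random.random() < 0.5
    grass = stripes if random.random() < 0.9 else not stripes
    return {"stripes": float(stripes), "grass": float(grass)}

def predict(acts):  # ground-truth model depends on 'stripes' only
    return "zebra" if acts["stripes"] > 0.5 else "other"

def erase(acts, concept):
    return {**acts, concept: 0.0}

data = [sample() for _ in range(1000)]
flip_rate = {c: sum(predict(erase(x, c)) != predict(x) for x in data) / len(data)
             for c in ("stripes", "grass")}
print(flip_rate)  # 'grass' never flips; 'stripes' flips for roughly half the images
```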
Circularity Check
No significant circularity; the algorithmic proposal builds on external erasure methods without self-referential reduction.
Full rationale
The paper introduces a new notion of concept-based abductive/contrastive explanations and algorithms to enumerate minimal concept sets, relying on concept erasure procedures to establish causality. No equations, fitted parameters, or derivations are presented that reduce by construction to the inputs. The central definitions and algorithms are self-contained algorithmic contributions that invoke external erasure techniques (not self-cited in a load-bearing way) rather than deriving causality internally or renaming known results. No self-citation chains, uniqueness theorems from the authors' prior work, or ansatzes smuggled in via citation appear in the provided text. This is the common case of an honest non-finding for an algorithmic paper.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Concept erasure procedures can be used to establish causal relationships between high-level concepts and model predictions.