OPTIMUS-Prime: Minimal and Sufficient Concept Explanations for Deep Vision Models

Arthur Hoarau; Chenrui Zhu; Vu Linh Nguyen

arxiv: 2606.07180 · v1 · pith:FNO26PG7new · submitted 2026-06-05 · 💻 cs.CV · cs.LG

Pith reviewed 2026-06-27 21:56 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords: concept explanations, prime implicants, sufficiency, minimality, deep vision models, XAI, heatmaps, interpretability

The pith

OPTIMUS generates visual heatmaps for deep vision models whose highlighted concepts provably guarantee the prediction and form a subset-minimal set: no highlighted concept can be dropped without losing the guarantee.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents OPTIMUS as a method to create concept explanations for image classifiers in the form of heatmaps. These explanations rest on the theory of prime implicants to deliver two formal properties: the highlighted concepts are sufficient to ensure the model makes its prediction, and they are minimal because removing any one of them breaks that guarantee. Current explanation techniques often focus on visual appeal without such logical assurances, so OPTIMUS targets the gap by making the explanations both human-readable and rigorously tight. Validation occurs on a standard visual classification benchmark where the heatmaps surface the concepts the model actually relies on.

Core claim

OPTIMUS explanations take the form of visual heatmaps grounded in prime implicants of the classifier's decision process. They satisfy sufficiency, meaning the concepts highlighted provably guarantee the model's prediction, and minimality, meaning no strict subset of those concepts retains the guarantee. This combination produces explanations that are logically tight and visually coherent for deep classification models.
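The two properties can be written out formally. The block below is a sketch in assumed notation, not the paper's own: it supposes m binary concepts, a concept-level classifier κ into the label space {1, ..., K}, an instance c being explained, and a candidate explanation S given as a set of concept indices.

```latex
% Assumed notation (not taken from the paper): m binary concepts,
% a concept-level classifier \kappa, and an instance c to explain.
\[
\kappa : \{0,1\}^m \to \{1,\dots,K\}, \qquad
c \in \{0,1\}^m, \qquad
S \subseteq \{1,\dots,m\}.
\]

% Sufficiency: fixing the concepts in S to their values in c forces the prediction.
\[
\mathrm{Suff}(S) \;:\Longleftrightarrow\;
\forall c' \in \{0,1\}^m :\;
\Bigl(\textstyle\bigwedge_{i \in S} c'_i = c_i\Bigr)
\;\Rightarrow\; \kappa(c') = \kappa(c).
\]

% Minimality: S is sufficient and no strict subset of S is.
\[
\mathrm{Min}(S) \;:\Longleftrightarrow\;
\mathrm{Suff}(S) \;\wedge\;
\forall S' \subsetneq S :\; \neg\,\mathrm{Suff}(S').
\]
```

Because any superset of a sufficient set is itself sufficient, minimality is equivalent to requiring that removing any single concept from S breaks sufficiency. Note that this is subset-minimality: a different sufficient set with fewer concepts may still exist.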

What carries the argument

Prime implicants identified or approximated from the model's internal activations, rendered as heatmaps that enforce both sufficiency and minimality for the selected visual concepts.
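One common way to obtain a prime implicant in the logic-based explanation literature is a deletion-based search: start from all concepts and drop each one whose removal keeps the remaining set sufficient. The sketch below illustrates that idea on a toy boolean function. It is not confirmed to be the procedure OPTIMUS uses; the names `is_sufficient`, `prime_implicant`, and `toy` are hypothetical, and the brute-force sufficiency check is exponential in the number of concepts, so it only suits tiny examples.

```python
from itertools import product


def is_sufficient(f, x, fixed):
    """Return True if fixing the concepts in `fixed` to their values in x
    forces f(x) for every assignment of the remaining (free) concepts."""
    target = f(x)
    free = [i for i in range(len(x)) if i not in fixed]
    for values in product([0, 1], repeat=len(free)):
        z = list(x)
        for i, v in zip(free, values):
            z[i] = v
        if f(tuple(z)) != target:
            return False
    return True


def prime_implicant(f, x):
    """Deletion-based search: drop every concept whose removal keeps the
    remaining set sufficient. The result is subset-minimal."""
    kept = set(range(len(x)))
    for i in range(len(x)):
        if is_sufficient(f, x, kept - {i}):
            kept.remove(i)
    return kept


# Hypothetical concept-level classifier: class 1 iff (c0 and c1) or c2.
def toy(c):
    return int((c[0] and c[1]) or c[2])


print(sorted(prime_implicant(toy, (1, 1, 1))))  # [2]
print(sorted(prime_implicant(toy, (1, 1, 0))))  # [0, 1]
```

The result depends on the order in which concepts are tried: for the instance (1, 1, 1), the set {0, 1} is also a prime implicant, but scanning from index 0 reaches {2} first.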

If this is right

The resulting heatmaps remain interpretable to end users while carrying explicit logical guarantees absent from most saliency methods.
No strict subset of the highlighted concepts will still guarantee the classifier output.
The approach applies directly to standard deep vision classification models and surfaces decision-relevant concepts on a visual classification benchmark.
Explanations become both visually coherent and free of redundant concepts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the prime-implicant approximation holds across architectures, it could support systematic comparison of what different models treat as essential for the same input.
The minimality property might help isolate the exact features a model uses when predictions change under small input perturbations.
Extending the method beyond vision could test whether similar implicant extraction works for other data types where decision boundaries are less spatially organized.

Load-bearing premise

Prime implicants can be identified or approximated from the internal activations of a deep neural network in a way that directly yields the claimed formal guarantees for visual concepts.

What would settle it

A concrete falsifier would be a generated heatmap where the isolated concepts fail to force the model's original prediction, or where removing one concept leaves a subset that still guarantees the prediction.
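The falsifier can be phrased as an executable check. The sketch below assumes binary concepts and a concept-level classifier small enough to enumerate exhaustively; `forces_prediction`, `falsifies`, and `majority` are hypothetical names used only for illustration.

```python
from itertools import product


def forces_prediction(f, x, subset):
    """True if every input agreeing with x on `subset` gets the label f(x)."""
    target = f(x)
    free = [i for i in range(len(x)) if i not in subset]
    for values in product([0, 1], repeat=len(free)):
        z = list(x)
        for i, v in zip(free, values):
            z[i] = v
        if f(tuple(z)) != target:
            return False
    return True


def falsifies(f, x, explanation):
    """Return a reason string if the explanation fails either property,
    or None if it is both sufficient and subset-minimal."""
    if not forces_prediction(f, x, explanation):
        return "not sufficient"
    # Single deletions are enough: supersets of sufficient sets are sufficient.
    for i in sorted(explanation):
        if forces_prediction(f, x, explanation - {i}):
            return f"not minimal: concept {i} is redundant"
    return None


# Hypothetical classifier: class 1 iff at least two of three concepts are on.
def majority(c):
    return int(sum(c) >= 2)


x = (1, 1, 0)
print(falsifies(majority, x, {0, 1}))     # None
print(falsifies(majority, x, {0}))        # not sufficient
print(falsifies(majority, x, {0, 1, 2}))  # not minimal: concept 2 is redundant
```

On a real network the exhaustive loop would have to be replaced by a solver or a sound verifier; sampling alone can find counterexamples but cannot rule them out.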

Figures

Figures reproduced from arXiv: 2606.07180 by Arthur Hoarau, Chenrui Zhu, Vu Linh Nguyen.

**Figure 2.** DeepLIFT-OPTIMUS explanations.

2.3.2 DeepLIFT. While Integrated Gradients is computationally expensive due to its integration over [0, 1] and the repeated backpropagation of gradients, DeepLIFT proposes a single-point estimate instead. We first define $\Delta z_i^{\ell} = z_i^{\ell}(x) - z_i^{\ell}(x')$ as the activation difference with respect to the baseline $x'$ introduced earlier. A multiplier $m_{\Delta z_i^{\ell}\,\Delta z_j^{\ell+1}}$ is then intr…

**Figure 3.** On the left: DeepLIFT-OPTIMUS. On the right: the difference between full concepts

**Figure 4.** Difference between DeepLIFT and DeepLIFT-OPTIMUS: unnecessary concepts.

**Figure 5.** Artificial-data comparison of 4-class PI search

**Figure 6.** Artificial-data comparison of 8-class PI search

**Figure 7.** Artificial-data comparison of 12-class PI search

**Figure 8.** IG-OPTIMUS explanations

**Figure 9.** Difference between IG and IG-OPTIMUS: unnecessary concepts.

Original abstract

The growing demand for transparency in automated decision-making has propelled eXplainable Artificial Intelligence (XAI) to the forefront of machine learning research. In computer vision, however, existing explanation methods often prioritize end-user accessibility at the expense of formal guarantees, leaving a critical gap between practical utility and theoretical rigor. In this paper, we address this gap by introducing OPTIMUS, a novel framework for generating concept-based visual explanations for deep classification models. OPTIMUS explanations take the form of visual heatmaps that not only remain interpretable to end users, but are grounded in the well-established theory of prime implicants, providing formal guarantees that have been largely absent from existing saliency-based methods. Specifically, OPTIMUS explanations satisfy two desirable properties: sufficiency, ensuring that the highlighted concepts provably guarantee the classifier's prediction, and minimality, ensuring that no strict subset of those concepts retains this guarantee. Together, these properties yield explanations that are both logically tight and visually coherent. We validate our approach on a visual classification benchmark, demonstrating that OPTIMUS heatmaps naturally and faithfully surface the decision-relevant concepts underlying model predictions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

OPTIMUS claims prime-implicant guarantees for concept heatmaps but the abstract gives no derivation or discretization method, so the formal claims stay unverified.

The main thing to know is that this paper wants to add formal sufficiency and minimality to concept explanations by treating them as prime implicants. The abstract states the heatmaps should provably entail the model output and contain no redundant concepts.

It does a straightforward job naming a real gap: most saliency and concept methods in vision lack logical guarantees. Framing the target properties as sufficiency plus minimality is clean and matches what some XAI users actually want.

The soft spot is the missing bridge from continuous DNN activations to exact boolean literals. Prime implicants need a discrete function over a finite set of variables with no counterexamples outside the defined domain. The abstract mentions no algorithm for defining concepts via thresholds or superpixels, no proof that the extracted implicant is prime, and no error bounds on sampling or optimization. Without those steps the 'provably guarantee' language reduces to an empirical observation on the benchmark. The validation is described only at high level, so it does not yet show the guarantees hold.
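To make the discretization concern concrete, the sketch below thresholds continuous concept activations into boolean literals. The function `binarize`, the activation values, and the 0.5 thresholds are all illustrative assumptions; the abstract does not say how OPTIMUS defines its literals.

```python
def binarize(activations, thresholds):
    """Map continuous concept activations to boolean literals:
    literal i is 1 when activation i reaches its threshold, else 0."""
    if len(activations) != len(thresholds):
        raise ValueError("one threshold is required per concept")
    return tuple(int(a >= t) for a, t in zip(activations, thresholds))


thresholds = (0.5, 0.5, 0.5)          # assumed, purely illustrative
original = (0.91, 0.12, 0.50)         # hypothetical concept activations
perturbed = (0.91, 0.12, 0.49)        # third activation nudged by 0.01

print(binarize(original, thresholds))   # (1, 0, 1)
print(binarize(perturbed, thresholds))  # (1, 0, 0)
```

A change of 0.01 in one activation flips a literal, and many distinct activation vectors collapse onto the same boolean vector. Any guarantee proved over the literals therefore transfers to the continuous model only if every activation vector mapped to the same literals receives the same prediction, which is the bridge the note says is missing.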

No equations or pseudocode appear in the provided text, which makes it hard to judge soundness or novelty against prior logic-based explanation work.

This is for XAI researchers who care about tightening concept explanations with logic. A reader already working on formal methods in ML could extract useful framing even if the technical details need checking.

It deserves peer review so the methods section can be examined for whether the discretization actually preserves the claimed properties.

Referee Report

2 major / 2 minor

Summary. The paper introduces OPTIMUS, a framework for generating concept-based visual explanations (heatmaps) for deep vision classification models. It claims these explanations are grounded in prime implicant theory, satisfying formal guarantees of sufficiency (highlighted concepts provably entail the model's prediction) and minimality (no strict subset retains the guarantee).

Significance. If the formal guarantees are rigorously established, the work would be significant for XAI in computer vision by supplying theoretically grounded explanations with logical tightness, a property largely absent from saliency methods. The approach of leveraging boolean function theory for visual concepts is a promising direction if the extraction process can be shown to preserve exact entailment.

major comments (2)

[Abstract and §3] Abstract and §3 (method): The abstract asserts that OPTIMUS explanations 'provably guarantee' the classifier's prediction via prime implicants, but no derivation, algorithm, or proof is supplied showing how continuous DNN activations are mapped to discrete boolean literals such that the conjunction exactly entails the output for all inputs (not merely sampled ones). Without discretization error bounds or a demonstration that no counterexamples exist, the formal sufficiency claim reduces to an empirical property.
[§4] §4 (experiments): The validation is described only at a high level ('visual classification benchmark') with no quantitative assessment of whether the extracted implicants satisfy exact minimality or sufficiency on held-out data; this is load-bearing because the central claim requires the guarantees to hold beyond the training distribution.

minor comments (2)

[§3] Notation for visual concepts (e.g., how superpixels or activation thresholds become literals) should be introduced with an explicit example early in §3 to aid readability.
[Abstract] The abstract mentions 'a visual classification benchmark' but does not name the dataset; adding the name would improve clarity without altering the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below.

Referee: [Abstract and §3] Abstract and §3 (method): The abstract asserts that OPTIMUS explanations 'provably guarantee' the classifier's prediction via prime implicants, but no derivation, algorithm, or proof is supplied showing how continuous DNN activations are mapped to discrete boolean literals such that the conjunction exactly entails the output for all inputs (not merely sampled ones). Without discretization error bounds or a demonstration that no counterexamples exist, the formal sufficiency claim reduces to an empirical property.

Authors: We acknowledge the referee's observation. Section 3 describes the discretization of continuous concept activations into boolean literals via thresholding and the subsequent application of prime implicant extraction on the resulting boolean function. The formal guarantees of sufficiency and minimality are established exactly within this discretized boolean representation. However, the manuscript does not include explicit discretization error bounds or a proof that entailment holds without counterexamples in the original continuous input space. We will revise the paper to add a dedicated subsection deriving the discretization step, stating the assumptions under which the guarantees transfer, and discussing the distinction between the boolean and continuous domains.
revision: yes

Referee: [§4] §4 (experiments): The validation is described only at a high level ('visual classification benchmark') with no quantitative assessment of whether the extracted implicants satisfy exact minimality or sufficiency on held-out data; this is load-bearing because the central claim requires the guarantees to hold beyond the training distribution.

Authors: We agree that quantitative assessment on held-out data is necessary to substantiate the claims. The current §4 presents qualitative results on the visual classification benchmark to illustrate the coherence of the generated heatmaps. In the revised version we will incorporate quantitative evaluations, including the fraction of test samples on which the extracted prime implicants preserve both sufficiency and minimality when the underlying boolean function is evaluated on unseen data.
revision: yes
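The held-out evaluation promised in the second response could take roughly the following shape. This is a sketch under stated assumptions, not the authors' protocol: samples are assumed to be already binarized into concept tuples paired with the model's predicted label, and `agrees` and `empirical_sufficiency_rate` are hypothetical names. It covers sufficiency only.

```python
def agrees(x, z, subset):
    """True if z has the same concept values as x on every index in subset."""
    return all(x[i] == z[i] for i in subset)


def empirical_sufficiency_rate(explained, held_out):
    """Fraction of explanations with no held-out counterexample.

    explained: list of (x, predicted_label, subset) triples.
    held_out:  list of (z, predicted_label) pairs not used for extraction.
    A counterexample is a held-out z that matches x on the explanation
    subset but receives a different predicted label.
    """
    if not explained:
        raise ValueError("no explanations to evaluate")
    ok = 0
    for x, label, subset in explained:
        violated = any(
            agrees(x, z, subset) and z_label != label
            for z, z_label in held_out
        )
        ok += not violated
    return ok / len(explained)


# Hypothetical data: two explanations checked against three held-out samples.
explained = [((1, 1, 0), 1, {0, 1}), ((0, 0, 1), 1, {2})]
held_out = [((1, 1, 1), 1), ((0, 1, 1), 0), ((1, 0, 0), 0)]

print(empirical_sufficiency_rate(explained, held_out))  # 0.5
```

A rate below 1 falsifies sufficiency for the affected explanations, whereas a rate of 1 only shows that no counterexample appeared in the sample. Minimality would need a separate check, for example looking for held-out witnesses that break the prediction once a single concept is released.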

Circularity Check

0 steps flagged

No circularity; derivation applies external prime-implicant theory to DNN activations without self-referential reduction.

full rationale

The paper grounds its sufficiency and minimality claims in the established theory of prime implicants, an external boolean-logic framework independent of the present work. No equations, self-citations, or definitional loops appear in the provided abstract that would make the guarantees tautological or force predictions from fitted inputs. The mapping from continuous activations to boolean literals is presented as a methodological step rather than a self-defining equivalence, leaving the central claims dependent on external mathematical properties rather than internal construction. This is the most common honest outcome for a methods paper that invokes a pre-existing formal theory.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; the central claim rests on unstated assumptions about applying prime implicant theory to neural networks.

pith-pipeline@v0.9.1-grok · 5728 in / 1067 out tokens · 22533 ms · 2026-06-27T21:56:44.406477+00:00 · methodology
