Generative Counterfactual Introspection for Explainable Deep Learning

Bhavya Kailkhura; Donald Loveland; Shusen Liu; Yong Han

arxiv: 1907.03077 · v1 · pith:5WLEZMELnew · submitted 2019-07-06 · 💻 cs.LG · cs.AI· cs.CV· stat.ML

Generative Counterfactual Introspection for Explainable Deep Learning

Shusen Liu , Bhavya Kailkhura , Donald Loveland , Yong Han This is my paper

Pith reviewed 2026-05-25 01:41 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CVstat.ML

keywords counterfactual introspectiongenerative modelsexplainable AIdeep neural networksimage editingMNISTCelebA

0 comments

The pith

A generative model edits input images to answer counterfactual questions about deep neural network predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an introspection technique for deep neural networks that uses a generative model to perform salient edits on input images. These edits act as interventions that let users ask what meaningful change would alter a classifier's output. The method is shown on the MNIST dataset of handwritten digits and the CelebA dataset of face attributes. A sympathetic reader would care because it shifts explanation from passive feature highlighting to active what-if analysis of model decisions.

Core claim

The central claim is that a generative model can instigate salient editing of the input image, supplying the fundamental interventional operation needed to obtain answers to counterfactual inquiries about what meaningful change alters the prediction of a given classifier.

What carries the argument

Generative model for instigating salient editing of the input image to enable interventional counterfactual inquiries.

If this is right

Reveals interesting properties of the given classifiers on MNIST and CelebA.
Supplies a direct way to answer what change to the input would alter the prediction.
Provides an active editing operation that supports model interpretation beyond visualization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same editing principle could be tested on image classifiers trained on datasets larger than MNIST or CelebA.
If the edits prove consistent across multiple models, the technique might serve as a standardized test for decision boundaries.
Extending the generative steering to other data types such as text or audio would require new generators but follows the same logic.

Load-bearing premise

The generative model can be steered to produce edits that are both salient to the classifier and semantically meaningful to humans.

What would settle it

Running the method on a held-out image set and finding that the generated edits neither flip the classifier prediction nor produce human-recognizable semantic changes would falsify the claim.

Figures

Figures reproduced from arXiv: 1907.03077 by Bhavya Kailkhura, Donald Loveland, Shusen Liu, Yong Han.

**Figure 2.** Figure 2: Finding criticism of the digit 9 class. As shown in [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Finding prototypes of different digits. Without Regularization With Regularization [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: The effect of regularization on the optimization path for finding criticism of 9 in the direction of 7. [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Illustrate different editing scheme for the input image. The original image is shown in (a). In (c), we show the edited image [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 6.** Figure 6: Illustrate attributes changes (beside the young/old attribute) that will make the images appear older for the given classifier. [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 7.** Figure 7: The potential bias in the CelebA dataset. The percentage of people having eye glass is much higher in the population [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗

**Figure 8.** Figure 8: Prototype and criticism for the images with ground truth label “old”. The left most column shows the original image. The [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

read the original abstract

In this work, we propose an introspection technique for deep neural networks that relies on a generative model to instigate salient editing of the input image for model interpretation. Such modification provides the fundamental interventional operation that allows us to obtain answers to counterfactual inquiries, i.e., what meaningful change can be made to the input image in order to alter the prediction. We demonstrate how to reveal interesting properties of the given classifiers by utilizing the proposed introspection approach on both the MNIST and the CelebA dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Paper sketches a generative approach to counterfactual explanations for deep image classifiers with demos on MNIST and CelebA, but the description is too high-level to evaluate the method's novelty or correctness.

read the letter

The paper's main idea is to use a generative model to make targeted edits to input images that change a classifier's output, treating those edits as the intervention needed for counterfactual questions about the model. They show examples on MNIST and CelebA to illustrate how this can surface properties of the classifier. What stands out is the direct link between generative editing and counterfactual introspection, which gives a concrete way to probe decision boundaries without relying solely on gradients or attention maps. The demos are straightforward and use familiar datasets, so the basic setup is easy to follow. The soft spots are the missing technical pieces. The abstract supplies no equations for how the generative model is steered or conditioned, no metrics for edit quality or minimality, and no comparison to earlier counterfactual or generative explanation methods. Without those, the claim that the edits provide meaningful answers to counterfactual inquiries rests on an unexamined assumption about salience and semantic validity. This is a real gap rather than a minor omission. The work is aimed at researchers already working on explainable AI for vision models. Someone in that area might pick up an idea or two from the framing, but the current version does not look like a finished or distinctive result. I would send it to peer review because the core direction is reasonable and the demonstrations exist, even though the authors would need to add substantial detail on the method and its validation before it could be considered solid.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an introspection technique for deep neural networks that relies on a generative model to perform salient edits on input images. This is presented as providing the interventional operation needed for counterfactual inquiries (what meaningful change alters the prediction), with demonstrations on MNIST and CelebA to reveal classifier properties.

Significance. If the generative edits can be rigorously shown to be both classifier-salient and semantically meaningful, the approach would offer a practical visual tool for counterfactual explanations in explainable AI. The demonstrations on standard datasets suggest potential utility, but the absence of equations, error analysis, or explicit validation of the steering mechanism limits the strength of the central claim.

major comments (2)

[Abstract] Abstract: The central claim that the generative modification 'provides the fundamental interventional operation' for counterfactual inquiries is not supported by any derivation, formal definition of the edit operation, or quantitative evidence that the edits are classifier-salient; the demonstrations are described only qualitatively.
[Abstract] The manuscript supplies no equations or error analysis (as noted in the abstract), which is load-bearing for validating that the edits achieve the claimed counterfactual interpretation rather than arbitrary or non-meaningful changes.

minor comments (2)

Clarify the precise architecture and training procedure of the generative model used for editing, including how it is steered to produce salient changes.
Add quantitative metrics (e.g., prediction change rates, human evaluation scores) to support the qualitative demonstrations on MNIST and CelebA.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim that the generative modification 'provides the fundamental interventional operation' for counterfactual inquiries is not supported by any derivation, formal definition of the edit operation, or quantitative evidence that the edits are classifier-salient; the demonstrations are described only qualitatively.

Authors: The manuscript describes the edit as optimization in the latent space of a generative model to flip the classifier output while preserving other attributes. We agree the abstract is high-level and lacks a formal definition or quantitative saliency metrics. We will add a precise definition of the interventional edit and report quantitative measures such as prediction change rate and semantic consistency scores. revision: yes
Referee: [Abstract] The manuscript supplies no equations or error analysis (as noted in the abstract), which is load-bearing for validating that the edits achieve the claimed counterfactual interpretation rather than arbitrary or non-meaningful changes.

Authors: The current version emphasizes the empirical procedure over mathematical formalization. We acknowledge that explicit equations for the latent optimization and an accompanying error analysis would strengthen validation of the counterfactual claim. These will be incorporated in the revision. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper proposes a generative-model-based introspection method for producing counterfactual edits on image classifiers, demonstrated on MNIST and CelebA. No derivation chain, equations, fitted parameters renamed as predictions, or self-citation load-bearing steps appear in the provided material. The central claim rests on empirical demonstrations rather than any reduction of outputs to inputs by construction, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations, no fitted parameters, and no explicit assumptions beyond the existence of a generative model capable of salient edits.

pith-pipeline@v0.9.0 · 5613 in / 1034 out tokens · 23286 ms · 2026-05-25T01:41:45.536841+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

min A' λ·lossC,c'(I(A')) + ||I−I(A')||p s.t. c'=C(I(A')) I(A')=G(I;A') (Eq. 2, §3.2)
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_injective unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We use both of these formulations in our framework depending on whether actionable attributes are known or unknown, where the latter uses the latent representations as our attributes (§3.1)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 10 internal anchors

[1]

Deep learning

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436, 2015

work page 2015
[2]

Searching for exotic particles in high-energy physics with deep learning

Pierre Baldi, Peter Sadowski, and Daniel Whiteson. Searching for exotic particles in high-energy physics with deep learning. Nature communications, 5:4308, 2014

work page 2014
[3]

Deep learning for computational biology

Christof Angermueller, Tanel Pärnamaa, Leopold Parts, and Oliver Stegle. Deep learning for computational biology. Molecular systems biology, 12(7):878, 2016

work page 2016
[4]

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classiﬁcation models and saliency maps. arXiv preprint arXiv:1312.6034, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[5]

Visualizing and understanding convolutional networks

Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. InEuropean conference on computer vision, pages 818–833. Springer, 2014

work page 2014
[6]

Understanding Neural Networks Through Deep Visualization

Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson. Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579, 2015. 7 Generative Counterfactual Introspection for Explainable Deep Learning A PREPRINT

work page internal anchor Pith review Pith/arXiv arXiv 2015
[7]

Causal inference in statistics: An overview

Judea Pearl et al. Causal inference in statistics: An overview. Statistics surveys, 3:96–146, 2009

work page 2009
[8]

Materials discovery and design using machine learning

Yue Liu, Tianlu Zhao, Wangwei Ju, and Siqi Shi. Materials discovery and design using machine learning. Journal of Materiomics, 3(3):159–177, 2017

work page 2017
[9]

Examples are not enough, learn to criticize! criticism for interpretability

Been Kim, Rajiv Khanna, and Oluwasanmi O Koyejo. Examples are not enough, learn to criticize! criticism for interpretability. In Advances in Neural Information Processing Systems, pages 2280–2288, 2016

work page 2016
[10]

Counterfactual Visual Explanations

Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, and Stefan Lee. Counterfactual visual explanations. CoRR, abs/1904.07451, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1904
[11]

Attgan: Facial attribute editing by only changing what you want

Zhenliang He, Wangmeng Zuo, Meina Kan, Shiguang Shan, and Xilin Chen. Attgan: Facial attribute editing by only changing what you want. IEEE Transactions on Image Processing, 2019

work page 2019
[12]

Grad-cam: Visual explanations from deep networks via gradient-based localization

Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017

work page 2017
[13]

On pixel-wise explanations for non-linear classiﬁer decisions by layer-wise relevance propagation

Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classiﬁer decisions by layer-wise relevance propagation. PloS one, 10(7):e0130140, 2015

work page 2015
[14]

Why should i trust you?: Explaining the predictions of any classiﬁer

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should i trust you?: Explaining the predictions of any classiﬁer. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016

work page 2016
[15]

TreeView: Peeking into Deep Neural Networks Via Feature-Space Partitioning

Jayaraman J Thiagarajan, Bhavya Kailkhura, Prasanna Sattigeri, and Karthikeyan Natesan Ramamurthy. Treeview: Peeking into deep neural networks via feature-space partitioning. arXiv preprint arXiv:1611.07429, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[16]

Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks

Anh Nguyen, Jason Yosinski, and Jeff Clune. Multifaceted feature visualization: Uncovering the different types of features learned by each neuron in deep neural networks. arXiv preprint arXiv:1602.03616, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[17]

Visualizing higher-layer features of a deep network

Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. Visualizing higher-layer features of a deep network. University of Montreal, 1341(3):1, 2009

work page 2009
[18]

Feature visualization

Chris Olah, Alexander Mordvintsev, and Ludwig Schubert. Feature visualization. Distill, 2(11):e7, 2017

work page 2017
[19]

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). arXiv preprint arXiv:1711.11279, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[20]

Counterfactual fairness

Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4066–4076, 2017

work page 2017
[21]

Explaining Deep Learning Models using Causal Inference

Tanmayee Narendra, Anush Sankaran, Deepak Vijaykeerthy, and Senthil Mani. Explaining deep learning models using causal inference. arXiv preprint arXiv:1811.04376, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[22]

Counterfactual visual explanations

Jan Ernst Dhruv Batra Devi Parikh Stefan Lee Yash Goyal, Ziyan Wu. Counterfactual visual explanations. In ICML, pages 264–279, 2019

work page 2019
[23]

Grounding visual explanations

Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, and Zeynep Akata. Grounding visual explanations. In Proceedings of the European Conference on Computer Vision (ECCV), pages 264–279, 2018

work page 2018
[24]

Explaining and Harnessing Adversarial Examples

Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[25]

Adversarial examples: Attacks and defenses for deep learning

Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. Adversarial examples: Attacks and defenses for deep learning. IEEE transactions on neural networks and learning systems, 2019

work page 2019
[26]

Universal Decision-Based Black-Box Perturbations: Breaking Security-Through-Obscurity Defenses

Thomas A Hogan and Bhavya Kailkhura. Universal hard-label black-box perturbations: Breaking security- through-obscurity defenses. arXiv preprint arXiv:1811.03733, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[27]

Generative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014

work page 2014
[28]

MNIST handwritten digit database

Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010

work page 2010
[29]

Gradient-based learning applied to document recognition

Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998

work page 1998
[30]

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[31]

Deep learning face attributes in the wild

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE international conference on computer vision, pages 3730–3738, 2015. 8

work page 2015

[1] [1]

Deep learning

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436, 2015

work page 2015

[2] [2]

Searching for exotic particles in high-energy physics with deep learning

Pierre Baldi, Peter Sadowski, and Daniel Whiteson. Searching for exotic particles in high-energy physics with deep learning. Nature communications, 5:4308, 2014

work page 2014

[3] [3]

Deep learning for computational biology

Christof Angermueller, Tanel Pärnamaa, Leopold Parts, and Oliver Stegle. Deep learning for computational biology. Molecular systems biology, 12(7):878, 2016

work page 2016

[4] [4]

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classiﬁcation models and saliency maps. arXiv preprint arXiv:1312.6034, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[5] [5]

Visualizing and understanding convolutional networks

Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. InEuropean conference on computer vision, pages 818–833. Springer, 2014

work page 2014

[6] [6]

Understanding Neural Networks Through Deep Visualization

Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson. Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579, 2015. 7 Generative Counterfactual Introspection for Explainable Deep Learning A PREPRINT

work page internal anchor Pith review Pith/arXiv arXiv 2015

[7] [7]

Causal inference in statistics: An overview

Judea Pearl et al. Causal inference in statistics: An overview. Statistics surveys, 3:96–146, 2009

work page 2009

[8] [8]

Materials discovery and design using machine learning

Yue Liu, Tianlu Zhao, Wangwei Ju, and Siqi Shi. Materials discovery and design using machine learning. Journal of Materiomics, 3(3):159–177, 2017

work page 2017

[9] [9]

Examples are not enough, learn to criticize! criticism for interpretability

Been Kim, Rajiv Khanna, and Oluwasanmi O Koyejo. Examples are not enough, learn to criticize! criticism for interpretability. In Advances in Neural Information Processing Systems, pages 2280–2288, 2016

work page 2016

[10] [10]

Counterfactual Visual Explanations

Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, and Stefan Lee. Counterfactual visual explanations. CoRR, abs/1904.07451, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1904

[11] [11]

Attgan: Facial attribute editing by only changing what you want

Zhenliang He, Wangmeng Zuo, Meina Kan, Shiguang Shan, and Xilin Chen. Attgan: Facial attribute editing by only changing what you want. IEEE Transactions on Image Processing, 2019

work page 2019

[12] [12]

Grad-cam: Visual explanations from deep networks via gradient-based localization

Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017

work page 2017

[13] [13]

On pixel-wise explanations for non-linear classiﬁer decisions by layer-wise relevance propagation

Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classiﬁer decisions by layer-wise relevance propagation. PloS one, 10(7):e0130140, 2015

work page 2015

[14] [14]

Why should i trust you?: Explaining the predictions of any classiﬁer

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should i trust you?: Explaining the predictions of any classiﬁer. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016

work page 2016

[15] [15]

TreeView: Peeking into Deep Neural Networks Via Feature-Space Partitioning

Jayaraman J Thiagarajan, Bhavya Kailkhura, Prasanna Sattigeri, and Karthikeyan Natesan Ramamurthy. Treeview: Peeking into deep neural networks via feature-space partitioning. arXiv preprint arXiv:1611.07429, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[16] [16]

Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks

Anh Nguyen, Jason Yosinski, and Jeff Clune. Multifaceted feature visualization: Uncovering the different types of features learned by each neuron in deep neural networks. arXiv preprint arXiv:1602.03616, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[17] [17]

Visualizing higher-layer features of a deep network

Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. Visualizing higher-layer features of a deep network. University of Montreal, 1341(3):1, 2009

work page 2009

[18] [18]

Feature visualization

Chris Olah, Alexander Mordvintsev, and Ludwig Schubert. Feature visualization. Distill, 2(11):e7, 2017

work page 2017

[19] [19]

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). arXiv preprint arXiv:1711.11279, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[20] [20]

Counterfactual fairness

Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4066–4076, 2017

work page 2017

[21] [21]

Explaining Deep Learning Models using Causal Inference

Tanmayee Narendra, Anush Sankaran, Deepak Vijaykeerthy, and Senthil Mani. Explaining deep learning models using causal inference. arXiv preprint arXiv:1811.04376, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[22] [22]

Counterfactual visual explanations

Jan Ernst Dhruv Batra Devi Parikh Stefan Lee Yash Goyal, Ziyan Wu. Counterfactual visual explanations. In ICML, pages 264–279, 2019

work page 2019

[23] [23]

Grounding visual explanations

Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, and Zeynep Akata. Grounding visual explanations. In Proceedings of the European Conference on Computer Vision (ECCV), pages 264–279, 2018

work page 2018

[24] [24]

Explaining and Harnessing Adversarial Examples

Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[25] [25]

Adversarial examples: Attacks and defenses for deep learning

Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. Adversarial examples: Attacks and defenses for deep learning. IEEE transactions on neural networks and learning systems, 2019

work page 2019

[26] [26]

Universal Decision-Based Black-Box Perturbations: Breaking Security-Through-Obscurity Defenses

Thomas A Hogan and Bhavya Kailkhura. Universal hard-label black-box perturbations: Breaking security- through-obscurity defenses. arXiv preprint arXiv:1811.03733, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[27] [27]

Generative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014

work page 2014

[28] [28]

MNIST handwritten digit database

Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010

work page 2010

[29] [29]

Gradient-based learning applied to document recognition

Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998

work page 1998

[30] [30]

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[31] [31]

Deep learning face attributes in the wild

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE international conference on computer vision, pages 3730–3738, 2015. 8

work page 2015