In Defense of Information Leakage in Concept-based Models

Mateo Espinosa Zarlenga

arXiv: 2606.10669 · v1 · pith:LH5A4AHEnew · submitted 2026-06-09 · cs.LG · cs.AI · cs.CR

Pith reviewed 2026-06-27 13:42 UTC · model grok-4.3

classification cs.LG · cs.AI · cs.CR

keywords concept-based models, information leakage, model interpretability, concept incompleteness, benign leakage, intervenability, deep neural networks, machine learning

The pith

In real-world settings with incomplete concepts, some information leakage is necessary for concept-based models to stay accurate and intervenable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper challenges the conventional view that leakage of concept-irrelevant information in concept-based models always reduces interpretability. It notes that evidence for this harm is often inconclusive and that complete concept sets are rare in practice. Under these conditions, the authors argue that some leakage becomes structurally required to achieve both high accuracy and the ability to intervene on concepts. They introduce the notion of benign leakage and show that a reframed training objective can encourage models to use it productively.

Core claim

Concept-based models learn representations that leak concept-irrelevant information, which is traditionally viewed as undesirable because it leads to uninterpretable models. This view is ill-posed because evidence linking leakage to reduced interpretability is often inconclusive, and the push to eradicate leakage produces impractical models. In real-world settings where concept incompleteness is the norm, some leakage is often necessary for constructing accurate and intervenable concept-based models. By optimizing a reframing of the typical concept-based model training objective, models can encourage and exploit benign leakage without sacrificing accuracy or intervenability.

What carries the argument

A reframing of the typical concept-based model training objective that encourages and exploits benign leakage.
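For orientation, the typical objective being reframed can be written as a weighted sum, which is how the Figure 1 caption describes joint training. The sketch below reuses the caption's g (concept encoder) and f (label predictor). The loss symbols and the weight \lambda are notation assumed here for illustration, the scoring function s is omitted because the caption treats it as the identity in the sigmoidal case, and the paper's reframed objective itself is not reproduced on this page.

```latex
\mathcal{L}_{\text{joint}}(x, c, y)
  = \mathcal{L}_{\text{task}}\bigl(f(g(x)),\, y\bigr)
  + \lambda \, \mathcal{L}_{\text{concept}}\bigl(g(x),\, c\bigr),
  \qquad \lambda \ge 0
```

Under this reading, independent and sequential training fit the two terms in separate stages rather than summing them, and a larger \lambda ties the bottleneck more tightly to the annotated concepts.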

If this is right

Concept-based models can reach high accuracy even when provided concepts fail to capture all relevant information.
Intervenability on individual concepts remains possible when benign leakage is present.
Eradicating leakage entirely produces models that are less practical under typical data constraints.
A reframed training objective allows models to use extra information without losing the benefits of concept alignment.
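The intervenability these points refer to can be illustrated with a minimal sketch of a test-time concept intervention on a toy sigmoidal bottleneck. Everything here is a hypothetical stand-in rather than the paper's code: the function names, the weights `W`, `v`, `b`, and the input `x` are made up for illustration, and the "models" are plain linear maps followed by a sigmoid.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_concepts(x: Sequence[float], W: Sequence[Sequence[float]]) -> List[float]:
    """Toy concept encoder g: one sigmoidal scalar per concept."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]


def intervene(c_hat: Sequence[float], c_true: Sequence[Optional[float]]) -> List[float]:
    """Overwrite predicted concepts with expert-provided values.

    A None entry means "no intervention": the prediction is kept.
    """
    return [p if t is None else float(t) for p, t in zip(c_hat, c_true)]


def predict_label(c: Sequence[float], v: Sequence[float], b: float = 0.0) -> float:
    """Toy label predictor f: logistic regression on the bottleneck."""
    return sigmoid(sum(vi * ci for vi, ci in zip(v, c)) + b)


# Hypothetical weights and input, chosen only to make the effect visible.
x = [1.0, -2.0]
W = [[0.5, 0.5], [1.0, 0.0]]
v, b = [4.0, 1.0], -2.0

c_hat = predict_concepts(x, W)
p_before = predict_label(c_hat, v, b)

# An expert corrects concept 0 to "present" and leaves concept 1 untouched.
c_fixed = intervene(c_hat, [1.0, None])
p_after = predict_label(c_fixed, v, b)

print(f"before: {p_before:.3f}  after: {p_after:.3f}")
```

In this toy setup the label predictor depends strongly on concept 0, so correcting it moves the prediction. Whether a real leaky model behaves this way is the empirical question the paper's intervention curves address.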

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same tolerance for controlled leakage might apply to other interpretability methods that rely on incomplete human-provided features.
Datasets could be annotated with explicit measures of concept completeness to test when leakage becomes necessary.
Guidelines for concept selection in applications could shift from completeness to identifying which extra information is benign.

Load-bearing premise

The premise that concept incompleteness is the norm in real-world settings and that evidence linking leakage to reduced interpretability is often inconclusive.

What would settle it

An empirical demonstration that concept-based models can achieve high accuracy and full intervenability on incomplete concept sets while completely eliminating all leakage would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.10669 by Mateo Espinosa Zarlenga.

**Figure 1.** A generalized Concept Bottleneck Model (CBM). … may be optimized independently, sequentially (training g before f), or jointly (minimizing a weighted sum). The simplest instantiation of this framework is a Sigmoidal CBM (Koh et al., 2020), where each concept is represented by a single sigmoidal scalar (m = 1) and s is the identity. More recent works have explored richer representations and scoring mechanisms…

**Figure 2.** Leakage happens when the concept representation Ĉ_i encodes information that is not attributable to the ground-truth concept C_i. This additional information can be attributed to (left) another concept C_j or (right) the downstream task Y. Intervenability Claims: A first line of argument holds that leakage undermines the intervenability of CMs (Havasi et al., 2022; Espinosa Zarlenga et al., 2023a; Vandenhirt…

**Figure 3.** Task fidelity of sigmoidal (i.e., soft) CBMs as we vary the number k of training concepts on the CUB (Wah et al., 2011) dataset (see App. B for details). CBMs that limit task leakage (e.g., independent CBMs) are more affected by incompleteness. … complete description (Yeh et al., 2020) of the downstream task y. Formally, this means CMs require that the mutual information between the downstream label y ∼ P(Y …

**Figure 4.** Task accuracy (y-axis) of leaky CMs trained on an incomplete version of CUB (k = 22, details in App. B) as we intervene on concepts at test time (x-axis). (left) Leaky CMs can remain intervenable in incompleteness. (right) Some leaky variants, however, lose their intervenability (curves are non-increasing). Simplicity Bias and Benign Leakage: CMs that enable leakage can remain highly accurate and interven…

**Figure 5.** A DNN trained on an incomplete CUB task (k = 22) without any concept alignment loss becomes intervenable with L_int. … in the penultimate layer of the DNN and intervened on them as if they were normal CBM soft concept representations (which they are not, as we do not introduce any concept alignment losses when training this model, and the neurons do not form a bottleneck). We see that this DNN, trained without an…

**Figure 6.** Effect of regularizing previously unintervenable leaky CMs with L_int on an incomplete version of CUB (k = 22).

**Figure 7.** … is not the same: for example, although CEMs without RandInt do increase their task accuracies as they are intervened on, the effect is minimal compared to that of other models. This implies that it is likely that incompleteness can lead to several leaky models to become unintervenable, but its effects on leaky CMs can vary. [Plot: Task Accuracy (%) against Intervened Concepts (%).]

**Figure 8.** Extended results showing the effect of applying the sufficiency regularizer L_int to previously unintervenable leakage-enabling models on an incomplete version of CUB with k = 22 concepts. E. Verifying Trends on Additional Datasets: To further verify the generality of the claims in Section 6, we replicate our key intervention experiments on four additional datasets that span a range of incompleteness and noi…

**Figure 9.** Intervention curves on additional tasks, verifying the results discussed in Section 6 for the incomplete version of CUB.

**Figure 10.** Top-m weight overlap for (1) a properly-conditioned Hybrid CBM M_H and a non-leaky Independent CBM M_I (orange); (2) two Independent CBMs trained with different seeds (green); (3) an ill-conditioned Hybrid CBM and a non-leaky Independent CBM M_I (blue); and (4) a random bottleneck subset and Independent CBM M_I (gray dashed line, random baseline).

Original abstract

Concept-based models (CMs), deep neural networks that ground their predictions on representations aligned with human-understandable concepts (e.g., "round", "stripes", etc.), have been shown to learn representations that leak concept-irrelevant information. As the traditional narrative goes, this leakage is undesirable and should be eradicated as it leads to uninterpretable models. In this paper, we posit that this conventional view of leakage in CMs is not only ill-posed, as the evidence of how leakage makes a model less interpretable is often inconclusive, but also bound to lead to impractical CMs under common real-world constraints. Specifically, we argue that in real-world settings where concept incompleteness is the norm, some leakage is often necessary for constructing accurate and intervenable CMs. To this end, we propose that there is such a thing as benign leakage and show that, by optimizing a reframing of the typical CM training objective, CMs can encourage and exploit this form of leakage without sacrificing accuracy or intervenability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper makes a fair case that zero-leakage is unrealistic for incomplete concept sets but offers no mechanism or results showing that encouraged leakage preserves intervenability.

The core point is that this paper treats leakage in concept-based models as potentially useful rather than something to stamp out. It argues that, because concept incompleteness is the normal case, some leakage helps accuracy, and that a reframed objective can make the leakage benign without losing intervenability.

What is new is the explicit label of benign leakage and the suggestion to optimize toward it instead of minimizing all leakage. Prior work on concept models mostly treated leakage as a flaw to remove, so this reframing is a shift in how the training goal is stated.

The paper is right that evidence linking leakage directly to worse interpretability is often indirect or task-specific, and that forcing zero leakage on incomplete concepts can hurt performance. That observation is useful for anyone building these models on real data.

The main weakness is that the claim about preserved intervenability is asserted without a mechanism. If the model learns to use leaked features downstream, intervening on the concept activations may no longer produce the expected change in the output. The abstract gives no auxiliary loss, constraint, or experiment that would rule this out, so the central practical guarantee is untested on the information provided.
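The compensation concern can be made concrete with a toy example. This is a minimal sketch under assumed conditions, not a model from the paper: the bottleneck is one supervised concept unit plus one unsupervised residual unit, and all names and weights are hypothetical, chosen only to contrast a predictor that relies on the concept with one that relies on the leaked channel.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def label_prob(concept: float, residual: float,
               w_concept: float, w_residual: float) -> float:
    """Toy label predictor over a hybrid bottleneck.

    `concept` is the supervised concept unit; `residual` is an
    unsupervised unit that may carry leaked information.
    """
    return sigmoid(w_concept * (concept - 0.5) + w_residual * (residual - 0.5))


def intervention_shift(c_hat: float, c_true: float, residual: float,
                       w_concept: float, w_residual: float) -> float:
    """Change in the label probability when c_hat is replaced by c_true.

    The residual unit is left untouched, as in a standard intervention.
    """
    before = label_prob(c_hat, residual, w_concept, w_residual)
    after = label_prob(c_true, residual, w_concept, w_residual)
    return after - before


# The encoder wrongly believes the concept is absent (0.2); an expert sets it to 1.0.
# The residual unit (0.1) encodes the same mistaken evidence.
c_hat, c_true, residual = 0.2, 1.0, 0.1

# Predictor A relies on the concept unit; predictor B relies on the residual unit.
shift_concept_reliant = intervention_shift(c_hat, c_true, residual, 6.0, 0.0)
shift_leak_reliant = intervention_shift(c_hat, c_true, residual, 0.5, 8.0)

print(f"concept-reliant shift: {shift_concept_reliant:.3f}")
print(f"leak-reliant shift:    {shift_leak_reliant:.3f}")
```

The same intervention moves the first predictor substantially and the second hardly at all, which is the failure mode the letter describes. The sketch says nothing about how often trained models end up in the second regime.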

This is for people already working on concept-based explainable AI who want to question the zero-leakage default. It is worth sending to peer review so the full paper can be checked for experiments or a concrete training procedure that addresses the intervenability issue.

Referee Report

2 major / 1 minor

Summary. The paper argues that the standard view of information leakage in concept-based models (CMs) as inherently harmful to interpretability is ill-posed, since supporting evidence is often inconclusive, and that concept incompleteness is the norm in real settings. It introduces the notion of 'benign leakage' as sometimes necessary for accurate and intervenable CMs, and claims that a reframing of the typical CM training objective can encourage and exploit this leakage without loss of accuracy or intervenability.

Significance. If the claims hold, the work offers a conceptual reframing that could relax overly restrictive no-leakage requirements in CM design, enabling more practical models under incomplete concept supervision. The emphasis on intervenability preservation is a strength if demonstrated, as it directly engages a core desideratum of CMs.

major comments (2)

[Abstract] The central claim that a reframed training objective encourages benign leakage 'without sacrificing ... intervenability' lacks any described mechanism (auxiliary loss, architectural constraint, or regularizer) ensuring that concept-irrelevant leaked features remain inert under interventions; without this, the skeptic's concern that leakage may allow downstream compensation for concept changes is unaddressed.
[Abstract, paragraph 2] The assertion that 'evidence of how leakage makes a model less interpretable is often inconclusive' is load-bearing for reclassifying leakage as potentially benign, yet no specific prior results, datasets, or quantitative re-analyses are referenced to substantiate the claim.

minor comments (1)

The term 'benign leakage' is introduced as a new category but receives no operational definition or distinction from other forms of leakage in the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating revisions where the concerns are valid.

Referee: [Abstract] The central claim that a reframed training objective encourages benign leakage 'without sacrificing ... intervenability' lacks any described mechanism (auxiliary loss, architectural constraint, or regularizer) ensuring that concept-irrelevant leaked features remain inert under interventions; without this, the skeptic's concern that leakage may allow downstream compensation for concept changes is unaddressed.

Authors: The manuscript (Section 3) defines the reframed objective as a modified concept-alignment loss that explicitly permits leakage of concept-irrelevant features while retaining the standard intervention protocol at test time. Section 4 then reports intervention experiments showing that prediction shifts upon concept edits remain consistent with the intended concept change and are not offset by leaked features. We agree the abstract omits this description and will revise it to name the reframed objective and note the empirical intervenability results.
Revision: yes

Referee: [Abstract, paragraph 2] The assertion that 'evidence of how leakage makes a model less interpretable is often inconclusive' is load-bearing for reclassifying leakage as potentially benign, yet no specific prior results, datasets, or quantitative re-analyses are referenced to substantiate the claim.

Authors: The full manuscript reviews this literature in the introduction and related-work section, citing studies in which leakage was measured yet downstream interpretability metrics remained stable or context-dependent. The abstract itself contains no citations. We will add two to three representative references to the abstract to ground the claim.
Revision: yes

Circularity Check

0 steps flagged

No circularity; position paper with no derivation chain

The manuscript is an argumentative position paper asserting that some information leakage can be 'benign' under concept incompleteness. No equations, fitted parameters, or formal derivations appear in the abstract or described content. The central claim is a normative reframing of prior literature rather than any prediction or result that reduces to its own inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked in the provided text. This matches the default expectation of a self-contained argument without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on two domain assumptions: that concept incompleteness is the norm in real applications and that leakage is required for accuracy and intervenability when incompleteness holds. No free parameters or invented entities beyond the term 'benign leakage' are introduced in the abstract.

axioms (2)

domain assumption: concept incompleteness is the norm in real-world settings
Invoked in abstract paragraph 2 as the condition under which leakage becomes necessary.
domain assumption: some leakage is often necessary for constructing accurate and intervenable CMs
Core premise that justifies the shift from eradication to exploitation of leakage.

invented entities (1)

benign leakage (no independent evidence)
purpose: A form of concept-irrelevant information that can be encouraged without harming accuracy or intervenability
New label introduced to distinguish desirable from undesirable leakage; no independent evidence supplied in abstract.

pith-pipeline@v0.9.1-grok · 5704 in / 1427 out tokens · 18210 ms · 2026-06-27T13:42:23.852146+00:00 · methodology

Reference graph

Works this paper leans on

300 extracted references · 3 canonical work pages

[1] Srivastava, Nitish; Hinton, Geoffrey; Krizhevsky, Alex; Sutskever, Ilya; Salakhutdinov, Ruslan. Dropout: […]. 2014.
[2] Dropout as a bayesian approximation: Representing model uncertainty in deep learning. International Conference on Machine Learning, 2016.
[3] Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, 2015.
[4] Liu, Ziwei; Luo, Ping; Wang, Xiaogang; Tang, Xiaoou. Large-scale […].
[5] Deng, Li. The […]. 2012.
[6] Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The […].
[7] Johnson, Alistair EW; Pollard, Tom J; Berkowitz, Seth J; Greenbaum, Nathaniel R; Lungren, Matthew P; Deng, Chih-ying; Mark, Roger G; Horng, Steven. […]. 2019.
[8] Irvin, Jeremy; Rajpurkar, Pranav; Ko, Michael; Yu, Yifan; Ciurea-Ilcus, Silviana; Chute, Chris; Marklund, Henrik; Haghgoo, Behzad; Ball, Robyn; Shpanskaya, Katie; and others. […].
[9] Bustos, Aurelia; Pertusa, Antonio; Salinas, Jose-Maria; De La Iglesia-Vaya, Maria. […]. 2020.
[10] Tschandl, Philipp; Rosendahl, Cliff; Kittler, Harald. The […]. 2018.
[11] Knee osteoarthritis: interpretation variability of radiological signs. Clinical Rheumatology, 2004.
[12] Imaging in osteoarthritis. Rheumatic Disease Clinics of North America, 2008.
[13] A survey of deep active learning. ACM Computing Surveys (CSUR), 2021.
[14] Learning to defer to a population: A meta-learning approach. International Conference on Artificial Intelligence and Statistics, 2024.
[15] Mao, Anqi; Mohri, Mehryar; Zhong, Yutao. […]. Proceedings of the 41st International Conference on Machine Learning, 2024.
[16] A survey on active learning: State-of-the-art, practical challenges and research directions. Mathematics, 2023.
[17] Monarch, Robert Munro. Human-in-the-Loop Machine Learning: Active learning and annotation for human-centered […].
[18] Learning multiple layers of features from tiny images. 2009.
[19] Zero-shot learning - a comprehensive evaluation of the good, the bad and the ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[20] Van der Maaten, Laurens; Hinton, Geoffrey. Visualizing data using […].
[21] Jang, Eric; Gu, Shixiang; Poole, Ben. Categorical Reparametrization with […]. 2017.
[22] A Framework for Behavioural Cloning. Machine Intelligence 15.
[23] A reduction of imitation learning and structured prediction to no-regret online learning. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.
[24] Joint active feature acquisition and classification with variable-size set encoding. Advances in Neural Information Processing Systems.
[25] Kingma, Diederik P; Welling, Max. Auto-encoding variational […].
[26] Generative adversarial imitation learning. Advances in Neural Information Processing Systems.
[27] Modeling purposeful adaptive behavior with the principle of maximum causal entropy. 2010.
[28] Posterior Matching for Arbitrary Conditioning. Advances in Neural Information Processing Systems.
[29] Active feature acquisition with generative surrogate models. International Conference on Machine Learning, 2021.
[30] Causal diagrams for empirical research. Biometrika, 1995.
[31] Deep sparse rectifier neural networks. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.
[32] Wachter, Sandra; Mittelstadt, Brent; Russell, Chris. Counterfactual explanations without opening the black box: Automated decisions and the […]. 2017.
[33] Dur[…]. Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical […]. Journal of Medical Ethics, 2021.
[34] The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. arXiv preprint arXiv:1904.12584.
[35] The information bottleneck method. arXiv preprint physics/0004057.
[36] Deep learning and the information bottleneck principle. 2015 IEEE Information Theory Workshop (ITW), 2015.
[37] Perceptrons: An introduction to computational geometry. 1969.
[38] On the Information Bottleneck Theory of Deep Learning. International Conference on Learning Representations.
[39] Paszke, Adam; Gross, Sam; Massa, Francisco; Lerer, Adam; Bradbury, James; Chanan, Gregory; Killeen, Trevor; Lin, Zeming; Gimelshein, Natalia; Antiga, Luca; Desmaison, Alban; K[…]. […]. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019.
[40] Falcon, William and […]. doi:10.5281/zenodo.3828935.
[41] Hunter, J. D. […]. Computing in Science & Engineering.
[42] Pedregosa, Fabian; Varoquaux, Ga[…]. Scikit-learn: Machine learning in […]. Journal of Machine Learning Research, 2011.
[43] Understanding the exploding gradient problem. CoRR, abs/1211.5063.
[44] Rectifier nonlinearities improve neural network acoustic models. Proc. ICML, 2013.
[45] Tensorflow: A system for large-scale machine learning. OSDI, 2016.
[46] Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810.
[47] Efficient decompositional rule extraction for deep neural networks. NeurIPS Workshop on eXplainable AI approaches for debugging and diagnosis (XAI4Debugging).
[48] Support-vector networks. Machine Learning, 1995.
[49] Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control.
[50] On the mean accuracy of statistical pattern recognizers. IEEE Transactions on Information Theory, 1968.
[51] Detection of influential observation in linear regression. Technometrics, 2000.
[52] Understanding black-box predictions via influence functions. International Conference on Machine Learning, 2017.
[53] Ribeiro, Marco Tulio; Singh, Sameer; Guestrin, Carlos. […].
[54] A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems.
[55] Ribeiro, Marco Tulio; Singh, Sameer; Guestrin, Carlos. Anchors: […].
[56] Interpretation of neural networks is fragile. Proceedings of the AAAI Conference on Artificial Intelligence.
[57] Sanity checks for saliency maps. Advances in Neural Information Processing Systems.
[58] Kindermans, Pieter-Jan; Hooker, Sara; Adebayo, Julius; Alber, Maximilian; Schütt, Kristof T.; Dähne, Sven; Erhan, Dumitru; Kim, Been. The (Un)reliability of Saliency Methods. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, 2019. doi:10.1007/978-3-030-28954-6_14
[59] Explanations can be manipulated and geometry is to blame. Advances in Neural Information Processing Systems.
[60] On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 2015.
[61] Selvaraju, Ramprasaath R; Cogswell, Michael; Das, Abhishek; Vedantam, Ramakrishna; Parikh, Devi; Batra, Dhruv. […].
[62] Visualizing higher-layer features of a deep network. University of Montreal.
[63] Axiomatic attribution for deep networks. International Conference on Machine Learning, 2017.
[64] Smilkov, Daniel; Thorat, Nikhil; Kim, Been; Vi[…]. […]. arXiv preprint arXiv:1706.03825.
[65] A case-based interpretable deep learning model for classification of mass lesions in digital mammography. Nature Machine Intelligence, 2021.
[66] DeGrave, Alex J; Janizek, Joseph D; Lee, Su-In. […]. 2021.
[67] Gorban, Alexander N; Makarov, Valery A; Tyukin, Ivan Y. High-dimensional brain in a high-dimensional world: […]. 2020.
[68] Azzeh, Jamil; Zahran, Bilal; Alqadi, Ziad. Salt and pepper noise: […].
[69] Shen, Max W. Trust in […].
[70] Arrieta, Alejandro Barredo; D[…]. […]. Information Fusion, 2020.
[71] Axiomatic attribution for deep networks. International Conference on Machine Learning, 2017.
[72] Learning how to explain neural networks: Patternnet and patternattribution. arXiv preprint arXiv:1705.05598.
[73] Selvaraju, Ramprasaath R; Das, Abhishek; Vedantam, Ramakrishna; Cogswell, Michael; Parikh, Devi; Batra, Dhruv. […].
[74] Towards Automating Model Explanations with Certified Robustness Guarantees.
[75] Towards robust interpretability with self-explaining neural networks. Advances in Neural Information Processing Systems.
[76] Losch, Max; Fritz, Mario; Schiele, Bernt. Interpretability beyond classification output: […].
[77] Generative causal explanations of black-box classifiers. Advances in Neural Information Processing Systems.
[78] Concept Embeddings for Fuzzy Logic Verification of Deep Neural Networks in Perception Tasks. arXiv preprint arXiv:2201.00572.
[79] Concept learners for few-shot learning. arXiv preprint arXiv:2007.07375.
[80] Learning compositional representations for few-shot recognition. Proceedings of the IEEE/CVF International Conference on Computer Vision.

Showing first 80 references.


2009

[19] [19]

IEEE transactions on pattern analysis and machine intelligence , volume=

Zero-shot learning—a comprehensive evaluation of the good, the bad and the ugly , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2018 , publisher=

2018

[20] [20]

Visualizing data using

Van der Maaten, Laurens and Hinton, Geoffrey , journal=. Visualizing data using

[21] [21]

Categorical Reparametrization with

Jang, Eric and Gu, Shixiang and Poole, Ben , booktitle=. Categorical Reparametrization with. 2017 , organization=

2017

[22] [22]

Machine Intelligence 15 , pages=

A Framework for Behavioural Cloning , author=. Machine Intelligence 15 , pages=

[23] [23]

Proceedings of the fourteenth international conference on artificial intelligence and statistics , pages=

A reduction of imitation learning and structured prediction to no-regret online learning , author=. Proceedings of the fourteenth international conference on artificial intelligence and statistics , pages=. 2011 , organization=

2011

[24] [24]

Advances in Neural Information Processing Systems , volume=

Joint active feature acquisition and classification with variable-size set encoding , author=. Advances in Neural Information Processing Systems , volume=

[25] [25]

Auto-encoding variational

Kingma, Diederik P and Welling, Max , journal=. Auto-encoding variational

[26] [26]

Advances in Neural Information Processing Systems , volume=

Generative adversarial imitation learning , author=. Advances in Neural Information Processing Systems , volume=

[27] [27]

2010 , publisher=

Modeling purposeful adaptive behavior with the principle of maximum causal entropy , author=. 2010 , publisher=

2010

[28] [28]

Advances in Neural Information Processing Systems , volume=

Posterior Matching for Arbitrary Conditioning , author=. Advances in Neural Information Processing Systems , volume=

[29] [29]

International Conference on Machine Learning , pages=

Active feature acquisition with generative surrogate models , author=. International Conference on Machine Learning , pages=. 2021 , organization=

2021

[30] [30]

Biometrika , volume=

Causal diagrams for empirical research , author=. Biometrika , volume=. 1995 , publisher=

1995

[31] [31]

Proceedings of the fourteenth international conference on artificial intelligence and statistics , pages=

Deep sparse rectifier neural networks , author=. Proceedings of the fourteenth international conference on artificial intelligence and statistics , pages=. 2011 , organization=

2011

[32] [32]

Counterfactual explanations without opening the black box: Automated decisions and the

Wachter, Sandra and Mittelstadt, Brent and Russell, Chris , journal=. Counterfactual explanations without opening the black box: Automated decisions and the. 2017 , publisher=

2017

[33] [33]

Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical

Dur. Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical. Journal of Medical Ethics , volume=. 2021 , publisher=

2021

[34] [34]

arXiv preprint arXiv:1904.12584 , year=

The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision , author=. arXiv preprint arXiv:1904.12584 , year=

Pith/arXiv arXiv 1904

[35] [35]

arXiv preprint physics/0004057 , year=

The information bottleneck method , author=. arXiv preprint physics/0004057 , year=

Pith/arXiv arXiv

[36] [36]

2015 IEEE information theory workshop (ITW) , pages=

Deep learning and the information bottleneck principle , author=. 2015 IEEE information theory workshop (ITW) , pages=. 2015 , organization=

2015

[37] [37]

1969 , publisher=

Perceptrons: An introduction to computational geometry , author=. 1969 , publisher=

1969

[38] [38]

International Conference on Learning Representations , year=

On the Information Bottleneck Theory of Deep Learning , author=. International Conference on Learning Representations , year=

[39] [39]

Proceedings of the 33rd International Conference on Neural Information Processing Systems , articleno =

Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and K\". Proceedings of the 33rd International Conference on Neural Information Processing Systems , articleno =. 2019 , publisher =

2019

[40] [40]

doi:10.5281/zenodo.3828935 , license =

Falcon, William and. doi:10.5281/zenodo.3828935 , license =

work page doi:10.5281/zenodo.3828935

[41] [41]

Hunter, J. D. , Title =. Computing in Science & Engineering , Volume =

[42] [42]

Scikit-learn: Machine learning in

Pedregosa, Fabian and Varoquaux, Ga. Scikit-learn: Machine learning in. the Journal of machine Learning research , volume=. 2011 , publisher=

2011

[43] [43]

CoRR, abs/1211.5063 , volume=

Understanding the exploding gradient problem , author=. CoRR, abs/1211.5063 , volume=

Pith/arXiv arXiv

[44] [44]

Rectifier nonlinearities improve neural network acoustic models , author=. Proc. icml , volume=. 2013 , organization=

2013

[45] [45]

Osdi , volume=

Tensorflow: A system for large-scale machine learning , author=. Osdi , volume=. 2016 , organization=

2016

[46] [46]

arXiv preprint arXiv:1703.00810 , year=

Opening the black box of deep neural networks via information , author=. arXiv preprint arXiv:1703.00810 , year=

Pith/arXiv arXiv

[47] [47]

NeurIPS Workshop on eXplainable AI approaches for debugging and diagnosis (XAI4Debugging) , year=

Efficient decompositional rule extraction for deep neural networks , author=. NeurIPS Workshop on eXplainable AI approaches for debugging and diagnosis (XAI4Debugging) , year=

[48] [48]

Machine learning , volume=

Support-vector networks , author=. Machine learning , volume=. 1995 , publisher=

1995

[49] [49]

Automation and remote control , volume=

Theoretical foundations of the potential function method in pattern recognition learning , author=. Automation and remote control , volume=

[50] [50]

IEEE transactions on information theory , volume=

On the mean accuracy of statistical pattern recognizers , author=. IEEE transactions on information theory , volume=. 1968 , publisher=

1968

[51] [51]

Technometrics , volume=

Detection of influential observation in linear regression , author=. Technometrics , volume=. 2000 , publisher=

2000

[52] [52]

International conference on machine learning , pages=

Understanding black-box predictions via influence functions , author=. International conference on machine learning , pages=. 2017 , organization=

2017

[53] [53]

Ribeiro, Marco Tulio and Singh, Sameer and Guestrin, Carlos , booktitle=

[54] [54]

Advances in Neural Information Processing Systems , volume=

A unified approach to interpreting model predictions , author=. Advances in Neural Information Processing Systems , volume=

[55] [55]

Anchors:

Ribeiro, Marco Tulio and Singh, Sameer and Guestrin, Carlos , booktitle=. Anchors:

[56] [56]

Proceedings of the AAAI conference on artificial intelligence , volume=

Interpretation of neural networks is fragile , author=. Proceedings of the AAAI conference on artificial intelligence , volume=

[57] [57]

Advances in Neural Information Processing Systems , volume=

Sanity checks for saliency maps , author=. Advances in Neural Information Processing Systems , volume=

[58] [58]

u tt, Kristof T. and D \

Kindermans, Pieter-Jan and Hooker, Sara and Adebayo, Julius and Alber, Maximilian and Sch \"u tt, Kristof T. and D \"a hne, Sven and Erhan, Dumitru and Kim, Been. The (Un)reliability of Saliency Methods. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. 2019. doi:10.1007/978-3-030-28954-6_14

work page doi:10.1007/978-3-030-28954-6_14 2019

[59] [59]

Advances in Neural Information Processing Systems , volume=

Explanations can be manipulated and geometry is to blame , author=. Advances in Neural Information Processing Systems , volume=

[60] [60]

PloS one , volume=

On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation , author=. PloS one , volume=. 2015 , publisher=

2015

[61] [61]

Selvaraju, Ramprasaath R and Cogswell, Michael and Das, Abhishek and Vedantam, Ramakrishna and Parikh, Devi and Batra, Dhruv , booktitle=

[62] [62]

University of Montreal , volume=

Visualizing higher-layer features of a deep network , author=. University of Montreal , volume=

[63] [63]

International Conference on Machine Learning , pages=

Axiomatic attribution for deep networks , author=. International Conference on Machine Learning , pages=. 2017 , organization=

2017

[64] [64]

arXiv preprint arXiv:1706.03825 , year=

Smilkov, Daniel and Thorat, Nikhil and Kim, Been and Vi. arXiv preprint arXiv:1706.03825 , year=

Pith/arXiv arXiv

[65] [65]

Nature Machine Intelligence , volume=

A case-based interpretable deep learning model for classification of mass lesions in digital mammography , author=. Nature Machine Intelligence , volume=. 2021 , publisher=

2021

[66] [66]

2021 , publisher=

DeGrave, Alex J and Janizek, Joseph D and Lee, Su-In , journal=. 2021 , publisher=

2021

[67] [67]

High-dimensional brain in a high-dimensional world:

Gorban, Alexander N and Makarov, Valery A and Tyukin, Ivan Y , journal=. High-dimensional brain in a high-dimensional world:. 2020 , publisher=

2020

[68] [68]

Salt and pepper noise:

Azzeh, Jamil and Zahran, Bilal and Alqadi, Ziad , journal=. Salt and pepper noise:

[69] [69]

Trust in

Shen, Max W , journal=. Trust in

[70] [70]

Information Fusion , volume=

Arrieta, Alejandro Barredo and D. Information Fusion , volume=. 2020 , publisher=

2020

[71] [71]

International conference on machine learning , pages=

Axiomatic attribution for deep networks , author=. International conference on machine learning , pages=. 2017 , organization=

2017

[72] [72]

arXiv preprint arXiv:1705.05598 , year=

Learning how to explain neural networks: Patternnet and patternattribution , author=. arXiv preprint arXiv:1705.05598 , year=

Pith/arXiv arXiv

[73] [73]

Selvaraju, Ramprasaath R and Das, Abhishek and Vedantam, Ramakrishna and Cogswell, Michael and Parikh, Devi and Batra, Dhruv , journal=

[74] [74]

Towards Automating Model Explanations with Certified Robustness Guarantees , author=

[75] [75]

Advances in Neural Information Processing Systems , volume=

Towards robust interpretability with self-explaining neural networks , author=. Advances in Neural Information Processing Systems , volume=

[76] [76]

Interpretability beyond classification output:

Losch, Max and Fritz, Mario and Schiele, Bernt , journal=. Interpretability beyond classification output:

[77] [77]

Advances in Neural Information Processing Systems , volume=

Generative causal explanations of black-box classifiers , author=. Advances in Neural Information Processing Systems , volume=

[78] [78]

arXiv preprint arXiv:2201.00572 , year=

Concept Embeddings for Fuzzy Logic Verification of Deep Neural Networks in Perception Tasks , author=. arXiv preprint arXiv:2201.00572 , year=

arXiv

[79] [79]

arXiv preprint arXiv:2007.07375 , year=

Concept learners for few-shot learning , author=. arXiv preprint arXiv:2007.07375 , year=

arXiv 2007

[80] [80]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Learning compositional representations for few-shot recognition , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=