
arxiv: 2605.21313 · v1 · pith:HBX2JJ6W · submitted 2026-05-20 · 💻 cs.LG

A New Framework to Analyse the Distributional Robustness of Deep Neural Networks

Pith reviewed 2026-05-21 05:48 UTC · model grok-4.3

classification 💻 cs.LG
keywords distributional robustness, deep neural networks, Bernoulli distributions, class separation, memorization, distribution shifts, model diagnostics, representation structure

The pith

A framework modeling weight-activation interactions with Bernoulli distributions distinguishes memorized networks from those that generalize and detects robustness loss under shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a framework to quantify distributional robustness in deep neural networks by analyzing interactions between layer weights and activations. These interactions are modeled using Bernoulli distributions, with class separation serving as a proxy for robustness. Experiments on CIFAR-10 and ImageNet demonstrate that the resulting metrics can tell apart networks that have memorized their training data from those that have not. The same approach applied directly to activations does not produce comparable distinctions. Distribution shifts are shown to decrease the separation measured by these path-based diagnostics.
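The central object is the per-sample interaction matrix N = W · diag(a) from the Figure 1 caption, whose entry N_ij = w_ij · a_j is the contribution of input unit j to output unit i. The minimal pure-Python sketch below computes it for a single fully connected layer, assuming W is stored as a list of rows (one row per output unit). The function name `interaction_matrix` is hypothetical and not taken from the authors' code.

```python
from typing import List

Matrix = List[List[float]]


def interaction_matrix(weights: Matrix, activations: List[float]) -> Matrix:
    """Return N = W · diag(a), i.e. N[i][j] = W[i][j] * a[j].

    `weights` has one row per output unit and one column per input unit;
    `activations` holds the input-side activations a_j for a single sample.
    """
    if any(len(row) != len(activations) for row in weights):
        raise ValueError("each row of W must have one entry per activation")
    # Scaling column j of W by a_j is exactly right-multiplication by diag(a).
    return [[w_ij * a_j for w_ij, a_j in zip(row, activations)] for row in weights]


W = [[0.5, -1.0, 2.0],
     [1.5, 0.0, -0.5]]
a = [1.0, 0.0, 3.0]  # e.g. post-ReLU activations of the previous layer
N = interaction_matrix(W, a)
print(N)
```

A zero column in N marks an input unit that is inactive for this sample, so no path through it contributes to the next layer.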

Core claim

By modeling the interactions between weights and activations in each layer as Bernoulli distributions and measuring the separation between classes under this model, the framework provides diagnostics for whether a network has learned robust representations or merely memorized its training set. Networks without memorization show greater class separation, and this separation decreases when the input distribution shifts.
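One plausible reading of "modeling interactions as Bernoulli distributions" treats each connection (i, j) as a path that is either active or inactive for a given input, and estimates, per class, the probability p_ij that it is active. The sketch below assumes a path is active when |w_ij · a_j| exceeds a threshold `tau`; this binarization rule, the default value of `tau`, and the name `class_bernoulli_params` are illustrative assumptions, since the exact rule is not stated here.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def class_bernoulli_params(
    weights: Matrix,
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    tau: float = 0.1,
) -> Dict[int, Matrix]:
    """Estimate p_ij = P(|w_ij * a_j| > tau | class) for every class label.

    `samples[k]` is the activation vector a for sample k and `labels[k]` its class.
    The threshold `tau` is a hypothetical choice, not a value from the paper.
    """
    counts: Dict[int, Matrix] = {}
    totals: Dict[int, int] = {}
    for a, y in zip(samples, labels):
        if y not in counts:
            counts[y] = [[0.0] * len(row) for row in weights]
            totals[y] = 0
        totals[y] += 1
        for i, row in enumerate(weights):
            for j, w_ij in enumerate(row):
                # Binarize the interaction N_ij = w_ij * a_j: count the path as active.
                if abs(w_ij * a[j]) > tau:
                    counts[y][i][j] += 1.0
    # Empirical activation frequency = maximum-likelihood Bernoulli parameter.
    return {y: [[c / totals[y] for c in row] for row in counts[y]] for y in counts}


W = [[1.0, 0.2],
     [0.0, 2.0]]
acts = [[1.0, 0.0], [0.5, 1.0], [0.0, 1.0], [0.0, 2.0]]
ys = [0, 0, 1, 1]
params = class_bernoulli_params(W, acts, ys)
print(params[0])
print(params[1])
```

Connections with w_ij = 0 (row 1, column 0 here) can never be active, so their parameter is 0 for every class; such degenerate entries need clipping or smoothing before any divergence is computed.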

What carries the argument

The path-based diagnostic that models weight-activation interactions via Bernoulli distributions and uses class separation as a robustness proxy. It quantifies representation structure by tracking how these probabilistic interactions distinguish classes.
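Given per-class Bernoulli parameters, class separation can be summarized as a matrix of pairwise divergences, which is what the KL heatmaps listed in the figures visualize. The sketch below assumes paths are independent, so the KL divergence between two classes is the sum of per-path Bernoulli KL terms. The clipping constant `eps`, the mean off-diagonal `separation_score`, and all function names are illustrative assumptions rather than the paper's definitions.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def bernoulli_kl(p: float, q: float, eps: float = 1e-6) -> float:
    """KL(Bern(p) || Bern(q)) in nats, with probabilities clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def class_kl(P: Matrix, Q: Matrix) -> float:
    """KL between two products of independent Bernoullis = sum of per-path KLs."""
    return sum(
        bernoulli_kl(p, q)
        for row_p, row_q in zip(P, Q)
        for p, q in zip(row_p, row_q)
    )


def pairwise_kl(params: Dict[int, Matrix]) -> List[List[float]]:
    """Class-by-class KL matrix (row = first argument, column = second argument)."""
    classes = sorted(params)
    return [[class_kl(params[a], params[b]) for b in classes] for a in classes]


def separation_score(heatmap: List[List[float]]) -> float:
    """Mean off-diagonal divergence: a single-number summary of class separation."""
    n = len(heatmap)
    off = [heatmap[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(off) / len(off)


params = {0: [[0.9, 0.1]], 1: [[0.1, 0.9]], 2: [[0.5, 0.5]]}
H = pairwise_kl(params)
print([[round(x, 3) for x in row] for row in H])
print(round(separation_score(H), 3))
```

KL is asymmetric, so such a heatmap need not be symmetric; averaging KL(P‖Q) and KL(Q‖P) is one common way to obtain a single distance per pair.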

If this is right

  • Metrics derived from this framework can identify networks that have memorized training data versus those that generalize.
  • Distribution shifts lead to reduced class separation under the proposed diagnostics.
  • Analysis in activation space alone does not yield the same ability to distinguish memorization.
  • The framework supplies model-level diagnostics focused on how weights interact with activations to structure representations.
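For the activation-space comparison, a natural analogue drops W and models each unit j as a Bernoulli variable that is 1 when the unit fires. The sketch below assumes "fires" means a_j > `tau` (a hypothetical threshold, 0 by default, which suits post-ReLU activations) and uses the hypothetical name `class_activation_params`. It yields a length-n vector per class rather than an m × n path matrix, so it cannot reflect how W routes each activation into the next layer.

```python
from typing import Dict, List, Sequence


def class_activation_params(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    tau: float = 0.0,
) -> Dict[int, List[float]]:
    """Estimate q_j = P(a_j > tau | class): one Bernoulli per unit, ignoring W."""
    counts: Dict[int, List[float]] = {}
    totals: Dict[int, int] = {}
    for a, y in zip(samples, labels):
        if y not in counts:
            counts[y] = [0.0] * len(a)
            totals[y] = 0
        totals[y] += 1
        for j, a_j in enumerate(a):
            if a_j > tau:  # unit j "fires" for this sample
                counts[y][j] += 1.0
    return {y: [c / totals[y] for c in counts[y]] for y in counts}


acts = [[1.0, 0.0], [0.5, 1.0], [0.0, 1.0], [0.0, 2.0]]
ys = [0, 0, 1, 1]
q = class_activation_params(acts, ys)
print(q)
```

The same pairwise-divergence machinery can be run on these vectors; the abstract reports that the memorization distinction does not hold up in this space.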

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Training procedures could be modified to optimize for higher class separation scores to encourage better robustness.
  • The framework might extend to other data modalities or network types beyond image classification.
  • If the Bernoulli modeling captures key interactions, it could inspire new ways to regularize networks against overfitting.

Load-bearing premise

Modeling weight-activation interactions with Bernoulli distributions and using class separation as a proxy accurately reflects the distributional robustness of the network.

What would settle it

A clear counterexample would be a network that generalizes well to new distributions but scores low on these separation metrics, or a memorized network that scores high.

Figures

Figures reproduced from arXiv: 2605.21313 by Divij Khaitan, Subhashis Banerjee.

Figure 1. Overview of the proposed framework. (a) Two consecutive layers of a neural network; line thickness encodes the magnitude of each weight–activation product w_ij · a_j. (b) The extracted interaction matrix N = W · diag(a), whose entries capture per-connection contributions. (c) Different classes produce different N matrices, reflecting class-specific activation paths through the network. where w_ij represents t…
Figure 2. Pairwise KL Divergence Heatmaps for InceptionV3 on ImageNet. Top row contains the …
Figure 3. Activation Sparsity Histograms for InceptionV3 on ImageNet. The x-axis represents the …
Figure 4. Pairwise KL Divergence Heatmaps for InceptionV3 on ImageNet-R. The separation between …
Figure 5. Activation Sparsity Histograms for InceptionV3 on ImageNet-R for the class 'king penguin'
Figure 6. KL divergence for class distributions between in-distribution and OOD data for InceptionV3.
Figure 7. Example of in-distribution and out-of-distribution images for the class n02056570 (penguin)
Figure 8. Example of in-distribution and out-of-distribution images for the class n07697313 (burger)
Figure 9. Pairwise Energy Distance Heatmaps for InceptionV3 on ImageNet-R. The random labels …
Figure 10. Pairwise Energy Distance Heatmaps for ResNet-50 and ViT-B/32 on ImageNet-R. The …
Figure 11. Heatmaps of the pairwise KL divergence between softmaxed prototype neuron-weight …
Figure 12. Heatmaps of the pairwise KL divergence between softmaxed prototype neuron-weight …
Figure 13. Pairwise KL Divergence Heatmaps for Modified AlexNet on CIFAR-10
Figure 14. Unnormalised Pairwise KL Divergence Heatmaps for Modified AlexNet on CIFAR-10
Figure 15. Activation Sparsity Histograms for AlexNet on CIFAR-10 for the class 'airplane'
Figure 16. Activation Sparsity Histograms for AlexNet on CIFAR-10 for the class 'automobile'
Figure 17. Pairwise KL Divergence Heatmaps for Modified AlexNet on Tiny ImageNet with CIFAR-10 …
Figure 18. Pairwise KL divergence heatmaps for AlexNet on CIFAR-10 at the second-to-last layer
Figure 19. Unnormalised Pairwise KL divergence heatmaps for AlexNet on CIFAR-10 at the second …
Figure 20. KL divergence for class distributions between in-distribution and OOD data for Small …
Figure 21. Pairwise KL Divergence Heatmaps for ResNet-50 on ImageNet
Figure 22. Pairwise KL Divergence Heatmaps for ViT-B/32 on ImageNet
Figure 23. Activation Sparsity Histograms for InceptionV3 on ImageNet-R for the class 'cheeseburger'
Figure 24. Activation Sparsity Histograms for ResNet-50 on ImageNet for the class 'king penguin'
Figure 25. Activation Sparsity Histograms for ResNet-50 on ImageNet for the class 'cheeseburger'
Figure 26. Activation Sparsity Histograms for ResNet-50 on ImageNet for the class 'king penguin'
Figure 27. Activation Sparsity Histograms for ResNet-50 on ImageNet for the class 'cheeseburger'
Figure 28. Activation Sparsity Histograms for ViT-B/32 on ImageNet for the class 'king penguin'
Figure 29. Activation Sparsity Histograms for ViT-B/32 on ImageNet for the class 'cheeseburger'
Figure 30. Activation Sparsity Histograms for ViT-B/32 on ImageNet for the class 'king penguin'
Figure 31. Activation Sparsity Histograms for ViT-B/32 on ImageNet for the class 'cheeseburger'
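Figures 9 and 10 use energy distance instead of KL divergence for the pairwise heatmaps. As an illustration of the statistic itself (not of how the paper prepares its inputs), the sketch below computes D^2 = 2E||X - Y|| - E||X - X'|| - E||Y - Y'|| between two samples of vectors, such as flattened binarized interaction matrices from two classes. The Euclidean norm, the V-statistic estimator (all pairs, including self-pairs), and the function names are assumptions.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _mean_dist(xs: Sequence[Vector], ys: Sequence[Vector]) -> float:
    """Mean Euclidean distance over all pairs (x, y) with x in xs and y in ys."""
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def energy_distance(xs: Sequence[Vector], ys: Sequence[Vector]) -> float:
    """D^2 = 2E||X - Y|| - E||X - X'|| - E||Y - Y'|| (V-statistic form)."""
    return 2.0 * _mean_dist(xs, ys) - _mean_dist(xs, xs) - _mean_dist(ys, ys)


# Two hypothetical classes, each given as a few binarized path vectors.
class_a = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
class_b = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
print(round(energy_distance(class_a, class_b), 4))
```

Unlike a product-of-Bernoullis KL, this statistic compares the samples directly and makes no independence assumption across paths; it is zero for identical samples and nonnegative in general.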
Original abstract

Deep neural networks have achieved impressive performance on a variety of tasks, but their brittleness to distributional shifts remains a significant barrier to real-world deployment. In this paper, we propose a framework to analyse and quantify the distributional robustness of neural networks by studying the interactions between layer weights and activations. We model these interactions using Bernoulli distributions, using the separation between classes as a diagnostic proxy for robustness. We demonstrate the usefulness of this framework through models trained on CIFAR-10 and ImageNet. We show that our proposed metrics can distinguish between networks that have memorised their training data and those that have not. We also perform analogous experiments in the activation space and find that the same properties do not hold up. Additionally, we investigate the behaviour of our metrics under various distribution shifts and show that these shifts reduce separation under our path-based diagnostics. Our results suggest that this framework provides useful model-level diagnostics of representation structure and robustness.
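The shift experiments, and Figure 6 in particular, compare in-distribution and OOD distributions class by class. Assuming each class is summarized by a matrix of per-path Bernoulli parameters with paths treated as independent, one plausible way to compute that per-class drift is sketched below; the function name `class_drift`, the clipping constant, and the toy parameters are all hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def bernoulli_kl(p: float, q: float, eps: float = 1e-6) -> float:
    """KL(Bern(p) || Bern(q)) in nats, clipping probabilities away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def class_drift(
    id_params: Dict[int, Matrix], ood_params: Dict[int, Matrix]
) -> Dict[int, float]:
    """Per-class KL(in-distribution || shifted), summed over independent paths."""
    drift = {}
    for y, P in id_params.items():
        if y not in ood_params:
            continue  # class absent from the shifted data
        Q = ood_params[y]
        drift[y] = sum(
            bernoulli_kl(p, q)
            for row_p, row_q in zip(P, Q)
            for p, q in zip(row_p, row_q)
        )
    return drift


# Hypothetical parameters: class 0 is unaffected by the shift, class 1 drifts.
id_params = {0: [[0.9, 0.1]], 1: [[0.2, 0.8]]}
ood_params = {0: [[0.9, 0.1]], 1: [[0.5, 0.5]]}
print(class_drift(id_params, ood_params))
```

A large drift means a class's paths under the shift no longer resemble the in-distribution ones. This is a different quantity from the between-class separation, which the abstract reports as decreasing under shift.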

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a framework to analyze the distributional robustness of deep neural networks by modeling weight-activation interactions via Bernoulli distributions and treating class separation as a diagnostic proxy for robustness. Experiments on CIFAR-10 and ImageNet are used to show that the resulting path-based metrics distinguish networks that have memorized their training data from those that have not; analogous checks in activation space fail to replicate the property, and various distribution shifts are shown to reduce separation under the diagnostics.

Significance. If validated, the framework could supply a practical model-level diagnostic for representation structure that does not require exhaustive shift testing. The explicit weight-space versus activation-space contrast and the use of memorization as a controlled test case are constructive elements that help isolate the contribution of the Bernoulli construction.

major comments (2)
  1. [Experiments on CIFAR-10 and ImageNet] The central claim that the metrics isolate distributional robustness (rather than merely detecting memorization correlates) rests on the experiments distinguishing memorized from non-memorized networks. However, the manuscript does not report whether the metric correlates with or predicts actual robustness measures such as accuracy under ImageNet-C corruptions or adversarial perturbations; without such evidence the Bernoulli separation could simply reflect reduced class-conditional weight structure induced by label noise.
  2. [Results under distribution shifts] The abstract and results sections state that distribution shifts reduce separation, yet no quantitative tables or correlation coefficients are supplied linking the path-based metric values to downstream robustness performance. This gap is load-bearing because the diagnostic utility of the framework depends on demonstrating that lower separation anticipates degraded behavior under shifts.
minor comments (2)
  1. The abstract would be strengthened by including one or two key quantitative results (e.g., separation values or statistical significance) rather than qualitative statements alone.
  2. [Methods] Notation for the Bernoulli parameters (p, success probability, etc.) should be introduced once in the methods and used consistently thereafter to avoid reader confusion when moving between weight-space and path-based formulations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below, indicating where revisions will be made to strengthen the presentation of the framework's diagnostic value.

Point-by-point responses
  1. Referee: [Experiments on CIFAR-10 and ImageNet] The central claim that the metrics isolate distributional robustness (rather than merely detecting memorization correlates) rests on the experiments distinguishing memorized from non-memorized networks. However, the manuscript does not report whether the metric correlates with or predicts actual robustness measures such as accuracy under ImageNet-C corruptions or adversarial perturbations; without such evidence the Bernoulli separation could simply reflect reduced class-conditional weight structure induced by label noise.

    Authors: We appreciate the referee's point that direct validation against explicit robustness benchmarks would further isolate the contribution of the Bernoulli construction. The memorization experiments serve as a controlled test because label noise is known to induce both memorization and degraded robustness to shifts; the fact that path-based metrics detect this while activation-space checks do not provides evidence that the weight-activation modeling captures relevant structure. We agree this does not fully substitute for correlations with ImageNet-C or adversarial accuracy. We will revise the manuscript to include a discussion of this distinction and, where feasible, additional analysis correlating the metrics with such measures. revision: yes

  2. Referee: [Results under distribution shifts] The abstract and results sections state that distribution shifts reduce separation, yet no quantitative tables or correlation coefficients are supplied linking the path-based metric values to downstream robustness performance. This gap is load-bearing because the diagnostic utility of the framework depends on demonstrating that lower separation anticipates degraded behavior under shifts.

    Authors: We acknowledge that the current presentation shows reduced separation under shifts but does not supply explicit quantitative linkages such as correlation coefficients or tables relating metric values to accuracy degradation. We will revise the results section to incorporate such quantitative summaries, including tables of metric values and performance drops under the tested shifts, to more directly support the diagnostic interpretation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; metrics derived from Bernoulli model and empirically tested

Full rationale

The paper defines a framework that models weight-activation interactions via Bernoulli distributions and adopts class separation as an explicit diagnostic proxy for distributional robustness. The central metrics are constructed directly from this modeling choice and path-based diagnostics rather than being fitted to memorization labels or robustness outcomes. Experiments on CIFAR-10 and ImageNet then demonstrate that these derived metrics distinguish memorized from non-memorized networks and respond to distribution shifts; this is an empirical observation, not a definitional reduction. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing way that collapses the derivation back to its inputs. The derivation is self-contained, and its claims are tested against external data rather than holding by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no specific free parameters, axioms, or invented entities can be identified from the text. The central modeling choice (Bernoulli distributions on weight-activation interactions) and the proxy assumption (class separation measures robustness) are stated at a high level without further decomposition.

pith-pipeline@v0.9.0 · 5686 in / 1136 out tokens · 33378 ms · 2026-05-21T05:48:38.751506+00:00 · methodology


