Prior Activation Distribution (PAD): A Versatile Representation to Utilize DNN Hidden Units

Archan Misra; Lakmal Meegahapola; Lance Kaplan; Vengateswaran Subramaniam

arxiv: 1907.02711 · v1 · pith:IJVUI7F2new · submitted 2019-07-05 · 💻 cs.CV · cs.LG

Lakmal Meegahapola, Vengateswaran Subramaniam, Lance Kaplan, Archan Misra

Pith reviewed 2026-05-25 02:41 UTC · model grok-4.3

keywords: deep neural networks, hidden layer activations, prior activation distribution, uncertainty estimation, out-of-distribution detection, activation patterns, classification tasks

The pith

Hidden layer activations in deep neural networks exhibit class-specific distributional properties usable for uncertainty estimation and out-of-distribution detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces Prior Activation Distribution (PAD) to capture typical activation patterns of hidden layer units in DNNs for classification. The authors show that combined activations have class-specific properties and define statistical measures for how much a test sample deviates from these distributions. These PAD measures enable fine-grained uncertainty estimates, competitive inference accuracy without the full pipeline, and reliable isolation of out-of-distribution samples, all independent of training technique. A sympathetic reader would care because this provides a way to utilize internal representations for practical tasks like uncertainty and anomaly detection without additional model training or full computation.

Core claim

The paper claims that the combined neural activations of a hidden layer have class-specific distributional properties. It defines multiple statistical measures to compute how far a test sample's activations deviate from such distributions. Using benchmark datasets, it demonstrates PAD-based measures for uncertainty estimates, competitive inferencing accuracy, and out-of-distribution isolation, independent of any training technique.

What carries the argument

Prior Activation Distribution (PAD), a representation of typical hidden-layer activation patterns that supports statistical deviation measures for test samples.
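
The abstract does not say how the per-class priors are estimated or which deviation measures are used, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a diagonal Gaussian summary (per-unit mean and standard deviation) of each class's hidden-layer activations and uses a mean absolute z-score as the deviation measure. The function names `fit_class_priors` and `deviation_scores` are hypothetical, and the activations are synthetic stand-ins rather than outputs of a trained network.

```python
import numpy as np

def fit_class_priors(activations, labels, eps=1e-6):
    """Summarize each class by the per-unit mean and std of its activations.

    Assumption: a diagonal Gaussian stands in for the class prior here;
    the paper's actual estimation procedure is not given in the abstract.
    """
    priors = {}
    for c in np.unique(labels):
        acts_c = activations[labels == c]
        priors[int(c)] = (acts_c.mean(axis=0), acts_c.std(axis=0) + eps)
    return priors

def deviation_scores(x, priors):
    """Mean absolute z-score of one activation vector against each class prior."""
    return {
        c: float(np.mean(np.abs((x - mu) / sigma)))
        for c, (mu, sigma) in priors.items()
    }

# Synthetic "hidden-layer activations": two classes with shifted means.
rng = np.random.default_rng(0)
acts = np.vstack([
    rng.normal(0.0, 1.0, (200, 16)),  # class 0
    rng.normal(3.0, 1.0, (200, 16)),  # class 1
])
labels = np.repeat([0, 1], 200)

priors = fit_class_priors(acts, labels)

# A test vector sitting at the class-1 mean should deviate least from class 1.
scores = deviation_scores(np.full(16, 3.0), priors)
closest_class = min(scores, key=scores.get)
```

In this toy setting the smallest deviation identifies the class, and the size of that smallest deviation could serve as a rough uncertainty or out-of-distribution signal. Whether real hidden layers separate this cleanly is exactly the premise the page flags as load-bearing.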

If this is right

PAD-based measures derive fine-grained uncertainty estimates for inferences.
They provide inferencing accuracy competitive with alternatives that require execution of the full pipeline.
They reliably isolate out-of-distribution test samples.
These capabilities hold independent of any training technique.
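
One common way to check the first two consequences is a coverage versus accuracy curve, which the Figure 7 caption suggests the paper also reports. The sketch below is a generic selective-prediction computation, assuming each test sample already has an uncertainty score (for example a deviation score) and a correctness flag. The function name `coverage_accuracy_curve` and the toy values are hypothetical.

```python
import numpy as np

def coverage_accuracy_curve(uncertainty, correct, coverages):
    """Accuracy on the most-certain fraction of samples, per coverage level."""
    order = np.argsort(uncertainty)  # most certain (lowest score) first
    sorted_correct = np.asarray(correct, dtype=float)[order]
    accuracies = []
    for cov in coverages:
        # Keep at least one sample so the mean is always defined.
        k = max(1, int(round(cov * len(sorted_correct))))
        accuracies.append(float(sorted_correct[:k].mean()))
    return accuracies

# Toy data: errors are concentrated among the high-uncertainty samples.
uncertainty = np.array([0.1, 0.2, 0.3, 0.4, 0.8, 0.9])
correct = np.array([1, 1, 1, 0, 1, 0])

curve = coverage_accuracy_curve(uncertainty, correct, [0.5, 1.0])
```

If an uncertainty measure is informative, accuracy should rise as coverage shrinks, as it does in this toy example (accuracy 1.0 on the most certain half versus 4/6 on all samples).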

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

PAD deviation could support early or partial network evaluation for faster decisions in constrained environments.
The approach might extend to non-classification tasks such as regression by adapting the distributional measures.
Integrating PAD with ensemble methods could further refine uncertainty without extra training passes.

Load-bearing premise

The combined activations across hidden layer units exhibit class-specific distributional properties that can be reliably captured by statistical deviation measures from a prior distribution.

What would settle it

A demonstration that PAD deviation scores show no correlation with actual classification errors or fail to separate in-distribution from out-of-distribution samples on datasets such as MNIST or CIFAR10 would disprove the utility claims.
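
The separation half of this test can be made concrete with an AUROC over deviation scores. This is a minimal sketch assuming higher scores mean "more out-of-distribution"; the helper name `auroc` and the score values are illustrative, not results from the paper. A value near 0.5 would indicate no separation, and a value near 1.0 would indicate clean separation.

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """Probability that a random OOD score exceeds a random ID score.

    Ties count as half, which makes this the normalized Mann-Whitney U statistic.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    greater = (ood_scores[:, None] > id_scores[None, :]).sum()
    ties = (ood_scores[:, None] == id_scores[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(id_scores) * len(ood_scores)))

# Illustrative scores only: fully separated versus identical distributions.
separated = auroc([0.1, 0.2, 0.3], [0.7, 0.8, 0.9])
overlapping = auroc([0.1, 0.5, 0.9], [0.1, 0.5, 0.9])
```

The pairwise comparison is quadratic in the number of samples, which is fine for a sketch but would likely be replaced by a rank-based implementation for full benchmark test sets.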

Figures

Figures reproduced from arXiv: 1907.02711 by Archan Misra, Lakmal Meegahapola, Lance Kaplan, Vengateswaran Subramaniam.

**Figure 4.** Example from Modified-MNIST dataset

**Figure 5.** Softmax (MA1 model): Rotational MNIST

**Figure 7.** MNIST (MA3): Coverage vs. Accuracy

Original abstract

In this paper, we introduce the concept of Prior Activation Distribution (PAD) as a versatile and general technique to capture the typical activation patterns of hidden layer units of a Deep Neural Network used for classification tasks. We show that the combined neural activations of such a hidden layer have class-specific distributional properties, and then define multiple statistical measures to compute how far a test sample's activations deviate from such distributions. Using a variety of benchmark datasets (including MNIST, CIFAR10, Fashion-MNIST & notMNIST), we show how such PAD-based measures can be used, independent of any training technique, to (a) derive fine-grained uncertainty estimates for inferences; (b) provide inferencing accuracy competitive with alternatives that require execution of the full pipeline, and (c) reliably isolate out-of-distribution test samples.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PAD offers a post-training, activation-deviation approach to uncertainty estimation and OOD detection, but the abstract leaves the core method too underspecified to judge whether the class-specific claims actually deliver.

The punchline is that this paper puts forward PAD as a training-independent way to get uncertainty estimates, competitive inference accuracy, and OOD detection from hidden layer activation deviations. If the method holds up, it could be a handy add-on for existing DNNs. What is new is treating the combined activations of a hidden layer as having class-specific distributional properties and then using statistical deviation measures from per-class priors. The paper does well by demonstrating this on multiple benchmarks including MNIST, CIFAR10, Fashion-MNIST, and notMNIST, covering the three claimed uses.

The soft spots are that the abstract provides no equations, no description of how the priors are estimated, and no details on the statistical measures or any ablations. This makes it tough to assess whether the approach really separates classes better than chance in high-dimensional spaces or if it just reduces to known techniques. The concern about inter-class overlaps is reasonable here, as no evidence of separation strength is given in the summary. Overall, the central argument might hold if the full paper includes solid controls and comparisons, but based on what's visible the evidence is thin.

This paper is aimed at the reliable machine learning community, particularly those interested in post-hoc methods for uncertainty and OOD. A reader focused on practical tools for DNN reliability would get some value from the benchmark results. It deserves a serious referee because the idea is simple enough to evaluate and the applications are timely, even if it needs more detail to be convincing. I would recommend engaging with it in peer review to see the full method and experiments.

Referee Report

2 major / 0 minor

Summary. The paper introduces Prior Activation Distribution (PAD) as a general technique to capture typical activation patterns of hidden layer units in DNNs for classification. It claims that the combined activations of a hidden layer exhibit class-specific distributional properties, defines statistical measures of deviation from per-class priors, and shows that these measures (independent of training method) can derive fine-grained uncertainty estimates, yield inference accuracy competitive with full-pipeline methods, and isolate out-of-distribution samples. Experiments are reported on MNIST, CIFAR10, Fashion-MNIST and notMNIST.

Significance. If the central claims hold with proper quantification, PAD would supply a training-independent, post-hoc representation for uncertainty and OOD tasks that could be applied to existing models. The multi-benchmark evaluation is a positive element. However, the absence of any reported separation metrics, baseline comparisons, or error bars in the provided information makes the practical significance difficult to gauge.

major comments (2)

[Abstract] The load-bearing premise that 'the combined neural activations of such a hidden layer have class-specific distributional properties' is asserted without any quantitative support (pairwise distances, classification accuracy of a model using only the deviation statistics, or comparison to softmax entropy). This directly undermines the three downstream claims (a)–(c).
[Abstract] No description is given of how the per-class prior is estimated, which statistical deviation measures are used, or any controls for layer choice and measure selection. Without these, it is impossible to assess whether the measures can overcome high inter-class overlap in activation space or whether results are post-hoc.
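
The softmax-entropy comparison requested in the first comment is a standard baseline and is cheap to compute. The sketch below is a generic implementation, not code from the paper. The function name `softmax_entropy` and the example logits are hypothetical. Any activation-based uncertainty score could be ranked against this quantity on the same test samples.

```python
import numpy as np

def softmax_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution for each row of logits."""
    logits = np.asarray(logits, dtype=float)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    # The small constant guards against log(0) for near-zero probabilities.
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

# A confident prediction versus a uniform one over three classes.
entropies = softmax_entropy([[10.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
```

The uniform row attains the maximum entropy for three classes, ln 3, while the confident row is close to zero. A deviation-based score that only reproduces this ordering would add little beyond the softmax output.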

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments. The two major points both concern the abstract; we agree that it can be strengthened for self-containment and will revise it. The full manuscript already contains the requested methodological details and empirical demonstrations.

Referee: [Abstract] The load-bearing premise that 'the combined neural activations of such a hidden layer have class-specific distributional properties' is asserted without any quantitative support (pairwise distances, classification accuracy of a model using only the deviation statistics, or comparison to softmax entropy). This directly undermines the three downstream claims (a)–(c).

Authors: The manuscript demonstrates the class-specific distributional properties empirically via the three downstream tasks (uncertainty, competitive inference accuracy, and OOD isolation) on four image benchmarks. We acknowledge that the abstract itself supplies no direct quantitative support such as pairwise distances or a standalone classifier on deviation statistics. We will revise the abstract to include a concise statement of the supporting experimental outcomes.
Revision: yes

Referee: [Abstract] No description is given of how the per-class prior is estimated, which statistical deviation measures are used, or any controls for layer choice and measure selection. Without these, it is impossible to assess whether the measures can overcome high inter-class overlap in activation space or whether results are post-hoc.

Authors: The methods and experimental sections of the manuscript specify the per-class prior estimation procedure, the statistical deviation measures employed, and the layer/measure selection protocol. We agree the abstract should be self-contained on these points and will add a brief description of the estimation and measures used.
Revision: yes

Circularity Check

0 steps flagged

No circularity detected; derivation is self-contained

The provided abstract and description introduce PAD as a new representation, assert class-specific distributional properties of combined hidden activations, and define statistical deviation measures without any equations, self-referential definitions, fitted parameters renamed as predictions, or load-bearing self-citations. Claims rest on empirical evaluation across benchmark datasets rather than reducing to inputs by construction. No steps match the enumerated circularity patterns, so the derivation chain has no detectable circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that hidden layer activations possess class-specific distributional properties; no free parameters or invented entities are described in the abstract.

axioms (1)

Domain assumption: Combined neural activations of a hidden layer have class-specific distributional properties.
Directly stated in abstract as the foundation for defining PAD and deviation measures.
