Bayesian Neural Networks: An Introduction and Survey

Clinton Fookes; Ethan Goan

arxiv: 2006.12024 · v3 · submitted 2020-06-22 · 📊 stat.ML · cs.LG

Bayesian Neural Networks: An Introduction and Survey

Ethan Goan , Clinton Fookes This is my paper

Pith reviewed 2026-05-24 14:36 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords Bayesian neural networksuncertainty quantificationapproximate inferencevariational inferenceMarkov chain Monte Carlomachine learningdeep learning

0 comments

The pith

Bayesian Neural Networks place probability distributions over weights so predictions include explicit uncertainty estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard neural networks output single predictions without any measure of confidence. Bayesian Neural Networks treat weights as random variables drawn from a posterior distribution, so the model can integrate over many possible parameter settings and return a distribution over outputs. The survey presents the core Bayesian formulation, then examines the main families of approximate inference used to make the approach tractable. By contrasting these approximations the authors show persistent gaps in accuracy, scalability, and calibration. Readers who accept the survey's framing therefore see a clear research agenda for making uncertainty-aware networks practical.

Core claim

Bayesian Neural Networks enable reasoning about uncertainty in predictions by placing distributions over network weights and integrating over the posterior; the paper surveys the principal approximate inference techniques required to implement this idea and uses the comparison to identify concrete limitations that future work must address.

What carries the argument

Approximate posterior inference over network weights, performed via methods such as variational inference and Markov-chain Monte Carlo, to replace point estimates with distributions.

If this is right

Predictions come with calibrated uncertainty intervals rather than point values alone.
Safety-critical systems can flag inputs where the model is uncertain and defer to human review.
Model comparison and hyper-parameter tuning can use marginal likelihoods instead of validation accuracy.
Downstream tasks such as active learning and Bayesian optimisation receive direct uncertainty signals from the network.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same posterior-over-weights construction could be applied to recurrent or transformer architectures with only notational changes.
Hybrid schemes that combine one of the surveyed approximations with post-hoc calibration might close some of the identified gaps without new theory.
Deployment on edge devices would require additional compression techniques that preserve the uncertainty properties highlighted in the survey.

Load-bearing premise

That the approximate inference methods reviewed in the survey are representative of the main approaches and limitations in the BNN literature.

What would settle it

A new inference procedure, omitted from the survey, that simultaneously achieves exact posterior computation, linear scaling to modern network sizes, and well-calibrated uncertainty on benchmark tasks would falsify the claimed need for further improvement.

Figures

Figures reproduced from arXiv: 2006.12024 by Clinton Fookes, Ethan Goan.

**Figure 2.** Figure 2: Example of a NN architecture with a single hidden layer for either [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Examples of commonly used activation functions in NNs. The output [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Graphical illustration of how the evidence plays a role in investigating [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

**Figure 5.** Figure 5: Graphical illustration of how the minimisation of the KL divergence [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

**Figure 6.** Figure 6: Illustration of GP prior induced on output when placing a Gaussian [PITH_FULL_IMAGE:figures/full_fig_p028_6.png] view at source ↗

**Figure 7.** Figure 7: Comparison of BNNs with GP for a regression task over three toy data [PITH_FULL_IMAGE:figures/full_fig_p034_7.png] view at source ↗

**Figure 8.** Figure 8: Examples of difficult to classify images from each class in MNIST. [PITH_FULL_IMAGE:figures/full_fig_p044_8.png] view at source ↗

read the original abstract

Neural Networks (NNs) have provided state-of-the-art results for many challenging machine learning tasks such as detection, regression and classification across the domains of computer vision, speech recognition and natural language processing. Despite their success, they are often implemented in a frequentist scheme, meaning they are unable to reason about uncertainty in their predictions. This article introduces Bayesian Neural Networks (BNNs) and the seminal research regarding their implementation. Different approximate inference methods are compared, and used to highlight where future research can improve on current methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript is an introductory survey on Bayesian Neural Networks (BNNs). It claims that, unlike frequentist neural networks, BNNs enable reasoning about uncertainty in predictions. The paper reviews seminal research on BNN implementation, compares different approximate inference methods, and uses those comparisons to highlight directions for future research.

Significance. As a survey without original derivations or experiments, the manuscript's significance rests on the accuracy and representativeness of its synthesis of prior work. If the coverage of approximate inference methods is balanced and the summaries of key papers are precise, it could serve as a useful entry point for newcomers to uncertainty quantification in neural networks; the standard contrast with frequentist approaches is already established in the literature.

minor comments (3)

The abstract states that frequentist NNs are 'unable to reason about uncertainty,' but the survey should briefly acknowledge post-hoc uncertainty estimation techniques (e.g., MC dropout or ensembles) used in the frequentist setting to sharpen the motivation for BNNs.
Because the paper is positioned as a survey, the comparison of approximate inference methods would benefit from an explicit statement of selection criteria (e.g., which methods were chosen and why) to address the representativeness concern.
The manuscript would be strengthened by including a short table summarizing the reviewed inference methods along dimensions such as scalability, accuracy guarantees, and computational cost.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and recommendation of minor revision. The report provides a fair summary of the manuscript but lists no specific major comments requiring response.

Circularity Check

0 steps flagged

Survey paper introduces no derivations or fitted quantities

full rationale

This is an introductory survey paper whose content consists of descriptive overviews of existing BNN literature, comparisons of established approximate inference methods, and standard contrasts between Bayesian and frequentist neural networks. No new equations, parameters, predictions, or technical derivations are introduced that could reduce to the paper's own inputs by construction. The central framing is a widely accepted distinction in the field and draws on external references without self-referential load-bearing steps. As a result, no circularity patterns (self-definitional, fitted-input-as-prediction, or self-citation chains) are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper with no new mathematical derivations, fitted parameters, or postulated entities. All content rests on prior literature.

pith-pipeline@v0.9.0 · 5601 in / 859 out tokens · 19832 ms · 2026-05-24T14:36:19.668365+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

placing a distribution over the network parameters... posterior distribution π(ω|D)
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Gaussian Process prior over the network output arises when the number of hidden units approaches infinity

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Extracting redshifts from 2D slitless spectroscopic images using deep learning for the CSST galaxy survey
astro-ph.IM 2026-05 unverdicted novelty 6.0

A Bayesian CNN maps 2D slitless spectral images to redshift estimates with NMAD precision 0.0104 for SNR_GI >=1 and better for brighter sources, while remaining robust to wavelength calibration errors via spatial augm...

Reference graph

Works this paper leans on

123 extracted references · 123 canonical work pages · cited by 1 Pith paper · 19 internal anchors

[1]

The perceptron: A probabilistic model for information storage and organization in the brain

F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain.” Psychological Review, vol. 65, no. 6, pp. 386 – 408, 1958

work page 1958
[2]

Bishop, Pattern recognition and machine learning

C. Bishop, Pattern recognition and machine learning . New York: Springer, 2006

work page 2006
[3]

Learning representations by back-propagating errors,

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” nature, vol. 323, no. 6088, p. 533, 1986

work page 1986
[4]

Gpu implementation of neural networks,

K.-S. Oh and K. Jung, “Gpu implementation of neural networks,” Pattern Recogni- tion, vol. 37, no. 6, pp. 1311–1314, 2004

work page 2004
[5]

Deep big simple neural nets excel on handwritten digit recognition,

D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, “Deep big simple neural nets excel on handwritten digit recognition,” CoRR, 2010

work page 2010
[6]

Imagenet classiﬁcation with deep con- volutional neural networks,

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classiﬁcation with deep con- volutional neural networks,” in Advances in neural information processing systems , 2012, pp. 1097–1105

work page 2012
[7]

Very deep convolutional networks for large-scale image recognition,

K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, 2014

work page 2014
[8]

Going deeper with convolutions,

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Van- houcke, A. Rabinovich et al., “Going deeper with convolutions,” in CVPR, 2015

work page 2015
[9]

Rich feature hierarchies for ac- curate object detection and semantic segmentation,

R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for ac- curate object detection and semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2014, pp. 580–587. 38 Ethan Goan, Clinton Fookes

work page 2014
[10]

Faster r-cnn: Towards real-time object de- tection with region proposal networks,

S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object de- tection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99

work page 2015
[11]

You only look once: Uniﬁed, real-time object detection,

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Uniﬁed, real-time object detection,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788

work page 2016
[12]

Acoustic modeling using deep belief networks,

A. Mohamed, G. E. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” IEEE Transactions on Audio, Speech, and Language Processing , vol. 20, no. 1, pp. 14–22, 2012

work page 2012
[13]

Context-dependent pre-trained deep neu- ral networks for large-vocabulary speech recognition,

G. E. Dahl, D. Yu, L. Deng, and A. Acero, “Context-dependent pre-trained deep neu- ral networks for large-vocabulary speech recognition,” IEEE Transactions on audio, speech, and language processing, vol. 20, no. 1, pp. 30–42, 2012

work page 2012
[14]

Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,

G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Van- houcke, P. Nguyen, T. N. Sainathet al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Pro- cessing Magazine, vol. 29, no. 6, pp. 82–97, 2012

work page 2012
[15]

Deep speech 2 : End-to-end speech recognition in english and mandarin,

D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen, J. Chen, J. Chen, Z. Chen, M. Chrzanowski, A. Coates, G. Diamos, K. Ding, N. Du, E. Elsen, J. Engel, W. Fang, L. Fan, C. Fougner, L. Gao, C. Gong, A. Hannun, T. Han, L. Johannes, B. Jiang, C. Ju, B. Jun, P. LeGresley, L. Lin, J. Liu, Y. ...

work page 2016
[16]

Mastering the game of go without human knowledge,

D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hu- bert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, p. 354, 2017

work page 2017
[17]

Smartening up with artiﬁcial intelligence (ai) - what’s in it for germany and its industrial sector?

“Smartening up with artiﬁcial intelligence (ai) - what’s in it for germany and its industrial sector?” McKinsey & Company, Inc, Tech. Rep., 4 2017. [Online]. Available: https://www.mckinsey.de/ﬁles/170419 mckinsey ki ﬁnal m.pdf

work page 2017
[18]

Artiﬁcial intelligence in wealth and as- set management,

E. V. T. V. Serooskerken, “Artiﬁcial intelligence in wealth and as- set management,” Pictet on Robot Advisors, Tech. Rep., 1 2017. [Online]. Available: https://perspectives.pictet.com/wp-content/uploads/2016/12/ Edgar-van-Tuyll-van-Serooskerken-Pictet-Report-winter-2016-2.pdf

work page 2017
[19]

Wavenet launches in the google assistant

A. van den Oord, T. Walters, and T. Strohman, “Wavenet launches in the google assistant.” [Online]. Available: https://deepmind.com/blog/ wavenet-launches-google-assistant/

work page
[20]

Deep learning for siri’s voice: On-device deep mixture density networks for hybrid unit selection synthesis,

Siri Team, “Deep learning for siri’s voice: On-device deep mixture density networks for hybrid unit selection synthesis,” 8 2017. [Online]. Available: https://machinelearning.apple.com/2017/08/06/siri-voices.html

work page 2017
[21]

Microsoft chatbot is taught to swear on twitter

J. Wakeﬁeld, “Microsoft chatbot is taught to swear on twitter.” [Online]. Available: www.bbc.com/news/technology-35890188

work page
[22]

Google photos labeled black people ’goril- las’

J. Guynn, “Google photos labeled black people ’goril- las’.” [Online]. Available: https://www.usatoday.com/story/tech/2015/07/01/ google-apologizes-after-photos-identify-black-people-as-gorillas/29567465/

work page arXiv 2015
[23]

Gender shades: Intersectional accuracy disparities in commercial gender classiﬁcation,

J. Buolamwini and T. Gebru, “Gender shades: Intersectional accuracy disparities in commercial gender classiﬁcation,” in Conference on fairness, accountability and transparency, 2018, pp. 77–91

work page 2018
[24]

A tragic loss

Tesla Team, “A tragic loss.” [Online]. Available: https://www.tesla.com/en GB/ blog/tragic-loss Bayesian Neural Networks: An Introduction and Survey 39

work page
[25]

Uber suspends self-driving car tests after vehicle hits and kills woman crossing the street in arizona,

ABC News, “Uber suspends self-driving car tests after vehicle hits and kills woman crossing the street in arizona,” 2018. [Online]. Available: http://www.abc.net.au/ news/2018-03-20/uber-suspends-self-driving-car-tests-after-fatal-crash/9565586

work page arXiv 2018
[26]

Regulation (eu) 2016/679 of the european parliment and of the council,

Council of European Union, “Regulation (eu) 2016/679 of the european parliment and of the council,” 2016

work page 2016
[27]

European union regulations on algorithmic decision- making and a right to explanation,

B. Goodman and S. Flaxman, “European union regulations on algorithmic decision- making and a right to explanation,” AI magazine, vol. 38, no. 3, pp. 50–57, 2017

work page 2017
[28]

A shared vision for machine learning in neuroscience,

M. Vu, T. Adali, D. Ba, G. Buzsaki, D. Carlson, K. Heller, C. Liston, C. Rudin, V. Sohal, A. Widge, H. Mayberg, G. Sapiro, and K. Dzirasa, “A shared vision for machine learning in neuroscience,” JOURNAL OF NEUROSCIENCE, vol. 38, no. 7, pp. 1601–1607, 2018

work page 2018
[29]

What do we need to build explainable AI systems for the medical domain?

A. Holzinger, C. Biemann, C. S. Pattichis, and D. B. Kell, “What do we need to build explainable ai systems for the medical domain?” arXiv preprint arXiv:1712.09923 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[30]

Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission,

R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 2015, pp. 1721–1730

work page 2015
[31]

Explainable artiﬁcial intelligence (xai),

D. Gunning, “Explainable artiﬁcial intelligence (xai),” Defense Advanced Research Projects Agency (DARPA), nd Web , 2017

work page 2017
[32]

Probable networks and plausible predictionsa review of practical bayesian methods for supervised neural networks,

D. J. MacKay, “Probable networks and plausible predictionsa review of practical bayesian methods for supervised neural networks,” Network: computation in neural systems, vol. 6, no. 3, pp. 469–505, 1995

work page 1995
[33]

Bayesian approach for neural networksreview and case studies,

J. Lampinen and A. Vehtari, “Bayesian approach for neural networksreview and case studies,” Neural networks, vol. 14, no. 3, pp. 257–274, 2001

work page 2001
[34]

Towards bayesian deep learning: A survey,

H. Wang and D.-Y. Yeung, “Towards bayesian deep learning: A survey,” arXiv preprint arXiv:1604.01662, 2016

work page arXiv 2016
[35]

Murphey, Machine learning, a probabilistic perspective

K. Murphey, Machine learning, a probabilistic perspective . Cambridge, MA: MIT Press, 2012

work page 2012
[36]

Deep sparse rectiﬁer neural networks,

X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectiﬁer neural networks,” in AISTATS, 2011, pp. 315–323

work page 2011
[37]

Rectiﬁer nonlinearities improve neural network acoustic models,

A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectiﬁer nonlinearities improve neural network acoustic models,” in ICML, vol. 30, 2013, p. 3

work page 2013
[38]

R. M. Neal, Bayesian learning for neural networks . Springer Science & Business Media, 1996, vol. 118

work page 1996
[39]

Uncertainty in deep learning,

Y. Gal, “Uncertainty in deep learning,” University of Cambridge , 2016

work page 2016
[40]

Consistent inference of probabilities in layered networks: predictions and generalizations,

N. Tishby, E. Levin, and S. A. Solla, “Consistent inference of probabilities in layered networks: predictions and generalizations,” in International 1989 Joint Conference on Neural Networks , 1989, pp. 403–409 vol.2

work page 1989
[41]

Transforming neural-net output levels to probability distributions,

J. S. Denker and Y. Lecun, “Transforming neural-net output levels to probability distributions,” in NeurIPS, 1991, pp. 853–859

work page 1991
[42]

Approximation by superpositions of a sigmoidal function,

G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathemat- ics of control, signals and systems , vol. 2, no. 4, pp. 303–314, 1989

work page 1989
[43]

On the approximate realization of continuous mappings by neural networks,

K.-I. Funahashi, “On the approximate realization of continuous mappings by neural networks,” Neural networks, vol. 2, no. 3, pp. 183–192, 1989

work page 1989
[44]

Approximation capabilities of multilayer feedforward networks,

K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural networks, vol. 4, no. 2, pp. 251–257, 1991

work page 1991
[45]

Quantiﬁed maximum entropy memsys5 users manual,

S. F. Gull and J. Skilling, “Quantiﬁed maximum entropy memsys5 users manual,” Maximum Entropy Data Consultants Ltd , vol. 33, 1991

work page 1991
[46]

Bayesian interpolation,

D. J. MacKay, “Bayesian interpolation,” Neural computation, vol. 4, no. 3, pp. 415– 447, 1992

work page 1992
[47]

Bayesian methods for adaptive models,

——, “Bayesian methods for adaptive models,” Ph.D. dissertation, California Insti- tute of Technology, 1992

work page 1992
[48]

A practical bayesian framework for backpropagation networks,

——, “A practical bayesian framework for backpropagation networks,” Neural com- putation, vol. 4, no. 3, pp. 448–472, 1992. 40 Ethan Goan, Clinton Fookes

work page 1992
[49]

An introduction to variational methods for graphical models,

M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, “An introduction to variational methods for graphical models,” Machine learning, vol. 37, no. 2, pp. 183–233, 1999

work page 1999
[50]

Graphical models, exponential families, and variational inference,

M. J. Wainwright, M. I. Jordan et al., “Graphical models, exponential families, and variational inference,” Foundations and Trends R⃝ in Machine Learning , vol. 1, no. 1–2, pp. 1–305, 2008

work page 2008
[51]

Variational inference: A review for statisticians,

D. M. Blei, A. Kucukelbir, and J. D. McAuliﬀe, “Variational inference: A review for statisticians,” Journal of the American Statistical Association , vol. 112, no. 518, pp. 859–877, 2017

work page 2017
[52]

Stochastic variational infer- ence,

M. D. Hoﬀman, D. M. Blei, C. Wang, and J. Paisley, “Stochastic variational infer- ence,” The Journal of Machine Learning Research , vol. 14, no. 1, pp. 1303–1347, 2013

work page 2013
[53]

Ensemble learning in bayesian neural networks,

D. Barber and C. M. Bishop, “Ensemble learning in bayesian neural networks,”NATO ASI SERIES F COMPUTER AND SYSTEMS SCIENCES , vol. 168, pp. 215–238, 1998

work page 1998
[54]

Keeping the neural networks simple by minimizing the description length of the weights,

G. E. Hinton and D. Van Camp, “Keeping the neural networks simple by minimizing the description length of the weights,” in Proceedings of the sixth annual conference on Computational learning theory . ACM, 1993, pp. 5–13

work page 1993
[55]

A Conceptual Introduction to Hamiltonian Monte Carlo

M. Betancourt, “A conceptual introduction to hamiltonian monte carlo,” arXiv preprint arXiv:1701.02434, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[56]

The geometric founda- tions of hamiltonian monte carlo,

M. Betancourt, S. Byrne, S. Livingstone, M. Girolami et al., “The geometric founda- tions of hamiltonian monte carlo,” Bernoulli, vol. 23, no. 4A, pp. 2257–2298, 2017

work page 2017
[57]

Agent-based scientiﬁc simula- tion,

G. Madey, X. Xiang, S. E. Cabaniss, and Y. Huang, “Agent-based scientiﬁc simula- tion,” Computing in Science & Engineering , vol. 2, no. 01, pp. 22–29, jan 2005

work page 2005
[58]

Hybrid monte carlo,

S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth, “Hybrid monte carlo,” Physics letters B , vol. 195, no. 2, pp. 216–222, 1987

work page 1987
[59]

Mcmc using hamiltonian dynamics,

R. M. Neal et al., “Mcmc using hamiltonian dynamics,” Handbook of markov chain monte carlo, vol. 2, no. 11, p. 2, 2011

work page 2011
[60]

Brooks, A

S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Handbook of markov chain monte carlo. CRC press, 2011

work page 2011
[61]

Bayesian learning via stochastic gradient langevin dynam- ics,

M. Welling and Y. Teh, “Bayesian learning via stochastic gradient langevin dynam- ics,” Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pp. 681–688, 2011

work page 2011
[62]

Practical variational inference for neural networks,

A. Graves, “Practical variational inference for neural networks,” in Advances in Neu- ral Information Processing Systems 24, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2011, pp. 2348–2356

work page 2011
[63]

The variational gaussian approximation revisited,

M. Opper and C. Archambeau, “The variational gaussian approximation revisited,” Neural computation, vol. 21, no. 3, pp. 786–792, 2009

work page 2009
[64]

Probabilistic backpropagation for scal- able learning of bayesian neural networks,

J. M. Hern´ andez-Lobato and R. Adams, “Probabilistic backpropagation for scal- able learning of bayesian neural networks,” in International Conference on Machine Learning, 2015, pp. 1861–1869

work page 2015
[65]

Variational Bayesian Inference with Stochastic Search

J. Paisley, D. Blei, and M. Jordan, “Variational bayesian inference with stochastic search,” arXiv preprint arXiv:1206.6430 , 2012

work page internal anchor Pith review Pith/arXiv arXiv 2012
[66]

Variance reduction techniques for digital simulation,

J. R. Wilson, “Variance reduction techniques for digital simulation,” American Jour- nal of Mathematical and Management Sciences , vol. 4, no. 3-4, pp. 277–312, 1984

work page 1984
[67]

The variational gaussian approximation revisited,

M. Opper and C. Archambeau, “The variational gaussian approximation revisited,” Neural computation, vol. 21 3, pp. 786–92, 2009

work page 2009
[68]

Auto-Encoding Variational Bayes

D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[69]

Stochastic backpropagation and ap- proximate inference in deep generative models,

D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and ap- proximate inference in deep generative models,” in Proceedings of the 31st Interna- tional Conference on Machine Learning (ICML) , 2014, pp. 1278–1286

work page 2014
[70]

Dropout: A simple way to prevent neural networks from overﬁtting,

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overﬁtting,” The Jour- nal of Machine Learning Research , vol. 15, no. 1, pp. 1929–1958, 2014. Bayesian Neural Networks: An Introduction and Survey 41

work page 1929
[71]

Variational dropout and the local reparameterization trick,

D. P. Kingma, T. Salimans, and M. Welling, “Variational dropout and the local reparameterization trick,” in Advances in Neural Information Processing Systems , 2015, pp. 2575–2583

work page 2015
[72]

Fast dropout training,

S. Wang and C. Manning, “Fast dropout training,” in international conference on machine learning, 2013, pp. 118–126

work page 2013
[73]

Sex, mixability, and modularity,

A. Livnat, C. Papadimitriou, N. Pippenger, and M. W. Feldman, “Sex, mixability, and modularity,” Proceedings of the National Academy of Sciences , vol. 107, no. 4, pp. 1452–1457, 2010

work page 2010
[74]

A bayesian approach to on-line learning,

M. Opper and O. Winther, “A bayesian approach to on-line learning,” On-line learn- ing in neural networks , pp. 363–378, 1998

work page 1998
[75]

A family of algorithms for approximate bayesian inference,

T. P. Minka, “A family of algorithms for approximate bayesian inference,” Ph.D. dissertation, Massachusetts Institute of Technology, 2001

work page 2001
[76]

Weight Uncertainty in Neural Networks

C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, “Weight uncertainty in neural networks,” arXiv preprint arXiv:1505.05424 , 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[77]

Computing with inﬁnite networks,

C. K. Williams, “Computing with inﬁnite networks,” in Advances in neural informa- tion processing systems, 1997, pp. 295–301

work page 1997
[78]

Deep neural networks as gaussian processes,

J. Lee, J. Sohl-dickstein, J. Pennington, R. Novak, S. Schoenholz, and Y. Bahri, “Deep neural networks as gaussian processes,” in International Conference on Learning Representations, 2018

work page 2018
[79]

Deep gaussian processes,

A. Damianou and N. Lawrence, “Deep gaussian processes,” in AISTATS, 2013, pp. 207–215

work page 2013
[80]

Deep gaussian processes and variational propagation of uncertainty,

A. Damianou, “Deep gaussian processes and variational propagation of uncertainty,” Ph.D. dissertation, University of Sheﬃeld, 2015

work page 2015

Showing first 80 references.

[1] [1]

The perceptron: A probabilistic model for information storage and organization in the brain

F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain.” Psychological Review, vol. 65, no. 6, pp. 386 – 408, 1958

work page 1958

[2] [2]

Bishop, Pattern recognition and machine learning

C. Bishop, Pattern recognition and machine learning . New York: Springer, 2006

work page 2006

[3] [3]

Learning representations by back-propagating errors,

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” nature, vol. 323, no. 6088, p. 533, 1986

work page 1986

[4] [4]

Gpu implementation of neural networks,

K.-S. Oh and K. Jung, “Gpu implementation of neural networks,” Pattern Recogni- tion, vol. 37, no. 6, pp. 1311–1314, 2004

work page 2004

[5] [5]

Deep big simple neural nets excel on handwritten digit recognition,

D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, “Deep big simple neural nets excel on handwritten digit recognition,” CoRR, 2010

work page 2010

[6] [6]

Imagenet classiﬁcation with deep con- volutional neural networks,

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classiﬁcation with deep con- volutional neural networks,” in Advances in neural information processing systems , 2012, pp. 1097–1105

work page 2012

[7] [7]

Very deep convolutional networks for large-scale image recognition,

K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, 2014

work page 2014

[8] [8]

Going deeper with convolutions,

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Van- houcke, A. Rabinovich et al., “Going deeper with convolutions,” in CVPR, 2015

work page 2015

[9] [9]

Rich feature hierarchies for ac- curate object detection and semantic segmentation,

R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for ac- curate object detection and semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2014, pp. 580–587. 38 Ethan Goan, Clinton Fookes

work page 2014

[10] [10]

Faster r-cnn: Towards real-time object de- tection with region proposal networks,

S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object de- tection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99

work page 2015

[11] [11]

You only look once: Uniﬁed, real-time object detection,

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Uniﬁed, real-time object detection,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788

work page 2016

[12] [12]

Acoustic modeling using deep belief networks,

A. Mohamed, G. E. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” IEEE Transactions on Audio, Speech, and Language Processing , vol. 20, no. 1, pp. 14–22, 2012

work page 2012

[13] [13]

Context-dependent pre-trained deep neu- ral networks for large-vocabulary speech recognition,

G. E. Dahl, D. Yu, L. Deng, and A. Acero, “Context-dependent pre-trained deep neu- ral networks for large-vocabulary speech recognition,” IEEE Transactions on audio, speech, and language processing, vol. 20, no. 1, pp. 30–42, 2012

work page 2012

[14] [14]

Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,

G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Van- houcke, P. Nguyen, T. N. Sainathet al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Pro- cessing Magazine, vol. 29, no. 6, pp. 82–97, 2012

work page 2012

[15] [15]

Deep speech 2 : End-to-end speech recognition in english and mandarin,

D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen, J. Chen, J. Chen, Z. Chen, M. Chrzanowski, A. Coates, G. Diamos, K. Ding, N. Du, E. Elsen, J. Engel, W. Fang, L. Fan, C. Fougner, L. Gao, C. Gong, A. Hannun, T. Han, L. Johannes, B. Jiang, C. Ju, B. Jun, P. LeGresley, L. Lin, J. Liu, Y. ...

work page 2016

[16] [16]

Mastering the game of go without human knowledge,

D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hu- bert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, p. 354, 2017

work page 2017

[17] [17]

Smartening up with artiﬁcial intelligence (ai) - what’s in it for germany and its industrial sector?

“Smartening up with artiﬁcial intelligence (ai) - what’s in it for germany and its industrial sector?” McKinsey & Company, Inc, Tech. Rep., 4 2017. [Online]. Available: https://www.mckinsey.de/ﬁles/170419 mckinsey ki ﬁnal m.pdf

work page 2017

[18] [18]

Artiﬁcial intelligence in wealth and as- set management,

E. V. T. V. Serooskerken, “Artiﬁcial intelligence in wealth and as- set management,” Pictet on Robot Advisors, Tech. Rep., 1 2017. [Online]. Available: https://perspectives.pictet.com/wp-content/uploads/2016/12/ Edgar-van-Tuyll-van-Serooskerken-Pictet-Report-winter-2016-2.pdf

work page 2017

[19] [19]

Wavenet launches in the google assistant

A. van den Oord, T. Walters, and T. Strohman, “Wavenet launches in the google assistant.” [Online]. Available: https://deepmind.com/blog/ wavenet-launches-google-assistant/

work page

[20] [20]

Deep learning for siri’s voice: On-device deep mixture density networks for hybrid unit selection synthesis,

Siri Team, “Deep learning for siri’s voice: On-device deep mixture density networks for hybrid unit selection synthesis,” 8 2017. [Online]. Available: https://machinelearning.apple.com/2017/08/06/siri-voices.html

work page 2017

[21] [21]

Microsoft chatbot is taught to swear on twitter

J. Wakeﬁeld, “Microsoft chatbot is taught to swear on twitter.” [Online]. Available: www.bbc.com/news/technology-35890188

work page

[22] [22]

Google photos labeled black people ’goril- las’

J. Guynn, “Google photos labeled black people ’goril- las’.” [Online]. Available: https://www.usatoday.com/story/tech/2015/07/01/ google-apologizes-after-photos-identify-black-people-as-gorillas/29567465/

work page arXiv 2015

[23] [23]

Gender shades: Intersectional accuracy disparities in commercial gender classiﬁcation,

J. Buolamwini and T. Gebru, “Gender shades: Intersectional accuracy disparities in commercial gender classiﬁcation,” in Conference on fairness, accountability and transparency, 2018, pp. 77–91

work page 2018

[24] [24]

A tragic loss

Tesla Team, “A tragic loss.” [Online]. Available: https://www.tesla.com/en GB/ blog/tragic-loss Bayesian Neural Networks: An Introduction and Survey 39

work page

[25] [25]

Uber suspends self-driving car tests after vehicle hits and kills woman crossing the street in arizona,

ABC News, “Uber suspends self-driving car tests after vehicle hits and kills woman crossing the street in arizona,” 2018. [Online]. Available: http://www.abc.net.au/ news/2018-03-20/uber-suspends-self-driving-car-tests-after-fatal-crash/9565586

work page arXiv 2018

[26] [26]

Regulation (eu) 2016/679 of the european parliment and of the council,

Council of European Union, “Regulation (eu) 2016/679 of the european parliment and of the council,” 2016

work page 2016

[27] [27]

European union regulations on algorithmic decision- making and a right to explanation,

B. Goodman and S. Flaxman, “European union regulations on algorithmic decision- making and a right to explanation,” AI magazine, vol. 38, no. 3, pp. 50–57, 2017

work page 2017

[28] [28]

A shared vision for machine learning in neuroscience,

M. Vu, T. Adali, D. Ba, G. Buzsaki, D. Carlson, K. Heller, C. Liston, C. Rudin, V. Sohal, A. Widge, H. Mayberg, G. Sapiro, and K. Dzirasa, “A shared vision for machine learning in neuroscience,” JOURNAL OF NEUROSCIENCE, vol. 38, no. 7, pp. 1601–1607, 2018

work page 2018

[29] [29]

What do we need to build explainable AI systems for the medical domain?

A. Holzinger, C. Biemann, C. S. Pattichis, and D. B. Kell, “What do we need to build explainable ai systems for the medical domain?” arXiv preprint arXiv:1712.09923 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[30] [30]

Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission,

R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 2015, pp. 1721–1730

work page 2015

[31] [31]

Explainable artiﬁcial intelligence (xai),

D. Gunning, “Explainable artiﬁcial intelligence (xai),” Defense Advanced Research Projects Agency (DARPA), nd Web , 2017

work page 2017

[32] [32]

Probable networks and plausible predictionsa review of practical bayesian methods for supervised neural networks,

D. J. MacKay, “Probable networks and plausible predictionsa review of practical bayesian methods for supervised neural networks,” Network: computation in neural systems, vol. 6, no. 3, pp. 469–505, 1995

work page 1995

[33] [33]

Bayesian approach for neural networksreview and case studies,

J. Lampinen and A. Vehtari, “Bayesian approach for neural networksreview and case studies,” Neural networks, vol. 14, no. 3, pp. 257–274, 2001

work page 2001

[34] [34]

Towards bayesian deep learning: A survey,

H. Wang and D.-Y. Yeung, “Towards bayesian deep learning: A survey,” arXiv preprint arXiv:1604.01662, 2016

work page arXiv 2016

[35] [35]

Murphey, Machine learning, a probabilistic perspective

K. Murphey, Machine learning, a probabilistic perspective . Cambridge, MA: MIT Press, 2012

work page 2012

[36] [36]

Deep sparse rectiﬁer neural networks,

X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectiﬁer neural networks,” in AISTATS, 2011, pp. 315–323

work page 2011

[37] [37]

Rectiﬁer nonlinearities improve neural network acoustic models,

A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectiﬁer nonlinearities improve neural network acoustic models,” in ICML, vol. 30, 2013, p. 3

work page 2013

[38] [38]

R. M. Neal, Bayesian learning for neural networks . Springer Science & Business Media, 1996, vol. 118

work page 1996

[39] [39]

Uncertainty in deep learning,

Y. Gal, “Uncertainty in deep learning,” University of Cambridge , 2016

work page 2016

[40] [40]

Consistent inference of probabilities in layered networks: predictions and generalizations,

N. Tishby, E. Levin, and S. A. Solla, “Consistent inference of probabilities in layered networks: predictions and generalizations,” in International 1989 Joint Conference on Neural Networks , 1989, pp. 403–409 vol.2

work page 1989

[41] [41]

Transforming neural-net output levels to probability distributions,

J. S. Denker and Y. Lecun, “Transforming neural-net output levels to probability distributions,” in NeurIPS, 1991, pp. 853–859

work page 1991

[42] [42]

Approximation by superpositions of a sigmoidal function,

G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathemat- ics of control, signals and systems , vol. 2, no. 4, pp. 303–314, 1989

work page 1989

[43] [43]

On the approximate realization of continuous mappings by neural networks,

K.-I. Funahashi, “On the approximate realization of continuous mappings by neural networks,” Neural networks, vol. 2, no. 3, pp. 183–192, 1989

work page 1989

[44] [44]

Approximation capabilities of multilayer feedforward networks,

K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural networks, vol. 4, no. 2, pp. 251–257, 1991

work page 1991

[45] [45]

Quantiﬁed maximum entropy memsys5 users manual,

S. F. Gull and J. Skilling, “Quantiﬁed maximum entropy memsys5 users manual,” Maximum Entropy Data Consultants Ltd , vol. 33, 1991

work page 1991

[46] [46]

Bayesian interpolation,

D. J. MacKay, “Bayesian interpolation,” Neural computation, vol. 4, no. 3, pp. 415– 447, 1992

work page 1992

[47] [47]

Bayesian methods for adaptive models,

——, “Bayesian methods for adaptive models,” Ph.D. dissertation, California Insti- tute of Technology, 1992

work page 1992

[48] [48]

A practical bayesian framework for backpropagation networks,

——, “A practical bayesian framework for backpropagation networks,” Neural com- putation, vol. 4, no. 3, pp. 448–472, 1992. 40 Ethan Goan, Clinton Fookes

work page 1992

[49] [49]

An introduction to variational methods for graphical models,

M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, “An introduction to variational methods for graphical models,” Machine learning, vol. 37, no. 2, pp. 183–233, 1999

work page 1999

[50] [50]

Graphical models, exponential families, and variational inference,

M. J. Wainwright, M. I. Jordan et al., “Graphical models, exponential families, and variational inference,” Foundations and Trends R⃝ in Machine Learning , vol. 1, no. 1–2, pp. 1–305, 2008

work page 2008

[51] [51]

Variational inference: A review for statisticians,

D. M. Blei, A. Kucukelbir, and J. D. McAuliﬀe, “Variational inference: A review for statisticians,” Journal of the American Statistical Association , vol. 112, no. 518, pp. 859–877, 2017

work page 2017

[52] [52]

Stochastic variational infer- ence,

M. D. Hoﬀman, D. M. Blei, C. Wang, and J. Paisley, “Stochastic variational infer- ence,” The Journal of Machine Learning Research , vol. 14, no. 1, pp. 1303–1347, 2013

work page 2013

[53] [53]

Ensemble learning in bayesian neural networks,

D. Barber and C. M. Bishop, “Ensemble learning in bayesian neural networks,”NATO ASI SERIES F COMPUTER AND SYSTEMS SCIENCES , vol. 168, pp. 215–238, 1998

work page 1998

[54] [54]

Keeping the neural networks simple by minimizing the description length of the weights,

G. E. Hinton and D. Van Camp, “Keeping the neural networks simple by minimizing the description length of the weights,” in Proceedings of the sixth annual conference on Computational learning theory . ACM, 1993, pp. 5–13

work page 1993

[55] [55]

A Conceptual Introduction to Hamiltonian Monte Carlo

M. Betancourt, “A conceptual introduction to hamiltonian monte carlo,” arXiv preprint arXiv:1701.02434, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[56] [56]

The geometric founda- tions of hamiltonian monte carlo,

M. Betancourt, S. Byrne, S. Livingstone, M. Girolami et al., “The geometric founda- tions of hamiltonian monte carlo,” Bernoulli, vol. 23, no. 4A, pp. 2257–2298, 2017

work page 2017

[57] [57]

Agent-based scientiﬁc simula- tion,

G. Madey, X. Xiang, S. E. Cabaniss, and Y. Huang, “Agent-based scientiﬁc simula- tion,” Computing in Science & Engineering , vol. 2, no. 01, pp. 22–29, jan 2005

work page 2005

[58] [58]

Hybrid monte carlo,

S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth, “Hybrid monte carlo,” Physics letters B , vol. 195, no. 2, pp. 216–222, 1987

work page 1987

[59] [59]

Mcmc using hamiltonian dynamics,

R. M. Neal et al., “Mcmc using hamiltonian dynamics,” Handbook of markov chain monte carlo, vol. 2, no. 11, p. 2, 2011

work page 2011

[60] [60]

Brooks, A

S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Handbook of markov chain monte carlo. CRC press, 2011

work page 2011

[61] [61]

Bayesian learning via stochastic gradient langevin dynam- ics,

M. Welling and Y. Teh, “Bayesian learning via stochastic gradient langevin dynam- ics,” Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pp. 681–688, 2011

work page 2011

[62] [62]

Practical variational inference for neural networks,

A. Graves, “Practical variational inference for neural networks,” in Advances in Neu- ral Information Processing Systems 24, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2011, pp. 2348–2356

work page 2011

[63] [63]

The variational gaussian approximation revisited,

M. Opper and C. Archambeau, “The variational gaussian approximation revisited,” Neural computation, vol. 21, no. 3, pp. 786–792, 2009

work page 2009

[64] [64]

Probabilistic backpropagation for scal- able learning of bayesian neural networks,

J. M. Hern´ andez-Lobato and R. Adams, “Probabilistic backpropagation for scal- able learning of bayesian neural networks,” in International Conference on Machine Learning, 2015, pp. 1861–1869

work page 2015

[65] [65]

Variational Bayesian Inference with Stochastic Search

J. Paisley, D. Blei, and M. Jordan, “Variational bayesian inference with stochastic search,” arXiv preprint arXiv:1206.6430 , 2012

work page internal anchor Pith review Pith/arXiv arXiv 2012

[66] [66]

Variance reduction techniques for digital simulation,

J. R. Wilson, “Variance reduction techniques for digital simulation,” American Jour- nal of Mathematical and Management Sciences , vol. 4, no. 3-4, pp. 277–312, 1984

work page 1984

[67] [67]

The variational gaussian approximation revisited,

M. Opper and C. Archambeau, “The variational gaussian approximation revisited,” Neural computation, vol. 21 3, pp. 786–92, 2009

work page 2009

[68] [68]

Auto-Encoding Variational Bayes

D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[69] [69]

Stochastic backpropagation and ap- proximate inference in deep generative models,

D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and ap- proximate inference in deep generative models,” in Proceedings of the 31st Interna- tional Conference on Machine Learning (ICML) , 2014, pp. 1278–1286

work page 2014

[70] [70]

Dropout: A simple way to prevent neural networks from overﬁtting,

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overﬁtting,” The Jour- nal of Machine Learning Research , vol. 15, no. 1, pp. 1929–1958, 2014. Bayesian Neural Networks: An Introduction and Survey 41

work page 1929

[71] [71]

Variational dropout and the local reparameterization trick,

D. P. Kingma, T. Salimans, and M. Welling, “Variational dropout and the local reparameterization trick,” in Advances in Neural Information Processing Systems , 2015, pp. 2575–2583

work page 2015

[72] [72]

Fast dropout training,

S. Wang and C. Manning, “Fast dropout training,” in international conference on machine learning, 2013, pp. 118–126

work page 2013

[73] [73]

Sex, mixability, and modularity,

A. Livnat, C. Papadimitriou, N. Pippenger, and M. W. Feldman, “Sex, mixability, and modularity,” Proceedings of the National Academy of Sciences , vol. 107, no. 4, pp. 1452–1457, 2010

work page 2010

[74] [74]

A bayesian approach to on-line learning,

M. Opper and O. Winther, “A bayesian approach to on-line learning,” On-line learn- ing in neural networks , pp. 363–378, 1998

work page 1998

[75] [75]

A family of algorithms for approximate bayesian inference,

T. P. Minka, “A family of algorithms for approximate bayesian inference,” Ph.D. dissertation, Massachusetts Institute of Technology, 2001

work page 2001

[76] [76]

Weight Uncertainty in Neural Networks

C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, “Weight uncertainty in neural networks,” arXiv preprint arXiv:1505.05424 , 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[77] [77]

Computing with inﬁnite networks,

C. K. Williams, “Computing with inﬁnite networks,” in Advances in neural informa- tion processing systems, 1997, pp. 295–301

work page 1997

[78] [78]

Deep neural networks as gaussian processes,

J. Lee, J. Sohl-dickstein, J. Pennington, R. Novak, S. Schoenholz, and Y. Bahri, “Deep neural networks as gaussian processes,” in International Conference on Learning Representations, 2018

work page 2018

[79] [79]

Deep gaussian processes,

A. Damianou and N. Lawrence, “Deep gaussian processes,” in AISTATS, 2013, pp. 207–215

work page 2013

[80] [80]

Deep gaussian processes and variational propagation of uncertainty,

A. Damianou, “Deep gaussian processes and variational propagation of uncertainty,” Ph.D. dissertation, University of Sheﬃeld, 2015

work page 2015