pith. sign in

arxiv: 2006.12024 · v3 · submitted 2020-06-22 · 📊 stat.ML · cs.LG

Bayesian Neural Networks: An Introduction and Survey

Pith reviewed 2026-05-24 14:36 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords Bayesian neural networksuncertainty quantificationapproximate inferencevariational inferenceMarkov chain Monte Carlomachine learningdeep learning
0
0 comments X

The pith

Bayesian Neural Networks place probability distributions over weights so predictions include explicit uncertainty estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard neural networks output single predictions without any measure of confidence. Bayesian Neural Networks treat weights as random variables drawn from a posterior distribution, so the model can integrate over many possible parameter settings and return a distribution over outputs. The survey presents the core Bayesian formulation, then examines the main families of approximate inference used to make the approach tractable. By contrasting these approximations the authors show persistent gaps in accuracy, scalability, and calibration. Readers who accept the survey's framing therefore see a clear research agenda for making uncertainty-aware networks practical.

Core claim

Bayesian Neural Networks enable reasoning about uncertainty in predictions by placing distributions over network weights and integrating over the posterior; the paper surveys the principal approximate inference techniques required to implement this idea and uses the comparison to identify concrete limitations that future work must address.

What carries the argument

Approximate posterior inference over network weights, performed via methods such as variational inference and Markov-chain Monte Carlo, to replace point estimates with distributions.

If this is right

  • Predictions come with calibrated uncertainty intervals rather than point values alone.
  • Safety-critical systems can flag inputs where the model is uncertain and defer to human review.
  • Model comparison and hyper-parameter tuning can use marginal likelihoods instead of validation accuracy.
  • Downstream tasks such as active learning and Bayesian optimisation receive direct uncertainty signals from the network.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same posterior-over-weights construction could be applied to recurrent or transformer architectures with only notational changes.
  • Hybrid schemes that combine one of the surveyed approximations with post-hoc calibration might close some of the identified gaps without new theory.
  • Deployment on edge devices would require additional compression techniques that preserve the uncertainty properties highlighted in the survey.

Load-bearing premise

That the approximate inference methods reviewed in the survey are representative of the main approaches and limitations in the BNN literature.

What would settle it

A new inference procedure, omitted from the survey, that simultaneously achieves exact posterior computation, linear scaling to modern network sizes, and well-calibrated uncertainty on benchmark tasks would falsify the claimed need for further improvement.

Figures

Figures reproduced from arXiv: 2006.12024 by Clinton Fookes, Ethan Goan.

Figure 1
Figure 1. Figure 1: Comparison of neural network to traditional probabilistic methods [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Example of a NN architecture with a single hidden layer for either [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Examples of commonly used activation functions in NNs. The output [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Graphical illustration of how the evidence plays a role in investigating [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Graphical illustration of how the minimisation of the KL divergence [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Illustration of GP prior induced on output when placing a Gaussian [PITH_FULL_IMAGE:figures/full_fig_p028_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of BNNs with GP for a regression task over three toy data [PITH_FULL_IMAGE:figures/full_fig_p034_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Examples of difficult to classify images from each class in MNIST. [PITH_FULL_IMAGE:figures/full_fig_p044_8.png] view at source ↗
read the original abstract

Neural Networks (NNs) have provided state-of-the-art results for many challenging machine learning tasks such as detection, regression and classification across the domains of computer vision, speech recognition and natural language processing. Despite their success, they are often implemented in a frequentist scheme, meaning they are unable to reason about uncertainty in their predictions. This article introduces Bayesian Neural Networks (BNNs) and the seminal research regarding their implementation. Different approximate inference methods are compared, and used to highlight where future research can improve on current methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript is an introductory survey on Bayesian Neural Networks (BNNs). It claims that, unlike frequentist neural networks, BNNs enable reasoning about uncertainty in predictions. The paper reviews seminal research on BNN implementation, compares different approximate inference methods, and uses those comparisons to highlight directions for future research.

Significance. As a survey without original derivations or experiments, the manuscript's significance rests on the accuracy and representativeness of its synthesis of prior work. If the coverage of approximate inference methods is balanced and the summaries of key papers are precise, it could serve as a useful entry point for newcomers to uncertainty quantification in neural networks; the standard contrast with frequentist approaches is already established in the literature.

minor comments (3)
  1. The abstract states that frequentist NNs are 'unable to reason about uncertainty,' but the survey should briefly acknowledge post-hoc uncertainty estimation techniques (e.g., MC dropout or ensembles) used in the frequentist setting to sharpen the motivation for BNNs.
  2. Because the paper is positioned as a survey, the comparison of approximate inference methods would benefit from an explicit statement of selection criteria (e.g., which methods were chosen and why) to address the representativeness concern.
  3. The manuscript would be strengthened by including a short table summarizing the reviewed inference methods along dimensions such as scalability, accuracy guarantees, and computational cost.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and recommendation of minor revision. The report provides a fair summary of the manuscript but lists no specific major comments requiring response.

Circularity Check

0 steps flagged

Survey paper introduces no derivations or fitted quantities

full rationale

This is an introductory survey paper whose content consists of descriptive overviews of existing BNN literature, comparisons of established approximate inference methods, and standard contrasts between Bayesian and frequentist neural networks. No new equations, parameters, predictions, or technical derivations are introduced that could reduce to the paper's own inputs by construction. The central framing is a widely accepted distinction in the field and draws on external references without self-referential load-bearing steps. As a result, no circularity patterns (self-definitional, fitted-input-as-prediction, or self-citation chains) are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper with no new mathematical derivations, fitted parameters, or postulated entities. All content rests on prior literature.

pith-pipeline@v0.9.0 · 5601 in / 859 out tokens · 19832 ms · 2026-05-24T14:36:19.668365+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Extracting redshifts from 2D slitless spectroscopic images using deep learning for the CSST galaxy survey

    astro-ph.IM 2026-05 unverdicted novelty 6.0

    A Bayesian CNN maps 2D slitless spectral images to redshift estimates with NMAD precision 0.0104 for SNR_GI >=1 and better for brighter sources, while remaining robust to wavelength calibration errors via spatial augm...

Reference graph

Works this paper leans on

123 extracted references · 123 canonical work pages · cited by 1 Pith paper · 19 internal anchors

  1. [1]

    The perceptron: A probabilistic model for information storage and organization in the brain

    F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain.” Psychological Review, vol. 65, no. 6, pp. 386 – 408, 1958

  2. [2]

    Bishop, Pattern recognition and machine learning

    C. Bishop, Pattern recognition and machine learning . New York: Springer, 2006

  3. [3]

    Learning representations by back-propagating errors,

    D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” nature, vol. 323, no. 6088, p. 533, 1986

  4. [4]

    Gpu implementation of neural networks,

    K.-S. Oh and K. Jung, “Gpu implementation of neural networks,” Pattern Recogni- tion, vol. 37, no. 6, pp. 1311–1314, 2004

  5. [5]

    Deep big simple neural nets excel on handwritten digit recognition,

    D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, “Deep big simple neural nets excel on handwritten digit recognition,” CoRR, 2010

  6. [6]

    Imagenet classification with deep con- volutional neural networks,

    A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep con- volutional neural networks,” in Advances in neural information processing systems , 2012, pp. 1097–1105

  7. [7]

    Very deep convolutional networks for large-scale image recognition,

    K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, 2014

  8. [8]

    Going deeper with convolutions,

    C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Van- houcke, A. Rabinovich et al., “Going deeper with convolutions,” in CVPR, 2015

  9. [9]

    Rich feature hierarchies for ac- curate object detection and semantic segmentation,

    R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for ac- curate object detection and semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2014, pp. 580–587. 38 Ethan Goan, Clinton Fookes

  10. [10]

    Faster r-cnn: Towards real-time object de- tection with region proposal networks,

    S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object de- tection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99

  11. [11]

    You only look once: Unified, real-time object detection,

    J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788

  12. [12]

    Acoustic modeling using deep belief networks,

    A. Mohamed, G. E. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” IEEE Transactions on Audio, Speech, and Language Processing , vol. 20, no. 1, pp. 14–22, 2012

  13. [13]

    Context-dependent pre-trained deep neu- ral networks for large-vocabulary speech recognition,

    G. E. Dahl, D. Yu, L. Deng, and A. Acero, “Context-dependent pre-trained deep neu- ral networks for large-vocabulary speech recognition,” IEEE Transactions on audio, speech, and language processing, vol. 20, no. 1, pp. 30–42, 2012

  14. [14]

    Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,

    G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Van- houcke, P. Nguyen, T. N. Sainathet al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Pro- cessing Magazine, vol. 29, no. 6, pp. 82–97, 2012

  15. [15]

    Deep speech 2 : End-to-end speech recognition in english and mandarin,

    D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen, J. Chen, J. Chen, Z. Chen, M. Chrzanowski, A. Coates, G. Diamos, K. Ding, N. Du, E. Elsen, J. Engel, W. Fang, L. Fan, C. Fougner, L. Gao, C. Gong, A. Hannun, T. Han, L. Johannes, B. Jiang, C. Ju, B. Jun, P. LeGresley, L. Lin, J. Liu, Y. ...

  16. [16]

    Mastering the game of go without human knowledge,

    D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hu- bert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, p. 354, 2017

  17. [17]

    Smartening up with artificial intelligence (ai) - what’s in it for germany and its industrial sector?

    “Smartening up with artificial intelligence (ai) - what’s in it for germany and its industrial sector?” McKinsey & Company, Inc, Tech. Rep., 4 2017. [Online]. Available: https://www.mckinsey.de/files/170419 mckinsey ki final m.pdf

  18. [18]

    Artificial intelligence in wealth and as- set management,

    E. V. T. V. Serooskerken, “Artificial intelligence in wealth and as- set management,” Pictet on Robot Advisors, Tech. Rep., 1 2017. [Online]. Available: https://perspectives.pictet.com/wp-content/uploads/2016/12/ Edgar-van-Tuyll-van-Serooskerken-Pictet-Report-winter-2016-2.pdf

  19. [19]

    Wavenet launches in the google assistant

    A. van den Oord, T. Walters, and T. Strohman, “Wavenet launches in the google assistant.” [Online]. Available: https://deepmind.com/blog/ wavenet-launches-google-assistant/

  20. [20]

    Deep learning for siri’s voice: On-device deep mixture density networks for hybrid unit selection synthesis,

    Siri Team, “Deep learning for siri’s voice: On-device deep mixture density networks for hybrid unit selection synthesis,” 8 2017. [Online]. Available: https://machinelearning.apple.com/2017/08/06/siri-voices.html

  21. [21]

    Microsoft chatbot is taught to swear on twitter

    J. Wakefield, “Microsoft chatbot is taught to swear on twitter.” [Online]. Available: www.bbc.com/news/technology-35890188

  22. [22]

    Google photos labeled black people ’goril- las’

    J. Guynn, “Google photos labeled black people ’goril- las’.” [Online]. Available: https://www.usatoday.com/story/tech/2015/07/01/ google-apologizes-after-photos-identify-black-people-as-gorillas/29567465/

  23. [23]

    Gender shades: Intersectional accuracy disparities in commercial gender classification,

    J. Buolamwini and T. Gebru, “Gender shades: Intersectional accuracy disparities in commercial gender classification,” in Conference on fairness, accountability and transparency, 2018, pp. 77–91

  24. [24]

    A tragic loss

    Tesla Team, “A tragic loss.” [Online]. Available: https://www.tesla.com/en GB/ blog/tragic-loss Bayesian Neural Networks: An Introduction and Survey 39

  25. [25]

    Uber suspends self-driving car tests after vehicle hits and kills woman crossing the street in arizona,

    ABC News, “Uber suspends self-driving car tests after vehicle hits and kills woman crossing the street in arizona,” 2018. [Online]. Available: http://www.abc.net.au/ news/2018-03-20/uber-suspends-self-driving-car-tests-after-fatal-crash/9565586

  26. [26]

    Regulation (eu) 2016/679 of the european parliment and of the council,

    Council of European Union, “Regulation (eu) 2016/679 of the european parliment and of the council,” 2016

  27. [27]

    European union regulations on algorithmic decision- making and a right to explanation,

    B. Goodman and S. Flaxman, “European union regulations on algorithmic decision- making and a right to explanation,” AI magazine, vol. 38, no. 3, pp. 50–57, 2017

  28. [28]

    A shared vision for machine learning in neuroscience,

    M. Vu, T. Adali, D. Ba, G. Buzsaki, D. Carlson, K. Heller, C. Liston, C. Rudin, V. Sohal, A. Widge, H. Mayberg, G. Sapiro, and K. Dzirasa, “A shared vision for machine learning in neuroscience,” JOURNAL OF NEUROSCIENCE, vol. 38, no. 7, pp. 1601–1607, 2018

  29. [29]

    What do we need to build explainable AI systems for the medical domain?

    A. Holzinger, C. Biemann, C. S. Pattichis, and D. B. Kell, “What do we need to build explainable ai systems for the medical domain?” arXiv preprint arXiv:1712.09923 , 2017

  30. [30]

    Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission,

    R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 2015, pp. 1721–1730

  31. [31]

    Explainable artificial intelligence (xai),

    D. Gunning, “Explainable artificial intelligence (xai),” Defense Advanced Research Projects Agency (DARPA), nd Web , 2017

  32. [32]

    Probable networks and plausible predictionsa review of practical bayesian methods for supervised neural networks,

    D. J. MacKay, “Probable networks and plausible predictionsa review of practical bayesian methods for supervised neural networks,” Network: computation in neural systems, vol. 6, no. 3, pp. 469–505, 1995

  33. [33]

    Bayesian approach for neural networksreview and case studies,

    J. Lampinen and A. Vehtari, “Bayesian approach for neural networksreview and case studies,” Neural networks, vol. 14, no. 3, pp. 257–274, 2001

  34. [34]

    Towards bayesian deep learning: A survey,

    H. Wang and D.-Y. Yeung, “Towards bayesian deep learning: A survey,” arXiv preprint arXiv:1604.01662, 2016

  35. [35]

    Murphey, Machine learning, a probabilistic perspective

    K. Murphey, Machine learning, a probabilistic perspective . Cambridge, MA: MIT Press, 2012

  36. [36]

    Deep sparse rectifier neural networks,

    X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in AISTATS, 2011, pp. 315–323

  37. [37]

    Rectifier nonlinearities improve neural network acoustic models,

    A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in ICML, vol. 30, 2013, p. 3

  38. [38]

    R. M. Neal, Bayesian learning for neural networks . Springer Science & Business Media, 1996, vol. 118

  39. [39]

    Uncertainty in deep learning,

    Y. Gal, “Uncertainty in deep learning,” University of Cambridge , 2016

  40. [40]

    Consistent inference of probabilities in layered networks: predictions and generalizations,

    N. Tishby, E. Levin, and S. A. Solla, “Consistent inference of probabilities in layered networks: predictions and generalizations,” in International 1989 Joint Conference on Neural Networks , 1989, pp. 403–409 vol.2

  41. [41]

    Transforming neural-net output levels to probability distributions,

    J. S. Denker and Y. Lecun, “Transforming neural-net output levels to probability distributions,” in NeurIPS, 1991, pp. 853–859

  42. [42]

    Approximation by superpositions of a sigmoidal function,

    G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathemat- ics of control, signals and systems , vol. 2, no. 4, pp. 303–314, 1989

  43. [43]

    On the approximate realization of continuous mappings by neural networks,

    K.-I. Funahashi, “On the approximate realization of continuous mappings by neural networks,” Neural networks, vol. 2, no. 3, pp. 183–192, 1989

  44. [44]

    Approximation capabilities of multilayer feedforward networks,

    K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural networks, vol. 4, no. 2, pp. 251–257, 1991

  45. [45]

    Quantified maximum entropy memsys5 users manual,

    S. F. Gull and J. Skilling, “Quantified maximum entropy memsys5 users manual,” Maximum Entropy Data Consultants Ltd , vol. 33, 1991

  46. [46]

    Bayesian interpolation,

    D. J. MacKay, “Bayesian interpolation,” Neural computation, vol. 4, no. 3, pp. 415– 447, 1992

  47. [47]

    Bayesian methods for adaptive models,

    ——, “Bayesian methods for adaptive models,” Ph.D. dissertation, California Insti- tute of Technology, 1992

  48. [48]

    A practical bayesian framework for backpropagation networks,

    ——, “A practical bayesian framework for backpropagation networks,” Neural com- putation, vol. 4, no. 3, pp. 448–472, 1992. 40 Ethan Goan, Clinton Fookes

  49. [49]

    An introduction to variational methods for graphical models,

    M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, “An introduction to variational methods for graphical models,” Machine learning, vol. 37, no. 2, pp. 183–233, 1999

  50. [50]

    Graphical models, exponential families, and variational inference,

    M. J. Wainwright, M. I. Jordan et al., “Graphical models, exponential families, and variational inference,” Foundations and Trends R⃝ in Machine Learning , vol. 1, no. 1–2, pp. 1–305, 2008

  51. [51]

    Variational inference: A review for statisticians,

    D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, “Variational inference: A review for statisticians,” Journal of the American Statistical Association , vol. 112, no. 518, pp. 859–877, 2017

  52. [52]

    Stochastic variational infer- ence,

    M. D. Hoffman, D. M. Blei, C. Wang, and J. Paisley, “Stochastic variational infer- ence,” The Journal of Machine Learning Research , vol. 14, no. 1, pp. 1303–1347, 2013

  53. [53]

    Ensemble learning in bayesian neural networks,

    D. Barber and C. M. Bishop, “Ensemble learning in bayesian neural networks,”NATO ASI SERIES F COMPUTER AND SYSTEMS SCIENCES , vol. 168, pp. 215–238, 1998

  54. [54]

    Keeping the neural networks simple by minimizing the description length of the weights,

    G. E. Hinton and D. Van Camp, “Keeping the neural networks simple by minimizing the description length of the weights,” in Proceedings of the sixth annual conference on Computational learning theory . ACM, 1993, pp. 5–13

  55. [55]

    A Conceptual Introduction to Hamiltonian Monte Carlo

    M. Betancourt, “A conceptual introduction to hamiltonian monte carlo,” arXiv preprint arXiv:1701.02434, 2017

  56. [56]

    The geometric founda- tions of hamiltonian monte carlo,

    M. Betancourt, S. Byrne, S. Livingstone, M. Girolami et al., “The geometric founda- tions of hamiltonian monte carlo,” Bernoulli, vol. 23, no. 4A, pp. 2257–2298, 2017

  57. [57]

    Agent-based scientific simula- tion,

    G. Madey, X. Xiang, S. E. Cabaniss, and Y. Huang, “Agent-based scientific simula- tion,” Computing in Science & Engineering , vol. 2, no. 01, pp. 22–29, jan 2005

  58. [58]

    Hybrid monte carlo,

    S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth, “Hybrid monte carlo,” Physics letters B , vol. 195, no. 2, pp. 216–222, 1987

  59. [59]

    Mcmc using hamiltonian dynamics,

    R. M. Neal et al., “Mcmc using hamiltonian dynamics,” Handbook of markov chain monte carlo, vol. 2, no. 11, p. 2, 2011

  60. [60]

    Brooks, A

    S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Handbook of markov chain monte carlo. CRC press, 2011

  61. [61]

    Bayesian learning via stochastic gradient langevin dynam- ics,

    M. Welling and Y. Teh, “Bayesian learning via stochastic gradient langevin dynam- ics,” Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pp. 681–688, 2011

  62. [62]

    Practical variational inference for neural networks,

    A. Graves, “Practical variational inference for neural networks,” in Advances in Neu- ral Information Processing Systems 24, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2011, pp. 2348–2356

  63. [63]

    The variational gaussian approximation revisited,

    M. Opper and C. Archambeau, “The variational gaussian approximation revisited,” Neural computation, vol. 21, no. 3, pp. 786–792, 2009

  64. [64]

    Probabilistic backpropagation for scal- able learning of bayesian neural networks,

    J. M. Hern´ andez-Lobato and R. Adams, “Probabilistic backpropagation for scal- able learning of bayesian neural networks,” in International Conference on Machine Learning, 2015, pp. 1861–1869

  65. [65]

    Variational Bayesian Inference with Stochastic Search

    J. Paisley, D. Blei, and M. Jordan, “Variational bayesian inference with stochastic search,” arXiv preprint arXiv:1206.6430 , 2012

  66. [66]

    Variance reduction techniques for digital simulation,

    J. R. Wilson, “Variance reduction techniques for digital simulation,” American Jour- nal of Mathematical and Management Sciences , vol. 4, no. 3-4, pp. 277–312, 1984

  67. [67]

    The variational gaussian approximation revisited,

    M. Opper and C. Archambeau, “The variational gaussian approximation revisited,” Neural computation, vol. 21 3, pp. 786–92, 2009

  68. [68]

    Auto-Encoding Variational Bayes

    D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013

  69. [69]

    Stochastic backpropagation and ap- proximate inference in deep generative models,

    D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and ap- proximate inference in deep generative models,” in Proceedings of the 31st Interna- tional Conference on Machine Learning (ICML) , 2014, pp. 1278–1286

  70. [70]

    Dropout: A simple way to prevent neural networks from overfitting,

    N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Jour- nal of Machine Learning Research , vol. 15, no. 1, pp. 1929–1958, 2014. Bayesian Neural Networks: An Introduction and Survey 41

  71. [71]

    Variational dropout and the local reparameterization trick,

    D. P. Kingma, T. Salimans, and M. Welling, “Variational dropout and the local reparameterization trick,” in Advances in Neural Information Processing Systems , 2015, pp. 2575–2583

  72. [72]

    Fast dropout training,

    S. Wang and C. Manning, “Fast dropout training,” in international conference on machine learning, 2013, pp. 118–126

  73. [73]

    Sex, mixability, and modularity,

    A. Livnat, C. Papadimitriou, N. Pippenger, and M. W. Feldman, “Sex, mixability, and modularity,” Proceedings of the National Academy of Sciences , vol. 107, no. 4, pp. 1452–1457, 2010

  74. [74]

    A bayesian approach to on-line learning,

    M. Opper and O. Winther, “A bayesian approach to on-line learning,” On-line learn- ing in neural networks , pp. 363–378, 1998

  75. [75]

    A family of algorithms for approximate bayesian inference,

    T. P. Minka, “A family of algorithms for approximate bayesian inference,” Ph.D. dissertation, Massachusetts Institute of Technology, 2001

  76. [76]

    Weight Uncertainty in Neural Networks

    C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, “Weight uncertainty in neural networks,” arXiv preprint arXiv:1505.05424 , 2015

  77. [77]

    Computing with infinite networks,

    C. K. Williams, “Computing with infinite networks,” in Advances in neural informa- tion processing systems, 1997, pp. 295–301

  78. [78]

    Deep neural networks as gaussian processes,

    J. Lee, J. Sohl-dickstein, J. Pennington, R. Novak, S. Schoenholz, and Y. Bahri, “Deep neural networks as gaussian processes,” in International Conference on Learning Representations, 2018

  79. [79]

    Deep gaussian processes,

    A. Damianou and N. Lawrence, “Deep gaussian processes,” in AISTATS, 2013, pp. 207–215

  80. [80]

    Deep gaussian processes and variational propagation of uncertainty,

    A. Damianou, “Deep gaussian processes and variational propagation of uncertainty,” Ph.D. dissertation, University of Sheffield, 2015

Showing first 80 references.