Practical Bayesian Inference for Speech SNNs: Uncertainty and Loss-Landscape Smoothing
Pith reviewed 2026-05-10 17:36 UTC · model grok-4.3
The pith
Bayesian inference on spiking neural network weights produces smoother predictive landscapes and better uncertainty estimates for speech tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a Bayesian treatment of the weights, implemented via the Improved Variational Online Newton (IVON) variational method, improves negative log-likelihood and Brier score on speech classification while producing a smoother and more regular predictive landscape than deterministic training, as judged from one-dimensional slices through weight space.
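The two metrics named in the claim are standard proper scoring rules. A minimal sketch of how they are computed from predicted class probabilities (illustrative helper functions, not the paper's evaluation code):

```python
import math

def nll(probs, labels):
    """Mean negative log-likelihood of the true class, clipped for stability."""
    eps = 1e-12
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)

def brier(probs, labels):
    """Mean squared error between predicted probabilities and one-hot targets."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)

# Both metrics reward probability mass on the correct class: a confident
# correct prediction scores lower (better) than a hesitant one.
```

Because both scores depend on the full predicted distribution rather than only the arg-max, they can register calibration improvements that accuracy alone would miss.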
What carries the argument
Improved Variational Online Newton (IVON) variational inference applied to the weights of surrogate-gradient spiking neural networks to counteract angularity in the predictive surface.
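IVON's full update (a natural-gradient, Adam-like rule that maintains a diagonal Gaussian posterior over the weights) is more involved than fits here, but the ingredient it shares with other variational methods can be sketched: keep a mean and log-standard-deviation per weight, sample perturbed weights via the reparameterization trick, and average predictions over samples. Names and the toy predict function below are illustrative, not the paper's code:

```python
import math
import random

def sample_weights(mean, log_std, rng=random):
    """One posterior sample w = mu + sigma * eps (reparameterization trick),
    so gradients can flow back to the variational parameters (mean, log_std)."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mean, log_std)]

def mc_predict(predict_fn, mean, log_std, x, n_samples=20):
    """Monte Carlo predictive mean: average predict_fn over posterior samples."""
    preds = [predict_fn(sample_weights(mean, log_std), x) for _ in range(n_samples)]
    return sum(preds) / n_samples
```

As the posterior standard deviation shrinks toward zero, the Monte Carlo predictive collapses to the deterministic prediction at the mean weights.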
Load-bearing premise
One-dimensional slices through weight space are representative of the smoothness and regularity of the full high-dimensional predictive landscape.
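Such slices are typically produced by evaluating the loss along a line through the trained weights in a random (or interpolating) direction. A minimal sketch, with the direction and loss function as illustrative stand-ins:

```python
import math
import random

def random_unit_direction(dim, rng=random):
    """Gaussian direction normalized to unit length, so slices are comparable."""
    d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in d)) or 1.0
    return [x / norm for x in d]

def loss_slice(loss_fn, weights, direction, alphas):
    """Evaluate the loss along the line w + alpha * d through weight space."""
    return [loss_fn([w + a * d for w, d in zip(weights, direction)]) for a in alphas]
```

The limitation flagged in this premise is visible in the code itself: the slice probes exactly one direction `d`, so any irregularity orthogonal to `d` is invisible in the resulting curve.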
What would settle it
A comparison using full high-dimensional landscape metrics, such as curvature estimates or optimization path statistics, that shows no reduction in irregularity for the Bayesian SNNs relative to deterministic baselines.
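One such full-dimensional curvature metric is the trace of the Hessian, which can be estimated without ever forming the Hessian via Hutchinson's estimator, tr(H) ≈ E[v·Hv] with random ±1 probe vectors and a finite-difference Hessian-vector product. A sketch under those assumptions (the gradient function is supplied by the caller; nothing here is the paper's code):

```python
import random

def hvp_fd(grad_fn, w, v, eps=1e-4):
    """Finite-difference Hessian-vector product: (g(w + eps v) - g(w - eps v)) / (2 eps)."""
    gp = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    gm = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / (2 * eps) for a, b in zip(gp, gm)]

def hutchinson_trace(grad_fn, w, n_probes=50, rng=random):
    """Estimate tr(H) as the average of v . (H v) over Rademacher probes v."""
    est = 0.0
    for _ in range(n_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in w]
        hv = hvp_fd(grad_fn, w, v)
        est += sum(vi * hvi for vi, hvi in zip(v, hv))
    return est / n_probes
```

Unlike a one-dimensional slice, each probe touches every coordinate of weight space, so the estimate aggregates curvature across all dimensions rather than along a single chosen line.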
Original abstract
Spiking Neural Networks (SNNs) are naturally suited for speech processing tasks due to their specific dynamics, which allow them to handle temporal data. However, the threshold-based generation of spikes in SNNs intuitively causes an angular or irregular predictive landscape. We explore the effect of using the Bayesian learning approach for the weights on the irregular predictive landscape. For the surrogate-gradient SNNs, we also explore the application of the Improved Variational Online Newton (IVON) approach, which is an efficient variational approach. The performance of the proposed approach is evaluated on the Heidelberg Digits and Speech Commands datasets. The hypothesis is that the Bayesian approach will result in a smoother and more regular predictive landscape, given the angular nature of the deterministic predictive landscape. The experimental evaluation of the proposed approach shows improved performance on the negative log-likelihood and Brier score. Furthermore, the proposed approach has resulted in a smoother and more regular predictive landscape compared to the deterministic approach, based on the one-dimensional slices of the weight space.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript explores Bayesian inference for the weights of surrogate-gradient Spiking Neural Networks (SNNs) on speech tasks, employing the Improved Variational Online Newton (IVON) method as an efficient variational approach. It hypothesizes that this will mitigate the angular irregularities of deterministic SNN predictive landscapes, leading to improved negative log-likelihood and Brier scores on the Heidelberg Digits and Speech Commands datasets, with smoother landscapes evidenced by one-dimensional weight-space slices.
Significance. If substantiated with stronger evidence, the work could offer a scalable route to uncertainty quantification in SNNs for temporal signals, building on IVON's efficiency. The emphasis on landscape smoothing addresses a plausible limitation of spike-based models and may guide more robust training, though the current support for the smoothing claim limits immediate field impact.
major comments (2)
- [Abstract] The claim that the Bayesian approach produces a 'smoother and more regular predictive landscape' rests exclusively on one-dimensional slices through weight space. In high-dimensional parameter spaces, single-line slices can miss saddle structures or irregularities orthogonal to the chosen directions, so this evidence does not establish the global regularity asserted in the central hypothesis.
- [Experimental evaluation] No error bars, statistical significance tests, or full protocol (including exact architectures, surrogate functions, and hyperparameter settings) are described for the reported NLL and Brier score improvements. Without these, the performance gains cannot be verified as reliable or reproducible, undermining the empirical support for the Bayesian advantage.
minor comments (2)
- [Abstract] The abstract refers to 'the angular nature of the deterministic predictive landscape' without defining or illustrating the angularity metric, which would help readers understand the baseline irregularity being addressed.
- Notation for IVON and the variational posterior is introduced without an accompanying equation or reference to the original IVON paper, reducing clarity for readers unfamiliar with the method.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the evidential basis for our claims. We provide point-by-point responses below and will incorporate revisions to address the identified issues.
Point-by-point responses
-
Referee: [Abstract] The claim that the Bayesian approach produces a 'smoother and more regular predictive landscape' rests exclusively on one-dimensional slices through weight space. In high-dimensional parameter spaces, single-line slices can miss saddle structures or irregularities orthogonal to the chosen directions, so this evidence does not establish the global regularity asserted in the central hypothesis.
Authors: We acknowledge that one-dimensional slices offer only a limited view and cannot rigorously establish global regularity in high-dimensional spaces. The manuscript already qualifies the smoothing observation as being 'based on the one-dimensional slices of the weight space' and ties it to the observed gains in NLL and Brier score. To avoid any overstatement, we will revise the abstract and discussion to describe the slices as providing supporting, indicative evidence of improved regularity rather than asserting global properties. We will also add a brief caveat on the limitations of low-dimensional projections. revision: partial
-
Referee: [Experimental evaluation] No error bars, statistical significance tests, or full protocol (including exact architectures, surrogate functions, and hyperparameter settings) are described for the reported NLL and Brier score improvements. Without these, the performance gains cannot be verified as reliable or reproducible, undermining the empirical support for the Bayesian advantage.
Authors: We agree that the current draft omits essential details required for reproducibility and statistical validation. In the revised version we will add: (i) error bars or standard deviations computed over multiple independent runs, (ii) appropriate statistical significance tests comparing the Bayesian and deterministic results, and (iii) a complete experimental protocol section (or appendix) specifying architectures, surrogate functions, all hyperparameter values, training schedules, and evaluation procedures. These changes will directly strengthen the empirical claims. revision: yes
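Error bars of the kind promised here can be produced, for example, by bootstrapping per-seed metric differences between the two training regimes. A self-contained sketch (the run values in the test are illustrative, not the paper's results):

```python
import random

def bootstrap_ci(diffs, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of paired
    differences (e.g. per-seed NLL of deterministic minus Bayesian runs)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = [rng.choice(diffs) for _ in diffs]
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# If the whole interval lies above zero, the improvement is consistent
# across seeds rather than an artifact of a single lucky run.
```

A paired test is appropriate here because both methods can be run from the same seeds and data splits, removing between-run variance from the comparison.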
Circularity Check
No circularity; purely empirical claims with no derivational reduction
full rationale
The paper advances a hypothesis that Bayesian weight inference smooths the predictive landscape of surrogate-gradient SNNs relative to deterministic training, then supports it via direct experimental comparison on Heidelberg Digits and Speech Commands. Reported metrics (NLL, Brier score) and 1-D weight-space slices are independent observations, not quantities fitted or defined in terms of themselves. No equations, uniqueness theorems, or self-citations appear in the supplied text that would allow any claimed result to reduce to its own inputs by construction. The work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] Abhronil Bhattacharjee et al. 2022. Examining the Robustness of Spiking Neural Networks on Non-Ideal Memristive Crossbars. In GLSVLSI 2022.
- [2] Alexandre Bittar and Philip N. Garner. 2022. A surrogate gradient spiking baseline for speech command recognition. Frontiers in Neuroscience 16 (2022). doi:10.3389/fnins.2022.865897
- [3] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. 2015. Weight Uncertainty in Neural Networks. In International Conference on Machine Learning (ICML) 2015.
- [4] Julian Büchel, Dmitrii Zendrikov, Sergio Solinas, Giacomo Indiveri, and Dylan R. Muir. 2021. Supervised training of spiking neural networks for robust deployment on mixed-signal neuromorphic processors. Scientific Reports (2021).
- [5]
- [6] Erik Daxberger, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, and Philipp Hennig. 2021. Laplace Redux - Effortless Bayesian Deep Learning. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 20089–20103.
- [7] Yarin Gal and Zoubin Ghahramani. 2016. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of The 33rd International Conference on Machine Learning (ICML).
  (Results fragment recovered from this entry — Dataset / Method / Acc(%)↑ / NLL↓ / ECE↓ / Entropy↓ / MI↑ / Brier↓: HD, Deterministic (MAP): 98.10 / 0.0553 / 0.0097 / 0.0824 / – / 0.0294; HD, IVON (MC S=20): 99.51 / 0.0233 / 0.0103 / 0.0450 / 0.0163 / 0.0105; SC, Deterministic ...)
- [8] Alex Graves. 2011. Practical Variational Inference for Neural Networks. In Advances in Neural Information Processing Systems, Vol. 24. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2011/file/7eb3c8be3d411e8ebfab08eba5f49632-Paper.pdf
- [9] Mohammad Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, and Akash Srivastava. 2018. Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 80). PMLR, 2611–2620.
- [10] Mohammad Emtiyaz Khan and Håvard Rue. 2023. The Bayesian learning rule. J. Mach. Learn. Res. 24, 1, Article 281 (Jan. 2023), 46 pages.
- [11] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations. https://arxiv.org/abs/1412.6980
- [12] Wesley J Maddox, Pavel Izmailov, Timur Garipov, Dmitry P Vetrov, and Andrew Gordon Wilson. 2019. A Simple Baseline for Bayesian Uncertainty in Deep Learning. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc.
- [13] Emre O. Neftci, Hesham Mostafa, and Friedemann Zenke. 2019. Surrogate Gradient Learning in Spiking Neural Networks: Bringing the Power of Gradient-based Optimization to Spiking Neural Networks. IEEE Signal Processing Magazine 36 (2019), 51–63.
- [14] Kazuki Osawa, Siddharth Swaroop, Mohammad Emtiyaz Khan, Anirudh Jain, Runa Eschenhagen, Richard E Turner, and Rio Yokota. 2019. Practical Deep Learning with Bayesian Principles. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc.
- [15] Hippolyt Ritter, Aleksandar Botev, and David Barber. 2018. A Scalable Laplace Approximation for Neural Networks. In International Conference on Learning Representations. https://openreview.net/forum?id=Skdvd2xAZ
- [16] Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, and Thomas Möllenhoff. 2024. Variational learning is effective for large deep networks. In Proceedings of the 41st International Conference on Machine Learning (ICML'24). JMLR.org.
- [17] Nicolas Skatchkovsky, Hyeryung Jang, and Osvaldo Simeone. 2022. Bayesian continual learning via spiking neural networks. Frontiers in Computational Neuroscience 16 (2022). doi:10.3389/fncom.2022.1037976
- [18] Pete Warden. 2018. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. (April 2018). doi:10.48550/arXiv.1804.03209
- [19] Max Welling and Yee Whye Teh. 2011. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML'11). Omnipress, Madison, WI, USA, 681–688.
- [20] Jibin Wu, Emre Yılmaz, Malu Zhang, Haizhou Li, and Kay Chen Tan. 2020. Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition. Frontiers in Neuroscience 14 (2020). doi:10.3389/fnins.2020.00199
- [21] Difan Zou, Pan Xu, and Quanquan Gu. 2019. Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc.