Practical Bayesian Inference for Speech SNNs: Uncertainty and Loss-Landscape Smoothing
Pith reviewed 2026-05-10 17:36 UTC · model grok-4.3
The pith
Bayesian inference on spiking neural network weights produces smoother predictive landscapes and better uncertainty estimates for speech tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a Bayesian treatment of the weights, implemented via the Improved Variational Online Newton (IVON) variational method, improves negative log-likelihood and Brier score on speech classification while producing a smoother and more regular predictive landscape than deterministic training, as judged from one-dimensional slices through weight space.
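The two metrics named in the claim are standard proper scoring rules. A minimal sketch of how they are computed from predicted class probabilities (illustrative helper functions, not the paper's evaluation code):

```python
import math

def nll(probs, labels):
    """Mean negative log-likelihood of the true class, clipped for stability."""
    eps = 1e-12
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)

def brier(probs, labels):
    """Mean squared error between predicted probabilities and one-hot targets."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)

# Both metrics reward probability mass on the correct class: a confident
# correct prediction scores lower (better) than a hesitant one.
```

Because both scores depend on the full predicted distribution rather than only the arg-max, they can register calibration improvements that accuracy alone would miss.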
What carries the argument
Improved Variational Online Newton (IVON) variational inference applied to the weights of surrogate-gradient spiking neural networks to counteract angularity in the predictive surface.
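IVON's full update (a natural-gradient, Adam-like rule that maintains a diagonal Gaussian posterior over the weights) is more involved than fits here, but the ingredient it shares with other variational methods can be sketched: keep a mean and log-standard-deviation per weight, sample perturbed weights via the reparameterization trick, and average predictions over samples. Names and the toy predict function below are illustrative, not the paper's code:

```python
import math
import random

def sample_weights(mean, log_std, rng=random):
    """One posterior sample w = mu + sigma * eps (reparameterization trick),
    so gradients can flow back to the variational parameters (mean, log_std)."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mean, log_std)]

def mc_predict(predict_fn, mean, log_std, x, n_samples=20):
    """Monte Carlo predictive mean: average predict_fn over posterior samples."""
    preds = [predict_fn(sample_weights(mean, log_std), x) for _ in range(n_samples)]
    return sum(preds) / n_samples
```

As the posterior standard deviation shrinks toward zero, the Monte Carlo predictive collapses to the deterministic prediction at the mean weights.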
Load-bearing premise
One-dimensional slices through weight space are representative of the smoothness and regularity of the full high-dimensional predictive landscape.
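Such slices are typically produced by evaluating the loss along a line through the trained weights in a random (or interpolating) direction. A minimal sketch, with the direction and loss function as illustrative stand-ins:

```python
import math
import random

def random_unit_direction(dim, rng=random):
    """Gaussian direction normalized to unit length, so slices are comparable."""
    d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in d)) or 1.0
    return [x / norm for x in d]

def loss_slice(loss_fn, weights, direction, alphas):
    """Evaluate the loss along the line w + alpha * d through weight space."""
    return [loss_fn([w + a * d for w, d in zip(weights, direction)]) for a in alphas]
```

The limitation flagged in this premise is visible in the code itself: the slice probes exactly one direction `d`, so any irregularity orthogonal to `d` is invisible in the resulting curve.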
What would settle it
A comparison using full high-dimensional landscape metrics, such as curvature estimates or optimization path statistics, that shows no reduction in irregularity for the Bayesian SNNs relative to deterministic baselines.
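One such full-dimensional curvature metric is the trace of the Hessian, which can be estimated without ever forming the Hessian via Hutchinson's estimator, tr(H) ≈ E[v·Hv] with random ±1 probe vectors and a finite-difference Hessian-vector product. A sketch under those assumptions (the gradient function is supplied by the caller; nothing here is the paper's code):

```python
import random

def hvp_fd(grad_fn, w, v, eps=1e-4):
    """Finite-difference Hessian-vector product: (g(w + eps v) - g(w - eps v)) / (2 eps)."""
    gp = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    gm = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / (2 * eps) for a, b in zip(gp, gm)]

def hutchinson_trace(grad_fn, w, n_probes=50, rng=random):
    """Estimate tr(H) as the average of v . (H v) over Rademacher probes v."""
    est = 0.0
    for _ in range(n_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in w]
        hv = hvp_fd(grad_fn, w, v)
        est += sum(vi * hvi for vi, hvi in zip(v, hv))
    return est / n_probes
```

Unlike a one-dimensional slice, each probe touches every coordinate of weight space, so the estimate aggregates curvature across all dimensions rather than along a single chosen line.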
Original abstract
Spiking Neural Networks (SNNs) are naturally suited for speech processing tasks due to their specific dynamics, which allow them to handle temporal data. However, the threshold-based generation of spikes in SNNs intuitively causes an angular or irregular predictive landscape. We explore the effect of using the Bayesian learning approach for the weights on the irregular predictive landscape. For the surrogate-gradient SNNs, we also explore the application of the Improved Variational Online Newton (IVON) approach, which is an efficient variational approach. The performance of the proposed approach is evaluated on the Heidelberg Digits and Speech Commands datasets. The hypothesis is that the Bayesian approach will result in a smoother and more regular predictive landscape, given the angular nature of the deterministic predictive landscape. The experimental evaluation of the proposed approach shows improved performance on the negative log-likelihood and Brier score. Furthermore, the proposed approach has resulted in a smoother and more regular predictive landscape compared to the deterministic approach, based on the one-dimensional slices of the weight space.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript explores Bayesian inference for the weights of surrogate-gradient Spiking Neural Networks (SNNs) on speech tasks, employing the Improved Variational Online Newton (IVON) method as an efficient variational approach. It hypothesizes that this will mitigate the angular irregularities of deterministic SNN predictive landscapes, leading to improved negative log-likelihood and Brier scores on the Heidelberg Digits and Speech Commands datasets, with smoother landscapes evidenced by one-dimensional weight-space slices.
Significance. If substantiated with stronger evidence, the work could offer a scalable route to uncertainty quantification in SNNs for temporal signals, building on IVON's efficiency. The emphasis on landscape smoothing addresses a plausible limitation of spike-based models and may guide more robust training, though the current support for the smoothing claim limits immediate field impact.
major comments (2)
- [Abstract] The claim that the Bayesian approach produces a 'smoother and more regular predictive landscape' rests exclusively on one-dimensional slices through weight space. In high-dimensional parameter spaces, single-line slices can miss saddle structures or irregularities orthogonal to the chosen directions, so this evidence does not establish the global regularity asserted in the central hypothesis.
- [Experimental evaluation] No error bars, statistical significance tests, or full protocol (including exact architectures, surrogate functions, and hyperparameter settings) are described for the reported NLL and Brier score improvements. Without these, the performance gains cannot be verified as reliable or reproducible, undermining the empirical support for the Bayesian advantage.
minor comments (2)
- [Abstract] The abstract refers to 'the angular nature of the deterministic predictive landscape' without defining or illustrating the angularity metric, which would help readers understand the baseline irregularity being addressed.
- Notation for IVON and the variational posterior is introduced without an accompanying equation or reference to the original IVON paper, reducing clarity for readers unfamiliar with the method.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the evidential basis for our claims. We provide point-by-point responses below and will incorporate revisions to address the identified issues.
Point-by-point responses
-
Referee: [Abstract] The claim that the Bayesian approach produces a 'smoother and more regular predictive landscape' rests exclusively on one-dimensional slices through weight space. In high-dimensional parameter spaces, single-line slices can miss saddle structures or irregularities orthogonal to the chosen directions, so this evidence does not establish the global regularity asserted in the central hypothesis.
Authors: We acknowledge that one-dimensional slices offer only a limited view and cannot rigorously establish global regularity in high-dimensional spaces. The manuscript already qualifies the smoothing observation as being 'based on the one-dimensional slices of the weight space' and ties it to the observed gains in NLL and Brier score. To avoid any overstatement, we will revise the abstract and discussion to describe the slices as providing supporting, indicative evidence of improved regularity rather than asserting global properties. We will also add a brief caveat on the limitations of low-dimensional projections. revision: partial
-
Referee: [Experimental evaluation] No error bars, statistical significance tests, or full protocol (including exact architectures, surrogate functions, and hyperparameter settings) are described for the reported NLL and Brier score improvements. Without these, the performance gains cannot be verified as reliable or reproducible, undermining the empirical support for the Bayesian advantage.
Authors: We agree that the current draft omits essential details required for reproducibility and statistical validation. In the revised version we will add: (i) error bars or standard deviations computed over multiple independent runs, (ii) appropriate statistical significance tests comparing the Bayesian and deterministic results, and (iii) a complete experimental protocol section (or appendix) specifying architectures, surrogate functions, all hyperparameter values, training schedules, and evaluation procedures. These changes will directly strengthen the empirical claims. revision: yes
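Error bars of the kind promised here can be produced, for example, by bootstrapping per-seed metric differences between the two training regimes. A self-contained sketch (the run values in the test are illustrative, not the paper's results):

```python
import random

def bootstrap_ci(diffs, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of paired
    differences (e.g. per-seed NLL of deterministic minus Bayesian runs)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = [rng.choice(diffs) for _ in diffs]
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# If the whole interval lies above zero, the improvement is consistent
# across seeds rather than an artifact of a single lucky run.
```

A paired test is appropriate here because both methods can be run from the same seeds and data splits, removing between-run variance from the comparison.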
Circularity Check
No circularity; purely empirical claims with no derivational reduction
full rationale
The paper advances a hypothesis that Bayesian weight inference smooths the predictive landscape of surrogate-gradient SNNs relative to deterministic training, then supports it via direct experimental comparison on Heidelberg Digits and Speech Commands. Reported metrics (NLL, Brier score) and 1-D weight-space slices are independent observations, not quantities fitted or defined in terms of themselves. No equations, uniqueness theorems, or self-citations appear in the supplied text that would allow any claimed result to reduce to its own inputs by construction. The work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] Abhronil Bhattacharjee et al. 2022. Examining the Robustness of Spiking Neural Networks on Non-Ideal Memristive Crossbars. In GLSVLSI 2022.
- [2] Alexandre Bittar and Philip N. Garner. 2022. A surrogate gradient spiking baseline for speech command recognition. Frontiers in Neuroscience 16 (2022). doi:10.3389/fnins.2022.865897
- [3] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. 2015. Weight Uncertainty in Neural Networks. In International Conference on Machine Learning (ICML) 2015.
- [4] Julian Büchel, Dmitrii Zendrikov, Sergio Solinas, Giacomo Indiveri, and Dylan R. Muir. 2021. Supervised training of spiking neural networks for robust deployment on mixed-signal neuromorphic processors. Scientific Reports (2021).
- [5]
- [6] Erik Daxberger, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, and Philipp Hennig. 2021. Laplace Redux - Effortless Bayesian Deep Learning. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 20089–20103.
- [7] Yarin Gal and Zoubin Ghahramani. 2016. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of The 33rd International Conference on Machine Learning (ICML).
  (Results fragment recovered from this entry — Dataset / Method / Acc(%)↑ / NLL↓ / ECE↓ / Entropy↓ / MI↑ / Brier↓: HD, Deterministic (MAP): 98.10 / 0.0553 / 0.0097 / 0.0824 / – / 0.0294; HD, IVON (MC S=20): 99.51 / 0.0233 / 0.0103 / 0.0450 / 0.0163 / 0.0105; SC, Deterministic ...)
- [8] Alex Graves. 2011. Practical Variational Inference for Neural Networks. In Advances in Neural Information Processing Systems, Vol. 24. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2011/file/7eb3c8be3d411e8ebfab08eba5f49632-Paper.pdf
- [9] Mohammad Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, and Akash Srivastava. 2018. Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 80). PMLR, 2611–2620.
- [10] Mohammad Emtiyaz Khan and Håvard Rue. 2023. The Bayesian learning rule. J. Mach. Learn. Res. 24, 1, Article 281 (Jan. 2023), 46 pages.
- [11] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations. https://arxiv.org/abs/1412.6980
- [12] Wesley J Maddox, Pavel Izmailov, Timur Garipov, Dmitry P Vetrov, and Andrew Gordon Wilson. 2019. A Simple Baseline for Bayesian Uncertainty in Deep Learning. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc.
- [13] Emre O. Neftci, Hesham Mostafa, and Friedemann Zenke. 2019. Surrogate Gradient Learning in Spiking Neural Networks: Bringing the Power of Gradient-based Optimization to Spiking Neural Networks. IEEE Signal Processing Magazine 36 (2019), 51–63.
- [14] Kazuki Osawa, Siddharth Swaroop, Mohammad Emtiyaz Khan, Anirudh Jain, Runa Eschenhagen, Richard E Turner, and Rio Yokota. 2019. Practical Deep Learning with Bayesian Principles. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc.
- [15] Hippolyt Ritter, Aleksandar Botev, and David Barber. 2018. A Scalable Laplace Approximation for Neural Networks. In International Conference on Learning Representations. https://openreview.net/forum?id=Skdvd2xAZ
- [16] Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, and Thomas Möllenhoff. 2024. Variational learning is effective for large deep networks. In Proceedings of the 41st International Conference on Machine Learning (ICML'24). JMLR.org.
- [17] Nicolas Skatchkovsky, Hyeryung Jang, and Osvaldo Simeone. 2022. Bayesian continual learning via spiking neural networks. Frontiers in Computational Neuroscience 16 (2022). doi:10.3389/fncom.2022.1037976
- [18] Pete Warden. 2018. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. (April 2018). doi:10.48550/arXiv.1804.03209
- [19] Max Welling and Yee Whye Teh. 2011. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML'11). Omnipress, Madison, WI, USA, 681–688.
- [20] Jibin Wu, Emre Yılmaz, Malu Zhang, Haizhou Li, and Kay Chen Tan. 2020. Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition. Frontiers in Neuroscience 14 (2020). doi:10.3389/fnins.2020.00199
- [21] Difan Zou, Pan Xu, and Quanquan Gu. 2019. Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc.