Sharpness-Aware Surrogate Training for On-Sensor Spiking Neural Networks
Pith reviewed 2026-05-10 18:36 UTC · model grok-4.3
The pith
Sharpness-aware surrogate training narrows the gap between smooth training and hard-spike deployment in on-sensor spiking networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that applying sharpness-aware minimization to surrogate-gradient training of spiking neural networks produces models whose hard-threshold inference is more accurate, stable, and efficient than that of models trained with standard surrogate training. Theoretically, this is supported by explicit state-stability, input-Lipschitz, and smoothness bounds derived under contraction assumptions, together with a matching nonconvex convergence guarantee. Empirically, it is supported by large measured improvements in swap-only hard-spike accuracy on two event-camera datasets and by retained performance under INT8 and INT4 weight quantization with fixed-point membrane potentials.
What carries the argument
Sharpness-Aware Surrogate Training (SAST), which applies sharpness-aware minimization to the surrogate-forward pass of a spiking neural network so the training objective remains differentiable while the final hard-spike behavior improves.
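To make the mechanism concrete, here is a minimal sketch of one SAM step on a surrogate-forward model. It assumes a single neuron, a single time step, a sigmoid surrogate with hypothetical slope `k` and threshold `theta`, and a mean-squared-error loss; none of these choices come from the paper. SAM (Foret et al., ICLR 2021) targets $\max_{\|\epsilon\| \le \rho} L(w+\epsilon)$ through the first-order perturbation $\epsilon = \rho\, g / \|g\|$. Because the forward pass uses the surrogate itself, `loss_and_grad` returns the exact gradient of the smooth objective, and `hard_accuracy` mimics the swap-only evaluation in which the surrogate is replaced by a hard threshold at inference.

```python
import math

def surrogate_spike(v, theta=1.0, k=5.0):
    """Sigmoid surrogate for the Heaviside spike H(v - theta); k sets its slope."""
    return 1.0 / (1.0 + math.exp(-k * (v - theta)))

def hard_spike(v, theta=1.0):
    """Deployment nonlinearity: a binary spike when v reaches the threshold."""
    return 1 if v >= theta else 0

def loss_and_grad(w, data, theta=1.0, k=5.0):
    """MSE of a one-step, single-neuron surrogate-forward model and its exact gradient."""
    n = len(data)
    loss = 0.0
    grad = [0.0] * len(w)
    for x, y in data:
        v = sum(wi * xi for wi, xi in zip(w, x))   # membrane potential
        s = surrogate_spike(v, theta, k)
        err = s - y
        loss += err * err / n
        ds_dv = k * s * (1.0 - s)                  # derivative of the sigmoid surrogate
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * ds_dv * xi / n
    return loss, grad

def sam_step(w, data, lr=1.0, rho=0.05):
    """One SAM update: perturb toward higher loss, then descend with the gradient there."""
    _, g = loss_and_grad(w, data)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    eps = [rho * gi / norm for gi in g]            # first-order worst-case perturbation
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    _, g_adv = loss_and_grad(w_adv, data)          # sharpness-aware gradient
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]  # update the unperturbed weights

def hard_accuracy(w, data, theta=1.0):
    """Swap-only evaluation: same weights, surrogate replaced by the hard threshold."""
    hits = sum(hard_spike(sum(wi * xi for wi, xi in zip(w, x)), theta) == y for x, y in data)
    return hits / len(data)

if __name__ == "__main__":
    # Hypothetical toy data: spike only when the first input is active.
    toy = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0), ([0.0, 0.0], 0.0)]
    weights = [0.0, 0.0]
    for _ in range(500):
        weights = sam_step(weights, toy)
    print(weights, hard_accuracy(weights, toy))
```

A real SNN would unroll this over time steps and layers; the two-gradient structure of `sam_step` (one at `w`, one at `w + eps`) is the part that carries over.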
If this is right
- Hard-spike accuracy rises substantially on N-MNIST and DVS Gesture benchmarks.
- The accuracy gains persist under INT8 and INT4 weight quantization and fixed-point membrane potentials.
- Synaptic operation counts decrease in the same hardware-aware simulation settings.
- State-stability and input-Lipschitz bounds hold whenever the contraction assumptions are met.
Where Pith is reading between the lines
- SAST could be combined with other hardware-specific optimizations such as custom leak factors to further reduce power.
- The contraction assumptions may restrict use to feed-forward or mildly recurrent topologies; recurrent or deep architectures might require separate analysis.
- The same sharpness-aware principle might transfer to surrogate training in other binary or quantized neural models beyond spiking networks.
- Evaluating SAST on additional neuromorphic vision datasets would test whether the observed accuracy and efficiency gains generalize.
Load-bearing premise
The networks and data satisfy explicit contraction assumptions that permit the derivation of state-stability, input-Lipschitz, and smoothness bounds.
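A generic argument shows why an assumption of this kind yields state-stability and input-Lipschitz bounds. The form below (a state map $F$ with contraction factor $\kappa < 1$ and input constant $L_x$) is an illustrative assumption, not the paper's exact statement or constants.

$$
\begin{aligned}
&\text{Assume } v_{t+1} = F(v_t, x_t), \quad \|F(v,x) - F(v',x')\| \le \kappa \|v - v'\| + L_x \|x - x'\|, \quad 0 \le \kappa < 1. \\
&\text{With } e_t = \|v_t - v'_t\| \text{ and } \delta_t = \|x_t - x'_t\|: \quad e_{t+1} \le \kappa e_t + L_x \delta_t \;\Rightarrow\; e_t \le \kappa^{t} e_0 + L_x \sum_{s=0}^{t-1} \kappa^{t-1-s} \delta_s. \\
&\text{State stability } (\delta_s = 0 \text{ for all } s): \quad e_t \le \kappa^{t} e_0 \to 0. \\
&\text{Input Lipschitz } (e_0 = 0): \quad e_t \le L_x \sum_{j=0}^{t-1} \kappa^{j} \max_{s<t} \delta_s \le \frac{L_x}{1-\kappa} \max_{s<t} \delta_s.
\end{aligned}
$$

If $\kappa$ approaches 1, the input-Lipschitz constant $L_x/(1-\kappa)$ blows up, which is why checking the contraction factor on trained networks matters.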
What would settle it
Applying SAST to an event-camera dataset where the contraction assumptions are violated and observing that hard-spike accuracy fails to rise above standard surrogate training would falsify the practical value of the bounds and the method's reliability.
Original abstract
Spiking neural networks (SNNs) are a natural computational model for on-sensor and near-sensor vision, where event-driven processors must operate under strict power budgets with hard binary spikes. However, models trained with surrogate gradients often degrade sharply when the smooth surrogate nonlinearity is replaced by a hard threshold at deployment: this surrogate-to-hard transfer gap directly limits on-sensor accuracy. We study Sharpness-Aware Surrogate Training (SAST), which applies Sharpness-Aware Minimization (SAM) to a surrogate-forward SNN so that the training objective is smooth and the gradient is exact, and position it as one gap-reduction strategy under the tested settings rather than the only viable mechanism. Under explicit contraction assumptions we provide state-stability, input-Lipschitz, and smoothness bounds, together with a corresponding nonconvex convergence result. On two event-camera benchmarks, swap-only hard-spike accuracy improves from 65.7% to 94.7% on N-MNIST and from 31.8% to 63.3% on DVS Gesture. Under a hardware-aware inference simulation (INT8/INT4 weight quantization, fixed-point membrane potentials, discrete leak factors), SAST remains strong: on N-MNIST, hard-spike accuracy improves from 47.6% to 96.9% (INT8) and from 43.2% to 81.0% (INT4), while on DVS Gesture it improves from 25.3% to 47.6% (INT8) and from 26.0% to 43.8% (INT4). SynOps also decrease under the same hardware-aware setting, including 1734k → 1315k (N-MNIST, INT8) and 86221k → 4323k (DVS Gesture, INT8). These results suggest that SAST is a promising component in a broader toolbox for on-sensor spiking inference under the tested settings.
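The hardware-aware setting in the abstract can be approximated with a small simulation. This sketch assumes per-tensor symmetric weight quantization, integer membrane potentials, a leak implemented as an arithmetic right shift by a hypothetical `leak_shift` (decay factor roughly 1 - 2^-leak_shift), reset by subtraction, dense connectivity, and SynOps counted as one operation per input spike per output neuron it reaches. The paper's quantizer, leak values, reset rule, and SynOp definition may differ.

```python
def quantize_symmetric(weights, bits):
    """Per-tensor symmetric uniform quantization of a weight matrix to signed ints."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for row in weights for w in row) or 1.0
    scale = max_abs / qmax                     # float value of one integer step
    q = [[max(-qmax, min(qmax, round(w / scale))) for w in row] for row in weights]
    return q, scale

def lif_fixed_point(spike_trains, q_weights, threshold_int, leak_shift=3):
    """Hard-spike LIF layer with integer membrane potentials (hypothetical setup).

    spike_trains: list over time steps of binary input vectors.
    q_weights[j][i]: integer weight from input i to output neuron j.
    threshold_int: firing threshold in integer units, e.g. round(theta / scale).
    leak_shift: leak v <- v - (v >> leak_shift), roughly v * (1 - 2**-leak_shift).
    Returns per-step output spikes and the SynOp count.
    """
    n_out = len(q_weights)
    v = [0] * n_out
    outputs, synops = [], 0
    for x in spike_trains:
        active = [i for i, s in enumerate(x) if s]
        synops += len(active) * n_out          # each input spike reaches every output
        out = []
        for j in range(n_out):
            v[j] -= v[j] >> leak_shift         # integer leak via arithmetic shift
            v[j] += sum(q_weights[j][i] for i in active)
            spike = 1 if v[j] >= threshold_int else 0
            if spike:
                v[j] -= threshold_int          # reset by subtraction
            out.append(spike)
        outputs.append(out)
    return outputs, synops
```

Because SynOps scale with the number of input spikes, a model that fires less at the same accuracy directly lowers this count; the quantizer only changes which integer weights those operations add.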
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Sharpness-Aware Surrogate Training (SAST) for spiking neural networks to reduce the accuracy drop when replacing surrogate nonlinearities with hard spikes at inference. Under explicit contraction assumptions it derives state-stability, input-Lipschitz, and smoothness bounds together with a nonconvex convergence guarantee. On N-MNIST and DVS Gesture event-camera benchmarks it reports large hard-spike accuracy gains (65.7% to 94.7% and 31.8% to 63.3%) that persist under INT8/INT4 quantization and fixed-point arithmetic, accompanied by SynOps reductions.
Significance. If the contraction assumptions hold for the trained models, the work supplies both a theoretical framework for SNN stability under hard spiking and concrete empirical evidence that sharpness-aware training can substantially close the surrogate-to-hard transfer gap while improving hardware efficiency. The combination of nonconvex convergence analysis and hardware-aware simulations on event data is a positive contribution to on-sensor SNN deployment.
major comments (2)
- [Theoretical analysis] Theoretical section (contraction-assumption paragraph): The state-stability, input-Lipschitz, and smoothness bounds plus the nonconvex convergence result are stated only under explicit contraction assumptions on the recurrent state map. The manuscript does not verify whether these assumptions are satisfied by the learned weights or the discrete-time dynamics on the actual N-MNIST and DVS Gesture inputs; without such verification the theoretical guarantees do not apply to the networks that produce the reported accuracy numbers.
- [Experimental results] Experimental results (accuracy and SynOps tables): The large reported gains (e.g., 65.7%→94.7% on N-MNIST hard-spike accuracy) are presented as direct measurements, yet the text supplies no information on the number of random seeds, statistical significance tests, or the precise hyper-parameter search budget used for both the baseline surrogate training and SAST. This information is load-bearing for interpreting whether the gains are robust or attributable to optimization differences.
minor comments (1)
- [Abstract] Abstract: the phrase 'under the tested settings' is repeated; a single clarifying sentence on the scope of the claims would improve readability.
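One way to address the first major comment numerically is to estimate the spectral norm of the state-map Jacobian along recorded trajectories. This sketch assumes a hypothetical recurrent surrogate state map $v_{t+1} = \beta v_t + W_{\text{rec}}\,\sigma(v_t) + W_{\text{in}} x_t$ with a sigmoid surrogate $\sigma$, so the Jacobian with respect to the state is $\beta I + W_{\text{rec}}\,\mathrm{diag}(\sigma'(v_t))$; reset terms are ignored and the paper's state map may differ. A maximum below 1 over the recorded states is consistent with Euclidean-norm contraction along those states, not a proof that the assumption holds everywhere.

```python
import math

def surrogate_prime(v, theta=1.0, k=5.0):
    """Derivative of the sigmoid surrogate sigma(v) = 1 / (1 + exp(-k (v - theta)))."""
    s = 1.0 / (1.0 + math.exp(-k * (v - theta)))
    return k * s * (1.0 - s)

def state_jacobian(v, w_rec, beta):
    """J = beta * I + W_rec @ diag(sigma'(v)) for the hypothetical map above."""
    n = len(v)
    d = [surrogate_prime(vi) for vi in v]
    return [[(beta if i == j else 0.0) + w_rec[i][j] * d[j] for j in range(n)]
            for i in range(n)]

def spectral_norm(m, iters=200):
    """Largest singular value of a square matrix via power iteration on M^T M."""
    n = len(m)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]   # y = M x
        z = [sum(m[i][j] * y[i] for i in range(n)) for j in range(n)]   # z = M^T y
        norm = math.sqrt(sum(zi * zi for zi in z))
        if norm == 0.0:
            return 0.0
        x = [zi / norm for zi in z]
    # ||M x|| for the converged unit vector x approximates sigma_max(M).
    return math.sqrt(sum(sum(m[i][j] * x[j] for j in range(n)) ** 2 for i in range(n)))

def contraction_factor(trajectory, w_rec, beta):
    """Worst-case ||J||_2 over the recorded membrane states v_0, v_1, ..."""
    return max(spectral_norm(state_jacobian(v, w_rec, beta)) for v in trajectory)
```

In practice the trajectory would come from running the trained model on held-out N-MNIST or DVS Gesture samples, and an automatic-differentiation Jacobian would replace the hand-written one.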
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and outline the revisions planned for the next version of the manuscript.
- Referee: [Theoretical analysis] Theoretical section (contraction-assumption paragraph): The state-stability, input-Lipschitz, and smoothness bounds plus the nonconvex convergence result are stated only under explicit contraction assumptions on the recurrent state map. The manuscript does not verify whether these assumptions are satisfied by the learned weights or the discrete-time dynamics on the actual N-MNIST and DVS Gesture inputs; without such verification the theoretical guarantees do not apply to the networks that produce the reported accuracy numbers.
Authors: We appreciate the referee's observation. The theoretical bounds and convergence result are explicitly derived under the stated contraction assumptions on the recurrent state map, which is a standard approach for analyzing stability in discrete-time recurrent systems. The manuscript positions these results as holding under those assumptions rather than claiming they apply unconditionally to every trained network. While the strong empirical performance (stable training dynamics and high hard-spike accuracy without divergence) provides indirect support that the trained models remain in a contractive regime, we acknowledge that direct numerical verification on the learned weights and dataset inputs was not included. In the revised manuscript we will add a short empirical verification subsection that estimates the contraction factor (via spectral norm of the Jacobian of the state map) on the final trained weights using representative N-MNIST and DVS Gesture inputs. This will clarify the applicability of the guarantees to the reported experiments. revision: yes
- Referee: [Experimental results] Experimental results (accuracy and SynOps tables): The large reported gains (e.g., 65.7%→94.7% on N-MNIST hard-spike accuracy) are presented as direct measurements, yet the text supplies no information on the number of random seeds, statistical significance tests, or the precise hyper-parameter search budget used for both the baseline surrogate training and SAST. This information is load-bearing for interpreting whether the gains are robust or attributable to optimization differences.
Authors: We agree that reproducibility details are essential. The reported accuracy and SynOps improvements were obtained after systematic hyperparameter tuning for both the baseline and SAST, with multiple random initializations to ensure robustness. However, the original manuscript omitted the precise experimental protocol. In the revision we will expand the experimental section to report: (i) the number of independent random seeds (mean and standard deviation across runs), (ii) statistical significance tests (e.g., paired t-tests with p-values), and (iii) the hyperparameter search budget and procedure (grid/random search ranges for learning rate, SAM rho, surrogate slope, and SNN-specific parameters such as leak and threshold). These additions will demonstrate that the gains are consistent and not attributable to unequal optimization effort. revision: yes
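The seed-level protocol promised in the response can be illustrated with a short sketch. It computes the paired t statistic from per-seed accuracy differences (SciPy's `scipy.stats.ttest_rel` returns the same statistic together with a p-value) and, as a swapped-in alternative that needs only the standard library, an exact two-sided sign-flip permutation p-value. Any accuracy values fed to it here are hypothetical placeholders, not results from the paper.

```python
import itertools
import math
import statistics

def paired_t_statistic(a, b):
    """t = mean(d) / (stdev(d) / sqrt(n)) for paired per-seed differences d = a - b."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

def sign_flip_p_value(a, b):
    """Exact two-sided paired permutation test: flip the sign of each difference."""
    d = [x - y for x, y in zip(a, b)]
    observed = abs(sum(d))
    hits = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(d)):
        total += 1
        # Small tolerance so ties with the observed statistic count as extreme.
        if abs(sum(s * di for s, di in zip(signs, d))) >= observed - 1e-12:
            hits += 1
    return hits / total
```

With n seeds and all differences non-zero, the smallest achievable two-sided sign-flip p-value is 2 / 2^n (0.125 for four seeds), which is one reason the number of seeds matters when reporting significance.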
Circularity Check
No circularity: theory conditional on explicit assumptions; empirical gains are direct measurements
The paper derives state-stability, input-Lipschitz, and smoothness bounds plus a nonconvex convergence result strictly under stated contraction assumptions, without claiming that these hold unconditionally or reducing them to fitted quantities. Empirical accuracy improvements (e.g., 65.7% to 94.7% swap-only hard-spike accuracy on N-MNIST) are reported as direct benchmark measurements, in both swap-only and hardware-aware settings, independently of the theory. No self-citations, ansatzes, or renamings are invoked as load-bearing steps that would reduce the central claims to their inputs by construction. The derivation chain is therefore self-contained, and the empirical claims are checked against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: explicit contraction assumptions