Sharpness-Aware Surrogate Training for On-Sensor Spiking Neural Networks

Maximilian Nicholson

arXiv: 2604.09696 · v1 · submitted 2026-04-06 · cs.NE · cs.CV · cs.LG

Pith reviewed 2026-05-10 18:36 UTC · model grok-4.3

keywords: spiking neural networks · surrogate gradients · sharpness-aware minimization · on-sensor inference · event cameras · hard spikes · neuromorphic computing · quantized inference

The pith

Sharpness-aware surrogate training narrows the gap between smooth training and hard-spike deployment in on-sensor spiking networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Spiking neural networks suit low-power event-driven vision because they communicate with binary spikes. Training typically relies on smooth surrogate gradients, yet replacing the surrogate with a hard threshold at inference often causes a sharp accuracy loss. The paper introduces Sharpness-Aware Surrogate Training, which applies sharpness-aware minimization directly to the surrogate-forward pass so that the training objective stays smooth and its gradient remains exact. Under stated contraction assumptions, the approach comes with state-stability, input-Lipschitz, and smoothness bounds plus a nonconvex convergence result. On event-camera benchmarks the method raises hard-spike accuracy from 65.7% to 94.7% on N-MNIST and from 31.8% to 63.3% on DVS Gesture; its advantage over the baseline persists under INT8 and INT4 weight quantization, where it also lowers synaptic operation counts.

Core claim

The paper claims that applying sharpness-aware minimization to surrogate-gradient training of spiking neural networks produces models whose hard-threshold inference is more accurate, stable, and efficient than that of models trained with standard surrogate gradients. The theoretical support consists of state-stability, input-Lipschitz, and smoothness bounds derived under explicit contraction assumptions, together with a matching nonconvex convergence guarantee. Empirically, the claim rests on large improvements in swap-only hard-spike accuracy on two event-camera datasets and on accuracy gains over the baseline that persist under INT8 and INT4 weight quantization with fixed-point membrane potentials.

What carries the argument

Sharpness-Aware Surrogate Training (SAST), which applies sharpness-aware minimization to the surrogate-forward pass of a spiking neural network so the training objective remains differentiable while the final hard-spike behavior improves.
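To make the mechanism concrete, here is a minimal Python sketch of one SAM step on a surrogate-forward spiking layer, plus a swap-only hard-spike evaluation. It rests on assumptions that are not taken from the paper: a single leaky integrate-and-fire layer with no reset and no recurrence, a sigmoid surrogate with slope `k`, a rate-coded mean-squared-error loss, and the original SAM update of Foret et al. (perturb the weights by `rho * g / ||g||`, then descend with the gradient taken at the perturbed point). Because the forward pass itself uses the smooth surrogate, the gradient below is the exact gradient of the training objective rather than a straight-through approximation. All function names and hyperparameter values are hypothetical.

```python
import numpy as np


def filtered_inputs(x, beta):
    """Leaky-integrate input events: xbar[t] = beta * xbar[t-1] + x[t].

    With no reset and no recurrence (an assumption of this sketch), the
    membrane potential is linear in the weights: u[t] = W @ xbar[t].
    """
    xbar = np.zeros(x.shape, dtype=float)
    acc = np.zeros(x.shape[1])
    for t in range(x.shape[0]):
        acc = beta * acc + x[t]
        xbar[t] = acc
    return xbar


def forward(W, x, beta=0.9, theta=1.0, k=5.0, hard=False):
    """Spikes of shape (T, n_out) for weights W (n_out, n_in) and events x (T, n_in).

    hard=False: smooth surrogate spikes sigmoid(k * (u - theta)), used for training.
    hard=True:  Heaviside spikes, i.e. the swap-only deployment setting.
    """
    u = filtered_inputs(x, beta) @ W.T
    if hard:
        return (u >= theta).astype(float)
    return 1.0 / (1.0 + np.exp(-k * (u - theta)))


def loss_and_grad(W, x, y, beta=0.9, theta=1.0, k=5.0):
    """Rate-coded MSE loss and its exact gradient with respect to W."""
    s = forward(W, x, beta, theta, k)
    T = x.shape[0]
    diff = s.mean(axis=0) - y                  # (n_out,)
    loss = 0.5 * float(np.sum(diff ** 2))
    du = (diff / T) * k * s * (1.0 - s)        # dL/du[t] via the sigmoid derivative
    grad = du.T @ filtered_inputs(x, beta)     # sum over t of dL/du[t] outer xbar[t]
    return loss, grad


def sam_step(W, x, y, lr=0.5, rho=0.05):
    """One SAM update on the smooth surrogate objective."""
    _, g = loss_and_grad(W, x, y)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # first-order worst-case ascent in the rho-ball
    _, g_adv = loss_and_grad(W + eps, x, y)      # gradient at the perturbed weights
    return W - lr * g_adv                         # descend from the original weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = (rng.random((20, 8)) < 0.3).astype(float)  # hypothetical binary event input
    y = np.array([1.0, 0.0])                       # target firing rates
    W = rng.normal(0.0, 0.1, size=(2, 8))
    print("surrogate loss before:", loss_and_grad(W, x, y)[0])
    for _ in range(200):
        W = sam_step(W, x, y)
    print("surrogate loss after: ", loss_and_grad(W, x, y)[0])
    print("hard-spike rates:     ", forward(W, x, hard=True).mean(axis=0))
```

Training always calls `forward` with the default `hard=False`; only the final evaluation flips `hard=True`, which mirrors the swap-only protocol: the network is never retrained or fine-tuned with hard spikes.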

If this is right

Hard-spike accuracy rises substantially on N-MNIST and DVS Gesture benchmarks.
The accuracy gains persist under INT8 and INT4 weight quantization and fixed-point membrane potentials.
Synaptic operation counts decrease in the same hardware-aware simulation settings.
State-stability and input-Lipschitz bounds hold whenever the contraction assumptions are met.
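For scale, the SynOps and swap-only accuracy figures reported in the abstract translate into the following relative changes (plain arithmetic on the reported numbers, rounded):

```latex
\begin{aligned}
\text{N-MNIST, INT8:}\quad & \tfrac{1315\text{k}}{1734\text{k}} \approx 0.758 \;\Rightarrow\; \approx 24.2\%\ \text{fewer SynOps},\\
\text{DVS Gesture, INT8:}\quad & \tfrac{4323\text{k}}{86221\text{k}} \approx 0.0501 \;\Rightarrow\; \approx 95.0\%\ \text{fewer SynOps}\ (\approx 19.9\times),\\
\text{swap-only accuracy:}\quad & 94.7 - 65.7 = 29.0\ \text{pp (N-MNIST)},\qquad 63.3 - 31.8 = 31.5\ \text{pp (DVS Gesture)}.
\end{aligned}
```

Under the same INT8 setting, the relative SynOps reduction is therefore far larger on DVS Gesture than on N-MNIST.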

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

SAST could be combined with other hardware-specific optimizations such as custom leak factors to further reduce power.
The contraction assumptions may restrict the guarantees to feed-forward or mildly recurrent topologies; strongly recurrent or very deep architectures might require separate analysis.
The same sharpness-aware principle might transfer to surrogate training in other binary or quantized neural models beyond spiking networks.
Evaluating SAST on additional neuromorphic vision datasets would test whether the observed accuracy and efficiency gains generalize.

Load-bearing premise

The networks and data satisfy explicit contraction assumptions that permit the derivation of state-stability, input-Lipschitz, and smoothness bounds.

What would settle it

Applying SAST to an event-camera dataset where the contraction assumptions are violated and observing that hard-spike accuracy fails to rise above standard surrogate training would falsify the practical value of the bounds and the method's reliability.

Figures

Figures reproduced from arXiv: 2604.09696 by Maximilian Nicholson.

Figure 2: (a) Swap-only hard-spike membrane margins.

Original abstract

Spiking neural networks (SNNs) are a natural computational model for on-sensor and near-sensor vision, where event-driven processors must operate under strict power budgets with hard binary spikes. However, models trained with surrogate gradients often degrade sharply when the smooth surrogate nonlinearity is replaced by a hard threshold at deployment, a surrogate-to-hard transfer gap that directly limits on-sensor accuracy. We study Sharpness-Aware Surrogate Training (SAST), which applies Sharpness-Aware Minimization (SAM) to a surrogate-forward SNN so that the training objective is smooth and the gradient is exact, and position it as one gap-reduction strategy under the tested settings rather than the only viable mechanism. Under explicit contraction assumptions we provide state-stability, input-Lipschitz, and smoothness bounds, together with a corresponding nonconvex convergence result. On two event-camera benchmarks, swap-only hard-spike accuracy improves from 65.7% to 94.7% on N-MNIST and from 31.8% to 63.3% on DVS Gesture. Under a hardware-aware inference simulation (INT8/INT4 weight quantization, fixed-point membrane potentials, discrete leak factors), SAST remains strong: on N-MNIST, hard-spike accuracy improves from 47.6% to 96.9% (INT8) and from 43.2% to 81.0% (INT4), while on DVS Gesture it improves from 25.3% to 47.6% (INT8) and from 26.0% to 43.8% (INT4). SynOps also decrease under the same hardware-aware setting, including 1734k → 1315k (N-MNIST, INT8) and 86221k → 4323k (DVS Gesture, INT8). These results suggest that SAST is a promising component in a broader toolbox for on-sensor spiking inference under the tested settings.
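The hardware-aware simulation described in the abstract (INT8/INT4 weights, fixed-point membrane potentials, discrete leak factors, SynOps counting) can be sketched as an integer-only LIF layer. The specific choices below are assumptions, not the paper's documented setup: symmetric per-tensor weight quantization, a membrane potential stored as an integer in units of the weight quantization step, a leak of roughly `1 - 2**-leak_shift` implemented with an arithmetic right shift, subtractive reset, and SynOps counted as one synaptic operation per input spike per output neuron. Function names are hypothetical.

```python
import numpy as np


def quantize_symmetric(W, bits):
    """Per-tensor symmetric quantization of W to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1               # 127 for INT8, 7 for INT4
    scale = float(np.max(np.abs(W))) / qmax
    if scale == 0.0:
        scale = 1.0                          # all-zero weights: any scale works
    q = np.clip(np.round(W / scale), -qmax, qmax).astype(np.int64)
    return q, scale


def lif_layer_int(q_W, scale, spikes_in, theta=1.0, leak_shift=3):
    """Integer-only LIF layer with hard spikes.

    q_W: (n_out, n_in) integer weights; spikes_in: (T, n_in) binary events.
    The membrane potential u is an integer in units of `scale`, so the
    threshold becomes round(theta / scale). Returns (spikes_out, synops).
    """
    n_out = q_W.shape[0]
    theta_q = int(round(theta / scale))
    u = np.zeros(n_out, dtype=np.int64)
    out = np.zeros((spikes_in.shape[0], n_out), dtype=np.int64)
    synops = 0
    for t in range(spikes_in.shape[0]):
        active = np.flatnonzero(spikes_in[t])
        u = u - (u >> leak_shift)            # discrete leak via shift: about u * (1 - 2**-leak_shift)
        u = u + q_W[:, active].sum(axis=1)   # event-driven accumulation, active inputs only
        synops += active.size * n_out        # one op per input spike per fan-out
        fired = u >= theta_q
        out[t] = fired
        u = np.where(fired, u - theta_q, u)  # subtractive reset
    return out, synops
```

For example, `q, s = quantize_symmetric(W, bits=4)` followed by `lif_layer_int(q, s, events)` would simulate one INT4 layer; summing `synops` over layers gives a per-sample count in the spirit of the paper's SynOps metric, although the abstract does not state the paper's exact counting convention.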

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this section is the friction.

Desk Editor's Note (private letter to a colleague)

SAST applies SAM to surrogate SNN training and reports large hard-spike accuracy lifts on event datasets, but the contraction assumptions behind the theory are not checked on the actual trained models.

The paper takes Sharpness-Aware Minimization and slots it into the surrogate-gradient loop for spiking networks aimed at on-sensor use. The headline empirical result is the jump in swap-only hard-spike accuracy: 65.7% to 94.7% on N-MNIST and 31.8% to 63.3% on DVS Gesture, with gains over the baseline that persist under INT8/INT4 quantization and lower SynOps in the hardware simulation. That is the part worth paying attention to if you care about closing the surrogate-to-hard gap in real deployment settings. The work also sketches state-stability, Lipschitz, and smoothness bounds plus a nonconvex convergence claim, all under explicit contraction assumptions on the recurrent map. The empirical section is straightforward, and the hardware-aware numbers add a practical angle that is not always present in SNN papers. The main soft spot is that the contraction assumptions are stated but never verified on the learned weights or the event inputs used in the experiments. If those conditions do not hold for the trained networks, the bounds and convergence result do not attach to the reported models, so the practical gains stand on their own without the claimed theoretical backing. The abstract also leaves out details on baseline implementations, hyperparameter search, and statistical testing, though the full text may fill some of that in. This is useful reading for anyone already working on event-camera SNN deployment or surrogate training tricks; the numbers are concrete enough to test quickly. I would send it to peer review because the empirical improvements are large enough to merit referee scrutiny, even if the theory section needs more grounding.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Sharpness-Aware Surrogate Training (SAST) for spiking neural networks to reduce the accuracy drop when replacing surrogate nonlinearities with hard spikes at inference. Under explicit contraction assumptions it derives state-stability, input-Lipschitz, and smoothness bounds together with a nonconvex convergence guarantee. On N-MNIST and DVS Gesture event-camera benchmarks it reports large hard-spike accuracy gains (65.7% to 94.7% and 31.8% to 63.3%) that persist under INT8/INT4 quantization and fixed-point arithmetic, accompanied by SynOps reductions.

Significance. If the contraction assumptions hold for the trained models, the work supplies both a theoretical framework for SNN stability under hard spiking and concrete empirical evidence that sharpness-aware training can substantially close the surrogate-to-hard transfer gap while improving hardware efficiency. The combination of nonconvex convergence analysis and hardware-aware simulations on event data is a positive contribution to on-sensor SNN deployment.

major comments (2)

[Theoretical analysis] Theoretical section (contraction-assumption paragraph): The state-stability, input-Lipschitz, and smoothness bounds plus the nonconvex convergence result are stated only under explicit contraction assumptions on the recurrent state map. The manuscript does not verify whether these assumptions are satisfied by the learned weights or the discrete-time dynamics on the actual N-MNIST and DVS Gesture inputs; without such verification the theoretical guarantees do not apply to the networks that produce the reported accuracy numbers.
[Experimental results] Experimental results (accuracy and SynOps tables): The large reported gains (e.g., 65.7%→94.7% on N-MNIST hard-spike accuracy) are presented as direct measurements, yet the text supplies no information on the number of random seeds, statistical significance tests, or the precise hyper-parameter search budget used for both the baseline surrogate training and SAST. This information is load-bearing for interpreting whether the gains are robust or attributable to optimization differences.

minor comments (1)

[Abstract] Abstract: the phrase 'under the tested settings' is repeated; a single clarifying sentence on the scope of the claims would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and outline the revisions planned for the next version of the manuscript.

Referee: [Theoretical analysis] Theoretical section (contraction-assumption paragraph): The state-stability, input-Lipschitz, and smoothness bounds plus the nonconvex convergence result are stated only under explicit contraction assumptions on the recurrent state map. The manuscript does not verify whether these assumptions are satisfied by the learned weights or the discrete-time dynamics on the actual N-MNIST and DVS Gesture inputs; without such verification the theoretical guarantees do not apply to the networks that produce the reported accuracy numbers.

Authors: We appreciate the referee's observation. The theoretical bounds and convergence result are explicitly derived under the stated contraction assumptions on the recurrent state map, which is a standard approach for analyzing stability in discrete-time recurrent systems. The manuscript positions these results as holding under those assumptions rather than claiming they apply unconditionally to every trained network. While the strong empirical performance (stable training dynamics and high hard-spike accuracy without divergence) provides indirect support that the trained models remain in a contractive regime, we acknowledge that direct numerical verification on the learned weights and dataset inputs was not included. In the revised manuscript we will add a short empirical verification subsection that estimates the contraction factor (via spectral norm of the Jacobian of the state map) on the final trained weights using representative N-MNIST and DVS Gesture inputs. This will clarify the applicability of the guarantees to the reported experiments. revision: yes
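The verification proposed in this response (estimating a contraction factor from the spectral norm of the state-map Jacobian on representative inputs) could look like the Python sketch below. The state map is an assumption chosen for illustration: a surrogate-forward LIF layer with recurrent weights and no reset; the paper's exact map and norm may differ. Taking the maximum over the states visited by a few inputs gives an empirical estimate, not a certificate: a value below 1 is consistent with contraction along those trajectories but does not prove it holds everywhere.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def contraction_estimate(W_rec, W_in, x, beta=0.9, theta=1.0, k=5.0):
    """Largest spectral norm of the state-map Jacobian along one input trajectory.

    Assumed surrogate-forward state map (no reset):
        u[t+1] = beta * u[t] + W_rec @ sigma(k * (u[t] - theta)) + W_in @ x[t]
    Its Jacobian with respect to u[t] is
        beta * I + W_rec @ diag(k * sigma * (1 - sigma)).
    """
    n = W_rec.shape[0]
    u = np.zeros(n)
    worst = 0.0
    for t in range(x.shape[0]):
        s = sigmoid(k * (u - theta))
        # Multiplying column j of W_rec by the surrogate slope equals W_rec @ diag(slope).
        J = beta * np.eye(n) + W_rec * (k * s * (1.0 - s))[None, :]
        worst = max(worst, float(np.linalg.norm(J, 2)))  # largest singular value
        u = beta * u + W_rec @ s + W_in @ x[t]
    return worst
```

In practice one would run `contraction_estimate(W_rec, W_in, x)` over a batch of N-MNIST or DVS Gesture event tensors with the trained weights and report the maximum and its distribution.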
Referee: [Experimental results] Experimental results (accuracy and SynOps tables): The large reported gains (e.g., 65.7%→94.7% on N-MNIST hard-spike accuracy) are presented as direct measurements, yet the text supplies no information on the number of random seeds, statistical significance tests, or the precise hyper-parameter search budget used for both the baseline surrogate training and SAST. This information is load-bearing for interpreting whether the gains are robust or attributable to optimization differences.

Authors: We agree that reproducibility details are essential. The reported accuracy and SynOps improvements were obtained after systematic hyperparameter tuning for both the baseline and SAST, with multiple random initializations to ensure robustness. However, the original manuscript omitted the precise experimental protocol. In the revision we will expand the experimental section to report: (i) the number of independent random seeds (mean and standard deviation across runs), (ii) statistical significance tests (e.g., paired t-tests with p-values), and (iii) the hyperparameter search budget and procedure (grid/random search ranges for learning rate, SAM rho, surrogate slope, and SNN-specific parameters such as leak and threshold). These additions will demonstrate that the gains are consistent and not attributable to unequal optimization effort. revision: yes
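The per-seed reporting promised here (mean and standard deviation per method plus a paired significance test) is simple to compute once each seed is run for both methods. The Python sketch below uses only the standard library. It returns the paired t statistic mentioned in the response and, as a distribution-free alternative, an exact two-sided sign-flip permutation p-value, which is cheap for the small seed counts typical of SNN papers. The function name and return fields are hypothetical; `scipy.stats.ttest_rel` would supply the parametric p-value for the t statistic.

```python
import itertools
import math
import statistics


def paired_seed_report(baseline, sast):
    """Summarise paired per-seed accuracies (baseline[i] and sast[i] share seed i)."""
    if len(baseline) != len(sast) or len(baseline) < 2:
        raise ValueError("need at least two paired runs of equal length")
    d = [b - a for a, b in zip(baseline, sast)]  # per-seed improvement of SAST
    n = len(d)
    mean_d, sd_d = statistics.mean(d), statistics.stdev(d)
    if sd_d == 0:
        t = math.nan if mean_d == 0 else math.copysign(math.inf, mean_d)
    else:
        t = mean_d / (sd_d / math.sqrt(n))      # paired t statistic, n - 1 dof
    # Exact sign-flip test: under H0 each paired difference is symmetric about 0.
    observed = abs(sum(d))
    extreme = sum(
        1
        for signs in itertools.product((1, -1), repeat=n)
        if abs(sum(s * v for s, v in zip(signs, d))) >= observed - 1e-12
    )
    return {
        "baseline_mean": statistics.mean(baseline),
        "baseline_std": statistics.stdev(baseline),
        "sast_mean": statistics.mean(sast),
        "sast_std": statistics.stdev(sast),
        "mean_diff": mean_d,
        "t": t,
        "p_sign_flip": extreme / 2 ** n,
    }
```

One design consequence is worth noting when choosing the seed count: the smallest achievable two-sided sign-flip p-value is 2 / 2**n, so five seeds can never go below 0.0625, while six seeds can reach 0.03125.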

Circularity Check

0 steps flagged

No circularity: theory conditional on explicit assumptions; empirical gains are direct measurements

The paper derives state-stability, input-Lipschitz, and smoothness bounds plus a nonconvex convergence result strictly under stated contraction assumptions, without claiming that these hold unconditionally or reducing them to fitted quantities. Empirical accuracy improvements (e.g., 65.7% to 94.7% hard-spike accuracy on N-MNIST) are reported as direct benchmark measurements, both in the swap-only setting and under hardware-aware settings, and do not depend on the theory. No self-citations, ansatzes, or renamings serve as load-bearing steps that would make the central claims true by construction. The derivation chain is therefore self-contained, and the empirical claims are checked against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, invented entities, or additional axioms beyond the contraction assumptions are described.

axioms (1)

domain assumption: explicit contraction assumptions
Invoked to derive state-stability, input-Lipschitz, and smoothness bounds plus nonconvex convergence.
