SAFA-SNN: Sparsity-Aware On-Device Few-Shot Class-Incremental Learning with Fast-Adaptive Structure of Spiking Neural Network
Pith reviewed 2026-05-18 10:39 UTC · model grok-4.3
The pith
SAFA-SNN enables on-device few-shot class-incremental learning by using spiking neural networks whose threshold regulation preserves base-class knowledge while adapting to new classes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By applying threshold regulation in spiking neurons, most units produce stable spikes that keep synaptic traces of base classes intact while a smaller set produces adaptive spikes for incoming few-shot data; this separation, together with gradient-free optimization and orthogonal subspace projection of prototypes, yields at least 4.01 percent higher accuracy at the final session on Mini-ImageNet and 20 percent lower energy cost on CIFAR-100 than baseline networks.
What carries the argument
Sparsity-aware neuronal dynamics that use threshold regulation to create stable spikes for preserving base-class synaptic traces and adaptive spikes for new classes.
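A minimal sketch of this idea, not the paper's rule: the page does not state the threshold update, so `regulate_thresholds`, its `target_rate` and `lr` parameters, and the `adaptive_mask` marking which neurons may change are all hypothetical. The leaky integrate-and-fire loop with hard reset is a common textbook form and is also an assumption here. The sketch only illustrates how freezing most thresholds while letting a few move would leave the frozen neurons' spike trains untouched.

```python
def lif_run(inputs, threshold, decay=0.5):
    """Simulate one leaky integrate-and-fire neuron with hard reset.

    Returns the spike train as a list of 0/1 values.
    """
    v, spikes = 0.0, []
    for x in inputs:
        v = decay * v + x          # leak, then integrate the input current
        fired = v >= threshold
        spikes.append(1 if fired else 0)
        if fired:
            v = 0.0                # hard reset after a spike
    return spikes


def regulate_thresholds(thresholds, rates, target_rate, adaptive_mask, lr=0.5):
    """Hypothetical regulation rule (an assumption, not from the paper).

    Neurons flagged as adaptive move their threshold toward a target firing
    rate; all other neurons keep their threshold, so their spikes stay stable.
    """
    return [
        th + lr * (rate - target_rate) if adaptive else th
        for th, rate, adaptive in zip(thresholds, rates, adaptive_mask)
    ]
```

With this rule, a neuron whose mask entry is `False` produces the same spike train before and after regulation, while an adaptive neuron firing above the target rate gets a higher threshold and fires less often.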
If this is right
- Edge devices can add new visual categories from only a handful of examples without erasing earlier performance.
- Neuromorphic chips become practical for privacy-preserving incremental vision because the method lowers energy per inference.
- Class prototypes become more separable after orthogonal projection, limiting overfitting on the few available samples.
- Zeroth-order updates bypass the non-differentiable spike function, enabling end-to-end training of the spiking model.
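To make the zeroth-order bullet concrete, here is a small sketch that assumes a Heaviside spike function and a symmetric two-point estimator with standard normal perturbations. The names `zo_two_point` and `expected_zo` and the integration settings are illustrative choices, not taken from the paper. The code averages the estimate over z ~ N(0, 1) and compares it with the Gaussian-shaped closed form quoted in the excerpt under reference-graph entry [10] (Eq. 24).

```python
import math


def heaviside(u):
    """Spike function: 1 when the membrane potential offset u is non-negative."""
    return 1.0 if u >= 0.0 else 0.0


def zo_two_point(u, z, delta):
    """Two-point zeroth-order estimate of d/du heaviside(u) along direction z."""
    return (heaviside(u + delta * z) - heaviside(u - delta * z)) / (2.0 * delta) * z


def expected_zo(u, delta, z_max=8.0, steps=40_000):
    """Average the estimate over z ~ N(0, 1) using a midpoint rule on [-z_max, z_max]."""
    h = 2.0 * z_max / steps
    total = 0.0
    for i in range(steps):
        z = -z_max + (i + 0.5) * h
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += zo_two_point(u, z, delta) * pdf * h
    return total


def gaussian_surrogate(u, delta):
    """Closed form exp(-u^2 / (2 delta^2)) / (delta * sqrt(2 pi))."""
    return math.exp(-u * u / (2.0 * delta * delta)) / (delta * math.sqrt(2.0 * math.pi))
```

Under these assumptions the averaged estimate behaves like a smooth surrogate gradient centered on the firing threshold, which is why no derivative of the spike function itself is needed.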
Where Pith is reading between the lines
- The same threshold mechanism could be tested on streaming video or sensor data where classes arrive continuously rather than in discrete sessions.
- If the energy saving holds on actual neuromorphic hardware, the approach may reduce the need for cloud offloading in mobile robotics.
- Combining the stable-spike preservation with existing rehearsal buffers might further improve long-term retention without extra memory.
Load-bearing premise
Threshold regulation will keep base-class synaptic traces intact without blocking adaptation to new few-shot classes.
What would settle it
Measure accuracy on the original base classes after each new session; a sharp drop comparable to standard networks would show the stable-spike mechanism failed to protect prior knowledge.
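The check described above could be scripted roughly as follows. This is a sketch under stated assumptions: base classes are taken to carry integer labels `0 .. num_base_classes - 1`, predictions are integer class indices, and both function names are hypothetical.

```python
def base_class_accuracy(predictions, labels, num_base_classes):
    """Accuracy restricted to test samples whose true label is a base class."""
    hits = total = 0
    for pred, label in zip(predictions, labels):
        if label < num_base_classes:   # assumption: base classes come first
            total += 1
            hits += int(pred == label)
    return hits / total if total else float("nan")


def forgetting_per_session(base_acc_by_session):
    """Drop in base-class accuracy relative to the first (base) session."""
    first = base_acc_by_session[0]
    return [first - acc for acc in base_acc_by_session]
```

Running `base_class_accuracy` on the full test set after every incremental session and passing the resulting list to `forgetting_per_session` gives one number per session; comparing those numbers between SAFA-SNN and a standard network is the test the paragraph above proposes.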
Original abstract
Continuous learning of novel classes is crucial for edge devices to preserve data privacy and maintain reliable performance in dynamic environments. However, the scenario becomes particularly challenging when data samples are insufficient, requiring on-device few-shot class-incremental learning (FSCIL). Although existing work has explored parameter-efficient FSCIL frameworks based on artificial neural networks (ANNs), their deployment is still fundamentally constrained by limited device resources. Spiking neural networks (SNNs) process spatiotemporal information efficiently, offering lower energy consumption, greater biological plausibility, and compatibility with neuromorphic hardware than ANNs. In this work, we propose an SNN-based method containing Sparsity-Aware neuronal dynamics and Fast Adaptive structure (SAFA-SNN) for on-device FSCIL. By threshold regulation, most neurons exhibit stable spikes and others exhibit adaptive spikes. As a result, synaptic traces that encode base-class knowledge are naturally preserved, thereby alleviating catastrophic forgetting. To cope with spike non-differentiability in backpropagation, we employ a gradient-free technique, i.e., zeroth-order optimization. Moreover, class prototypes can limit overfitting on few-shot data but introduce bias. We enhance prototype discriminability by orthogonal subspace projection. Extensive experiments conducted on two standard benchmark datasets (CIFAR-100 and Mini-ImageNet) and three neuromorphic datasets (CIFAR10-DVS, DVS128 Gesture, and N-Caltech101) demonstrate that SAFA-SNN outperforms baselines, specifically achieving at least 4.01% improvement at the last incremental session on Mini-ImageNet and 20% lower energy cost on CIFAR-100 over baselines with practical implementation.
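The abstract does not spell out the orthogonal subspace projection. One plausible reading, offered here only as an assumption, is that each new-class prototype is projected onto the orthogonal complement of the subspace spanned by the base-class prototypes. The sketch below implements that reading with modified Gram-Schmidt in plain Python; `orthonormal_basis` and `project_out` are hypothetical names.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors, eps=1e-12):
    """Modified Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:                 # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(prototype, basis):
    """Remove the component of `prototype` that lies in the span of `basis`."""
    w = list(prototype)
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w
```

For example, with base prototypes spanning the first two coordinate axes, a new prototype `[2.0, 3.0, 4.0]` is reduced to its third coordinate, so it no longer overlaps with any base prototype.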
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SAFA-SNN, an SNN-based framework for on-device few-shot class-incremental learning. It introduces sparsity-aware neuronal dynamics via threshold regulation, under which most neurons produce stable spikes to preserve base-class synaptic traces while others adapt to new classes, thereby reducing catastrophic forgetting. Zeroth-order optimization handles spike non-differentiability during backpropagation, and orthogonal subspace projection is used to enhance the discriminability of class prototypes and limit overfitting on few-shot data. Experiments on CIFAR-100, Mini-ImageNet, and three neuromorphic datasets report that SAFA-SNN outperforms baselines, with at least 4.01% improvement at the final incremental session on Mini-ImageNet and 20% lower energy cost on CIFAR-100.
Significance. If the reported gains prove statistically robust, the work would contribute to energy-efficient continual learning on edge devices by exploiting the low-power and neuromorphic compatibility of SNNs. The sparsity-aware dynamics and gradient-free optimization address practical constraints of FSCIL under limited data and compute. Credit is due for evaluating on both conventional and neuromorphic benchmarks and for the explicit energy-cost comparison.
major comments (3)
- [Abstract / Sparsity-aware neuronal dynamics] Abstract and method description of sparsity-aware neuronal dynamics: the central mechanism—that threshold regulation produces stable spikes for base-class neurons while permitting adaptation for new few-shot classes without cross-interference—lacks any derivation, stability analysis, or explicit per-neuron selection rule. This assumption is load-bearing for the claim of alleviated catastrophic forgetting yet remains unvalidated mechanistically.
- [Experiments] Experimental results section: the reported improvements (e.g., ≥4.01% on Mini-ImageNet last session, 20% energy reduction on CIFAR-100) are presented without error bars, ablation studies, or statistical significance tests. Given that the central claim rests entirely on these empirical outcomes, the absence of robustness verification undermines confidence in the gains.
- [Zeroth-order optimization] Zeroth-order optimization subsection: the use of noisy gradient estimates for few-shot incremental updates on limited data risks destabilizing the intended stable-spike regime produced by threshold regulation; no analysis or mitigation of this interaction is provided.
minor comments (2)
- [Prototype enhancement] Clarify the precise formulation of the orthogonal subspace projection and its interaction with prototype computation; the current description is high-level.
- [Energy evaluation] Include hardware-specific energy measurements or simulation details for the claimed 20% reduction to allow reproducibility on neuromorphic platforms.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment point by point below, indicating the revisions we will make to strengthen the manuscript.
Point-by-point responses
-
Referee: [Abstract / Sparsity-aware neuronal dynamics] Abstract and method description of sparsity-aware neuronal dynamics: the central mechanism—that threshold regulation produces stable spikes for base-class neurons while permitting adaptation for new few-shot classes without cross-interference—lacks any derivation, stability analysis, or explicit per-neuron selection rule. This assumption is load-bearing for the claim of alleviated catastrophic forgetting yet remains unvalidated mechanistically.
Authors: We agree that the current description would benefit from greater mechanistic rigor. The manuscript introduces threshold regulation to induce stable spikes in most neurons (preserving base-class traces) and adaptive spikes in others, but does not supply a formal derivation or stability analysis. In the revised manuscript we will add a dedicated subsection deriving the threshold update rule from spike-rate statistics, stating the explicit per-neuron selection criterion (neurons whose recent spike rate falls below a learned threshold remain stable), and providing a Lyapunov-style stability argument showing that the separation of stable and adaptive populations limits interference with previously learned synaptic weights. revision: yes
-
Referee: [Experiments] Experimental results section: the reported improvements (e.g., ≥4.01% on Mini-ImageNet last session, 20% energy reduction on CIFAR-100) are presented without error bars, ablation studies, or statistical significance tests. Given that the central claim rests entirely on these empirical outcomes, the absence of robustness verification undermines confidence in the gains.
Authors: We concur that the empirical claims require stronger statistical support. The original results report mean improvements without accompanying variability measures or significance testing. We will revise the experimental section to include standard-deviation error bars computed over five independent runs, a full set of ablation studies that isolate the contribution of sparsity-aware dynamics, zeroth-order optimization, and orthogonal subspace projection, and paired t-test p-values confirming that the reported gains (including the 4.01% final-session improvement on Mini-ImageNet and 20% energy reduction on CIFAR-100) are statistically significant relative to the strongest baselines. revision: yes
-
Referee: [Zeroth-order optimization] Zeroth-order optimization subsection: the use of noisy gradient estimates for few-shot incremental updates on limited data risks destabilizing the intended stable-spike regime produced by threshold regulation; no analysis or mitigation of this interaction is provided.
Authors: The potential interaction between noisy zeroth-order gradients and the stable-spike regime is a legitimate concern not explicitly treated in the current text. While the manuscript motivates zeroth-order optimization solely as a means to handle spike non-differentiability, it does not analyze its effect on the threshold-regulated dynamics. In the revision we will add a short analysis demonstrating that the sparsity-aware threshold mechanism keeps the majority of neurons in a low-variance firing regime, thereby buffering against gradient noise; we will also describe a simple mitigation (adaptive perturbation scale that shrinks with session number) and report new experiments confirming that the combined system preserves the intended stable-spike statistics throughout incremental sessions. revision: yes
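The second response promises error bars over five runs and paired t-tests. A sketch of that computation with the Python standard library is shown below, assuming the per-run accuracies of the method and of a baseline are paired by random seed. The function name is hypothetical, and any numbers fed to it would be placeholders rather than results from the paper.

```python
import math
import statistics


def paired_t_statistic(method_scores, baseline_scores):
    """Return (mean difference, sample std of differences, t statistic, degrees of freedom).

    Scores must be paired, e.g. the same seed for both systems.
    """
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)           # sample standard deviation (n - 1)
    t_stat = mean_diff / (std_diff / math.sqrt(n))
    return mean_diff, std_diff, t_stat, n - 1
```

Turning the t statistic into a p-value needs a Student-t distribution function, which the standard library does not provide, so that step is left to a statistics package.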
Circularity Check
No significant circularity; derivation is self-contained against external benchmarks
full rationale
The paper introduces SAFA-SNN via threshold regulation for sparsity-aware dynamics (preserving base-class traces) and fast-adaptive structure (zeroth-order optimization plus orthogonal projection). These are presented as design choices whose effects are validated through experiments on CIFAR-100, Mini-ImageNet, and neuromorphic datasets, reporting concrete gains such as 4.01% accuracy improvement and 20% energy reduction. No equations, fitted parameters, or self-citations are shown reducing the claimed outcomes to inputs by construction. The central mechanisms are not self-definitional, and results rely on external benchmarks rather than renaming or smuggling prior author work. This is the normal case of an empirical method paper whose claims remain falsifiable outside its own definitions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Threshold regulation produces stable spikes for most neurons and adaptive spikes for others that preserve base-class knowledge.
Reference graph
Works this paper leans on
-
[1]
Few-shot class incremental learning with attention-aware self-adaptive prompt
Chenxi Liu, Zhenyi Wang, Tianyi Xiong, Ruibo Chen, Yihan Wu, Junfeng Guo, and Heng Huang. Few-shot class incremental learning with attention-aware self-adaptive prompt. In European Conference on Computer Vision, pp. 1–18. Springer, 2024a.
Qianhui Liu, Jiaqi Yan, Malu Zhang, Gang Pan, and Haizhou Li. Lite-snn: Designing lightweight and efficient spiking n...
-
[2]
Wenyao Ni, Jiangrong Shen, Qi Xu, and Huajin Tang. ALADE-SNN: adaptive logit alignment in dynamically expandable spiking neural networks for class incremental learning. In Toby Walsh, Julie Shah, and Zico Kolter (eds.), AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA, ...
-
[3]
URL https://doi.org/10.1609/aaai.v39i18.34171
doi: 10.1609/AAAI.V39I18.34171. URL https://doi.org/10.1609/aaai.v39i18.34171.
NVIDIA. Nvidia jetson orin. https://www.nvidia.cn/autonomous-machines/embedded-systems/jetson-orin/,
-
[4]
Accessed: 2025-08-02.
Hyunseok Oh and Youngki Lee. Sign gradient descent-based neuronal dynamics: Ann-to-snn conversion beyond relu network. arXiv preprint arXiv:2407.01645,
-
[5]
Quantized spike-driven transformer. arXiv preprint arXiv:2501.13492,
Xuerui Qiu, Malu Zhang, Jieyuan Zhang, Wenjie Wei, Honglin Cao, Junsheng Guo, Rui-Jie Zhu, Yimeng Shan, Yang Yang, and Haizhou Li. Quantized spike-driven transformer. arXiv preprint arXiv:2501.13492,
-
[6]
Qp-snn: Quantized and pruned spiking neural networks. arXiv preprint arXiv:2502.05905,
Wenjie Wei, Malu Zhang, Zijian Zhou, Ammar Belatreche, Yimeng Shan, Yu Liang, Honglin Cao, Jieyuan Zhang, and Yang Yang. Qp-snn: Quantized and pruned spiking neural networks. arXiv preprint arXiv:2502.05905,
-
[7]
Di Yu, Xin Du, Linshan Jiang, Shunwen Bai, Wentao Tong, and Shuiguang Deng. Fedlec: Effective federated learning algorithm with spiking neural networks under label skews. arXiv e-prints, pp. arXiv–2412, 2024a.
Di Yu, Xin Du, Linshan Jiang, Wentao Tong, and Shuiguang Deng. Ec-snn: splitting deep spiking neural networks for edge devices. In Proceedings of the...
-
[8]
At the non-differentiable point u = 0, the true gradient does not exist
$+\,O(b\delta^{2}d^{2})$ (16)
where $\hat{\nabla} g(x)$ is the true gradient, which obeys (Berahas et al., 2022), $d$ is the dimensionality, and $b$ is the number of samples. At the non-differentiable point $u = 0$, the true gradient does not exist. However, the expected value of the estimation is zero when $z$ is drawn from a symmetric distribution, so the bias is zero. The key is that the non-zero contrib...
-
[9]
is typically used in place of the optimality gap for an online convex cost function $f_t$ as
$$\mathbb{E}\left[\sum_{t=1}^{T} f_t(x_t) - \min_{x} \sum_{t=1}^{T} f_t(x)\right] \tag{18}$$
Assumption 3. (Unconstrained nonconvex optimization) The convergence is evaluated by the first-order stationary condition in terms of the squared gradient norm for the nonconvex objective $f$:
$$\frac{1}{T}\sum_{t=1}^{T} \mathbb{E}\,\|\nabla f(x_t)\|_2^2 \tag{19}$$
Assum...
-
[10]
Misclassified to Base classes Ratio
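Then,
$$\mathbb{E}_{z\sim p}\left[g^{2}(u; z, \delta)\right] = \frac{d}{du}\,\mathbb{E}_{z\sim\tilde{p}}\left[c\,h(u+\delta z)\right]. \tag{23}$$
The probability density function characterizing the standard normal distribution $N(0,1)$ takes the form $\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^{2}}{2}\right)$. Consequently, it is straightforward to obtain
$$\mathbb{E}_{z\sim p}\left[g^{2}(u; z, \delta)\right] = \int_{-\infty}^{\infty} \frac{|z|}{2\delta}\,\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^{2}}{2}\right)dz = \frac{1}{\delta\sqrt{2\pi}}\exp\left(-\frac{u^{2}}{2\delta^{2}}\right) \tag{24}$$
Theorem 3. Let $g(u)$ be a surrogate function. Suppose further t...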
-
[11]
We directly encode spikes with layers of LIF neurons. Each formulated unit $S$ generating spikes can be represented as
$$S = \mathrm{SN}(\mathrm{BN}(\mathrm{CONV}(X))), \tag{27}$$
where $X \in \mathbb{R}^{T\times B\times C\times H\times W}$ is the input with time, $\mathrm{BN}(\cdot)$ is the batch normalization layer and $\mathrm{SN}(\cdot)$ is the LIF neuron model.
A.7 BASELINES DESCRIPTION
We establish several baselines to better evaluate our proposed framework. To ...
-
[12]
and ASP (Liu et al., 2024a). CEC, BIDIST, and S3C establish dynamically evolving architectures to effectively support incremental learning. CEC incrementally transforms newly added linear classifiers into a graph-based structure. BIDIST assigns a learnable weight $W_t$ to each task and employs bilateral distillation between the representations of current and...
-
[13]
These results demonstrate that SAFA-SNN consistently achieves the highest final-session performance, highlighting its robustness and effectiveness.
A.9 ENERGY CONSUMPTION
According to previous studies (Horowitz, 2014; Yao et al., 2023; Lv et al., 2024), for SNNs, the theoretical energy consumption of layer $l$ can be calculated as:
$$E(l) = E_{AC} \times \mathrm{SOPs}(l) \tag{28}$$
whe...
-
[14]
The red dot with a black border marks the start point in each setting, while the green dot with a black border marks the end point. We can see that both sparsity and accuracy converge at a high level, showing the promise of the sparsity-accuracy trade-off in SAFA-SNN.
A.14 VISUALIZATION OF FIRING RATE
We visualize the average firing ra...