arxiv: 2510.03648 · v2 · submitted 2025-10-04 · 💻 cs.LG

SAFA-SNN: Sparsity-Aware On-Device Few-Shot Class-Incremental Learning with Fast-Adaptive Structure of Spiking Neural Network

Pith reviewed 2026-05-18 10:39 UTC · model grok-4.3

classification 💻 cs.LG
keywords spiking neural networks · few-shot class-incremental learning · on-device learning · sparsity-aware dynamics · catastrophic forgetting · neuromorphic hardware · edge AI

The pith

SAFA-SNN enables on-device few-shot class-incremental learning by using spiking neural networks whose threshold regulation preserves base-class knowledge while adapting to new classes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an SNN-based approach called SAFA-SNN for few-shot class-incremental learning that must run directly on edge devices. It introduces sparsity-aware neuronal dynamics that regulate thresholds so most neurons keep stable spikes encoding earlier classes and only a minority adapt to new few-shot examples. This structure, paired with zeroth-order optimization and orthogonal projection of class prototypes, is shown to reduce forgetting and energy use compared with prior methods on standard image and event-based datasets. A reader would care because edge devices need continuous learning that respects privacy and runs within tight power budgets without full retraining.

Core claim

By applying threshold regulation in spiking neurons, most units produce stable spikes that keep synaptic traces of base classes intact while a smaller set produces adaptive spikes for incoming few-shot data; this separation, together with gradient-free optimization and orthogonal subspace projection of prototypes, yields at least 4.01 percent higher accuracy at the final session on Mini-ImageNet and 20 percent lower energy cost on CIFAR-100 than baseline networks.

What carries the argument

Sparsity-aware neuronal dynamics that use threshold regulation to create stable spikes for preserving base-class synaptic traces and adaptive spikes for new classes.
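
As a rough illustration of what threshold regulation could look like, the sketch below simulates leaky integrate-and-fire neurons and nudges the thresholds of a designated adaptive minority toward a target firing rate while leaving all other thresholds untouched. The paper's actual update rule and neuron-selection criterion are not stated in this summary, so `regulate_thresholds`, `adaptive_mask`, `target_rate`, and `step` are hypothetical names and choices made purely for illustration.

```python
def lif_spike_counts(currents, thresholds, steps=20, decay=0.5):
    """Simulate leaky integrate-and-fire neurons driven by constant currents.

    Returns the number of spikes each neuron emits over `steps` time steps.
    """
    potentials = [0.0] * len(currents)
    counts = [0] * len(currents)
    for _ in range(steps):
        for i, current in enumerate(currents):
            # Leak the membrane potential, then integrate the input.
            potentials[i] = decay * potentials[i] + current
            if potentials[i] >= thresholds[i]:
                counts[i] += 1
                potentials[i] = 0.0  # hard reset after a spike
    return counts


def regulate_thresholds(thresholds, rates, adaptive_mask,
                        target_rate=0.2, step=0.1):
    """Hypothetical regulation rule (an assumption, not the paper's).

    Adaptive neurons move their threshold up when they fire above the
    target rate and down when they fire below it. Stable neurons keep
    their threshold, so their response to old inputs is unchanged.
    """
    return [
        th + step * (rate - target_rate) if adaptive else th
        for th, rate, adaptive in zip(thresholds, rates, adaptive_mask)
    ]
```

With eight identical neurons and a mask that marks only the last two as adaptive, a call to `regulate_thresholds` changes just those two thresholds; the other six keep their thresholds and therefore their spike patterns for the same input.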

If this is right

  • Edge devices can add new visual categories from only a handful of examples without erasing earlier performance.
  • Neuromorphic chips become practical for privacy-preserving incremental vision because the method lowers energy per inference.
  • Class prototypes become more separable after orthogonal projection, limiting overfitting on the few available samples.
  • Zeroth-order updates bypass the non-differentiable spike function, enabling end-to-end training of the spiking model.
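
One common way to realise a zeroth-order update for a spike function is a two-point random-perturbation estimate. The sketch below applies it to a Heaviside spike at membrane-potential offset `u` (potential minus threshold). The Gaussian perturbation, `delta`, and `num_samples` are assumptions chosen for illustration, not settings taken from the paper.

```python
import random


def heaviside(u):
    """Spike function: fire when the offset from threshold is non-negative."""
    return 1.0 if u >= 0.0 else 0.0


def zo_spike_gradient(u, delta=0.5, num_samples=1000, rng=None):
    """Two-point zeroth-order estimate of d(heaviside)/du at `u`.

    Each sample perturbs `u` by +/- delta * z with z ~ N(0, 1) and uses
    only function evaluations, so no derivative of the spike is needed.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)
        diff = heaviside(u + delta * z) - heaviside(u - delta * z)
        total += z * diff / (2.0 * delta)
    return total / num_samples
```

Each sample contributes a non-negative amount, and only perturbations large enough to cross the threshold contribute at all, so the estimate is largest near `u = 0` and vanishes far from it. For a standard normal `z`, the expected value is `exp(-u**2 / (2 * delta**2)) / (delta * sqrt(2 * pi))`, which is a Gaussian-shaped surrogate gradient whose width is set by `delta`.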

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same threshold mechanism could be tested on streaming video or sensor data where classes arrive continuously rather than in discrete sessions.
  • If the energy saving holds on actual neuromorphic hardware, the approach may reduce the need for cloud offloading in mobile robotics.
  • Combining the stable-spike preservation with existing rehearsal buffers might further improve long-term retention, although any buffer adds memory that must fit within the device budget.

Load-bearing premise

Threshold regulation will keep base-class synaptic traces intact without blocking adaptation to new few-shot classes.

What would settle it

Measure accuracy on the original base classes after each new session; a sharp drop comparable to standard networks would show the stable-spike mechanism failed to protect prior knowledge.
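
The check described above can be sketched as a small bookkeeping routine: score only the test samples whose labels belong to the base classes after every session, then report the drop relative to session 0. The function names and the convention that base classes occupy labels `0 .. num_base_classes - 1` are assumptions for illustration.

```python
def base_class_accuracy(predictions, labels, num_base_classes):
    """Accuracy restricted to test samples whose true label is a base class."""
    pairs = [(p, y) for p, y in zip(predictions, labels)
             if y < num_base_classes]
    if not pairs:
        raise ValueError("no base-class samples in the test set")
    return sum(p == y for p, y in pairs) / len(pairs)


def forgetting_curve(per_session_predictions, labels, num_base_classes):
    """Base-class accuracy after each session and its drop from session 0.

    `per_session_predictions[s]` holds the model's predictions on the same
    fixed test set after incremental session `s`.
    """
    accuracies = [
        base_class_accuracy(preds, labels, num_base_classes)
        for preds in per_session_predictions
    ]
    drops = [accuracies[0] - acc for acc in accuracies]
    return accuracies, drops
```

A flat `drops` list would support the stable-spike claim; a steep rise similar to an unprotected baseline run through the same routine would count against it.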

Figures

Figures reproduced from arXiv: 2510.03648 by Changze Lv, Di Yu, Huijing Zhang, Linshan Jiang, Muyang Cao, Shuiguang Deng, Xin Du.

Figure 1. Top: the on-device FSCIL scenario includes three stages: base data training, few-shot …
Figure 2. The SAFA-SNN framework includes three main components: (a) Training abundant data and …
Figure 3. Accuracy in each session.
Datasets and spiking architecture. We evaluate the generalization performance of SAFA-SNN on two standard benchmark datasets, i.e., CIFAR100 (Krizhevsky et al., 2009) and Mini-ImageNet (Russakovsky et al., 2015), each split into eight 5-way 5-shot incremental tasks. We also extend experiments on three neuromorphic datasets: CIFAR-10-DVS (Li et al., 2017), DVS128gesture (Amir et …
Figure 5. Average Training Time.
Figure 6. Ablation results of SAFA-SNN. (Plot text captured with the caption: panels (a) DVS128-Gesture and (b) CIFAR10-DVS; axes Sessions and Accuracy (%); series 1-Shot, 2-Shot, 5-Shot, 10-Shot, 20-Shot, 50-Shot.)
Figure 8. Comparison of different hyper-parameters on …
Figure 9. Accuracy in each session on different time steps.
Figure 10. Sparsity-Accuracy on Mini-Imagenet. (Plot text captured with the caption: axes Sparsity and Accuracy; series Spiking-VGG5-T2, Spiking-VGG5-T3, Spiking-VGG9-T4.)
Figure 12. Firing rate of adaptive and non-adaptive neurons.
Original abstract

Continuous learning of novel classes is crucial for edge devices to preserve data privacy and maintain reliable performance in dynamic environments. However, the scenario becomes particularly challenging when data samples are insufficient, requiring on-device few-shot class-incremental learning (FSCIL). Although existing work has explored parameter-efficient FSCIL frameworks based on artificial neural networks (ANNs), their deployment is still fundamentally constrained by limited device resources. Spiking neural networks (SNNs) process spatiotemporal information efficiently, offering lower energy consumption, greater biological plausibility, and compatibility with neuromorphic hardware than ANNs. In this work, we propose an SNN-based method containing Sparsity-Aware neuronal dynamics and Fast Adaptive structure (SAFA-SNN) for on-device FSCIL. By threshold regulation, most neurons exhibit stable spikes and others exhibit adaptive spikes. As a result, synaptic traces that encode base-class knowledge are naturally preserved, thereby alleviating catastrophic forgetting. To cope with spike non-differentiability in backpropagation, we employ a gradient-free technique, i.e., zeroth-order optimization. Moreover, class prototypes can limit overfitting on few-shot data but introduce bias. We enhance prototype discriminability by orthogonal subspace projection. Extensive experiments conducted on two standard benchmark datasets (CIFAR-100 and Mini-ImageNet) and three neuromorphic datasets (CIFAR10-DVS, DVS128 Gesture, and N-Caltech101) demonstrate that SAFA-SNN outperforms baselines, specifically achieving at least 4.01% improvement at the last incremental session on Mini-ImageNet and 20% lower energy cost on CIFAR-100 over baselines with practical implementation.
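
To make the prototype step more concrete, here is a minimal sketch of one plausible reading of orthogonal subspace projection: a class prototype is the mean of its feature vectors, and each new-class prototype has its component inside the span of the base-class prototypes removed. The abstract does not give the exact formulation, so the Gram-Schmidt construction and all function names below are assumptions.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def class_prototype(features):
    """Mean feature vector of the few-shot samples of one class."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def project_out(vector, basis):
    """Remove from `vector` its components along an orthonormal `basis`."""
    w = list(vector)
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w


def orthonormal_basis(vectors, eps=1e-10):
    """Gram-Schmidt basis for the span of `vectors` (e.g. base prototypes)."""
    basis = []
    for v in vectors:
        w = project_out(v, basis)
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis
```

A new prototype processed as `project_out(class_prototype(samples), orthonormal_basis(base_prototypes))` has zero inner product with every base prototype, which is one way to keep new-class directions from overlapping with old ones.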

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes SAFA-SNN, an SNN-based framework for on-device few-shot class-incremental learning. It introduces sparsity-aware neuronal dynamics via threshold regulation, under which most neurons produce stable spikes to preserve base-class synaptic traces while others adapt to new classes, thereby reducing catastrophic forgetting. Zeroth-order optimization handles spike non-differentiability during backpropagation, and orthogonal subspace projection is used to enhance the discriminability of class prototypes and limit overfitting on few-shot data. Experiments on CIFAR-100, Mini-ImageNet, and three neuromorphic datasets report that SAFA-SNN outperforms baselines, with at least 4.01% improvement at the final incremental session on Mini-ImageNet and 20% lower energy cost on CIFAR-100.

Significance. If the reported gains prove statistically robust, the work would contribute to energy-efficient continual learning on edge devices by exploiting the low-power and neuromorphic compatibility of SNNs. The sparsity-aware dynamics and gradient-free optimization address practical constraints of FSCIL under limited data and compute. Credit is due for evaluating on both conventional and neuromorphic benchmarks and for the explicit energy-cost comparison.

major comments (3)
  1. [Abstract / Sparsity-aware neuronal dynamics] Abstract and method description of sparsity-aware neuronal dynamics: the central mechanism—that threshold regulation produces stable spikes for base-class neurons while permitting adaptation for new few-shot classes without cross-interference—lacks any derivation, stability analysis, or explicit per-neuron selection rule. This assumption is load-bearing for the claim of alleviated catastrophic forgetting yet remains unvalidated mechanistically.
  2. [Experiments] Experimental results section: the reported improvements (e.g., ≥4.01% on Mini-ImageNet last session, 20% energy reduction on CIFAR-100) are presented without error bars, ablation studies, or statistical significance tests. Given that the central claim rests entirely on these empirical outcomes, the absence of robustness verification undermines confidence in the gains.
  3. [Zeroth-order optimization] Zeroth-order optimization subsection: the use of noisy gradient estimates for few-shot incremental updates on limited data risks destabilizing the intended stable-spike regime produced by threshold regulation; no analysis or mitigation of this interaction is provided.
minor comments (2)
  1. [Prototype enhancement] Clarify the precise formulation of the orthogonal subspace projection and its interaction with prototype computation; the current description is high-level.
  2. [Energy evaluation] Include hardware-specific energy measurements or simulation details for the claimed 20% reduction to allow reproducibility on neuromorphic platforms.
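
For context on how such energy figures are usually obtained without hardware measurements, the sketch below shows a theoretical estimate of the kind common in the SNN literature: a spiking layer is charged one accumulate per synaptic operation, while a conventional layer is charged one multiply-accumulate per operation. The per-operation energies (0.9 pJ and 4.6 pJ, the frequently quoted 45 nm figures) and the definition of synaptic operations as firing rate times time steps times FLOPs are assumptions here, not values confirmed by this summary.

```python
# Assumed per-operation energies in picojoules (commonly quoted 45 nm values).
E_AC_PJ = 0.9   # one accumulate
E_MAC_PJ = 4.6  # one multiply-accumulate


def snn_layer_energy_pj(flops, firing_rate, time_steps, e_ac_pj=E_AC_PJ):
    """Estimated energy of a spiking layer.

    Synaptic operations are approximated as the layer's FLOPs scaled by
    the average firing rate and the number of simulation time steps.
    """
    synaptic_ops = firing_rate * time_steps * flops
    return e_ac_pj * synaptic_ops


def ann_layer_energy_pj(flops, e_mac_pj=E_MAC_PJ):
    """Estimated energy of the equivalent non-spiking layer."""
    return e_mac_pj * flops
```

Under this model a sparser network (lower `firing_rate`) or fewer time steps lowers the estimate directly, which is why reporting the firing rates and time steps behind a claimed reduction matters for reproducibility.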

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment point by point below, indicating the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract / Sparsity-aware neuronal dynamics] Abstract and method description of sparsity-aware neuronal dynamics: the central mechanism—that threshold regulation produces stable spikes for base-class neurons while permitting adaptation for new few-shot classes without cross-interference—lacks any derivation, stability analysis, or explicit per-neuron selection rule. This assumption is load-bearing for the claim of alleviated catastrophic forgetting yet remains unvalidated mechanistically.

    Authors: We agree that the current description would benefit from greater mechanistic rigor. The manuscript introduces threshold regulation to induce stable spikes in most neurons (preserving base-class traces) and adaptive spikes in others, but does not supply a formal derivation or stability analysis. In the revised manuscript we will add a dedicated subsection deriving the threshold update rule from spike-rate statistics, stating the explicit per-neuron selection criterion (neurons whose recent spike rate falls below a learned threshold remain stable), and providing a Lyapunov-style stability argument showing that the separation of stable and adaptive populations limits interference with previously learned synaptic weights. revision: yes

  2. Referee: [Experiments] Experimental results section: the reported improvements (e.g., ≥4.01% on Mini-ImageNet last session, 20% energy reduction on CIFAR-100) are presented without error bars, ablation studies, or statistical significance tests. Given that the central claim rests entirely on these empirical outcomes, the absence of robustness verification undermines confidence in the gains.

    Authors: We concur that the empirical claims require stronger statistical support. The original results report mean improvements without accompanying variability measures or significance testing. We will revise the experimental section to include standard-deviation error bars computed over five independent runs, a full set of ablation studies that isolate the contribution of sparsity-aware dynamics, zeroth-order optimization, and orthogonal subspace projection, and paired t-test p-values confirming that the reported gains (including the 4.01% final-session improvement on Mini-ImageNet and 20% energy reduction on CIFAR-100) are statistically significant relative to the strongest baselines. revision: yes

  3. Referee: [Zeroth-order optimization] Zeroth-order optimization subsection: the use of noisy gradient estimates for few-shot incremental updates on limited data risks destabilizing the intended stable-spike regime produced by threshold regulation; no analysis or mitigation of this interaction is provided.

    Authors: The potential interaction between noisy zeroth-order gradients and the stable-spike regime is a legitimate concern not explicitly treated in the current text. While the manuscript motivates zeroth-order optimization solely as a means to handle spike non-differentiability, it does not analyze its effect on the threshold-regulated dynamics. In the revision we will add a short analysis demonstrating that the sparsity-aware threshold mechanism keeps the majority of neurons in a low-variance firing regime, thereby buffering against gradient noise; we will also describe a simple mitigation (adaptive perturbation scale that shrinks with session number) and report new experiments confirming that the combined system preserves the intended stable-spike statistics throughout incremental sessions. revision: yes
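
The robustness reporting promised in the second response (error bars over repeated runs and a paired test against a baseline) can be sketched with the standard library alone. The function names are hypothetical, and any scores passed in would be placeholders supplied by the reader, not results from the paper.

```python
import math
import statistics


def mean_and_std(scores):
    """Mean and sample standard deviation over independent runs."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(method_scores, baseline_scores):
    """Paired t statistic and degrees of freedom for run-matched scores.

    Run i of the method is paired with run i of the baseline (same seed
    or same data split), and the test is applied to the differences.
    """
    if len(method_scores) != len(baseline_scores):
        raise ValueError("both methods need one score per run")
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    if len(diffs) < 2:
        raise ValueError("need at least two paired runs")
    spread = statistics.stdev(diffs)
    if spread == 0.0:
        raise ValueError("differences are constant; t is undefined")
    standard_error = spread / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error, len(diffs) - 1
```

With five runs there are 4 degrees of freedom, so an absolute t value above roughly 2.776 corresponds to significance at the two-sided 5% level; a p-value would require a t-distribution routine from a statistics library.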

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained against external benchmarks

full rationale

The paper introduces SAFA-SNN via threshold regulation for sparsity-aware dynamics (preserving base-class traces) and fast-adaptive structure (zeroth-order optimization plus orthogonal projection). These are presented as design choices whose effects are validated through experiments on CIFAR-100, Mini-ImageNet, and neuromorphic datasets, reporting concrete gains such as 4.01% accuracy improvement and 20% energy reduction. No equations, fitted parameters, or self-citations are shown reducing the claimed outcomes to inputs by construction. The central mechanisms are not self-definitional, and results rely on external benchmarks rather than renaming or smuggling prior author work. This is the normal case of an empirical method paper whose claims remain falsifiable outside its own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unverified premise that threshold regulation preserves old synaptic traces while allowing adaptation, plus the effectiveness of zeroth-order optimization and orthogonal projection for this setting; no explicit free parameters or invented physical entities are named in the abstract.

axioms (1)
  • domain assumption: Threshold regulation produces stable spikes for most neurons and adaptive spikes for others that preserve base-class knowledge.
    Invoked to explain alleviation of catastrophic forgetting in the sparsity-aware dynamics section of the abstract.
