BSViT: A Burst Spiking Vision Transformer for Expressive and Efficient Visual Representation Learning

Dewei Bai; Hong Qu; Hongxiang Peng

arXiv: 2604.23165 · v1 · submitted 2026-04-25 · 💻 cs.CV


Pith reviewed 2026-05-08 08:31 UTC · model grok-4.3

classification 💻 cs.CV

keywords: burst spiking · spiking vision transformer · self-attention mechanism · neuromorphic hardware · energy efficiency · event-based vision · visual representation · patch masking


The pith

Burst spikes and dual-channel attention improve accuracy in spiking vision transformers without sacrificing energy efficiency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to overcome the limited capacity of binary spike representations and the high cost of global attention in spiking vision transformers. It does so with an attention mechanism that uses binary spikes for queries, burst spikes for keys, and dual excitatory and inhibitory channels for values, combined with local patch masking. The design is intended to keep the attention computation addition-only, which suits neuromorphic chips. If the approach holds up, it would make spiking neural networks more competitive for practical visual tasks on low-power hardware. Sympathetic readers care because current spiking models give up too much accuracy in exchange for their efficiency gains.

Core claim

BSViT features a Dual-Channel Burst Spiking Self-Attention in which queries use binary spikes, keys use burst spikes to boost capacity, and values use dual binary channels for signed interactions. The design adds patch adjacency masking to restrict attention to local neighborhoods for sparsity, and it incorporates burst coding throughout the model. The reported experiments show BSViT surpassing other spiking transformers in accuracy on both static-image and event-based vision datasets while remaining competitive in energy efficiency.

What carries the argument

The Dual-Channel Burst Spiking Self-Attention (DBSSA) that separates spike types across query, key, and value paths to enable richer interactions using only additions.

Load-bearing premise

The assumption that assigning binary spikes to queries, burst spikes to keys, and dual channels to values will meaningfully expand representational capacity and spike interactions while remaining strictly addition-based.

What would settle it

A direct comparison on a vision benchmark such as CIFAR-10 or DVS Gesture in which BSViT's accuracy falls short of, or its energy cost exceeds, that of a conventional binary spiking transformer.

Figures

Figures reproduced from arXiv: 2604.23165 by Dewei Bai, Hong Qu, Hongxiang Peng.

**Figure 1.** Concept of the Spiking Self-Attention (SSA) and our Dual-channel Burst Spiking Self-Attention (DBSSA). (a) The vanilla Spiking Self-Attention, which uses only a binary spike matrix to calculate the attention map. (b) Our DBSSA mechanism, which introduces a burst-spike-coded K to increase information capacity and a dual-channel V to capture both excitatory and inhibitory features while keeping the whole process add…

**Figure 2.** The overview of BSViT.

$$W^{(\ell+1)} S^{(\ell)}_{\mathrm{burst}}[t] = \sum_{k=1}^{S^{(\ell)}_{\mathrm{burst}}[t]} W^{(\ell+1)}, \qquad (3)$$

where $V_\theta$ denotes the interval between consecutive membrane potential thresholds, and $n$ is the maximum allowed burst level. $S_{\mathrm{burst}}[t] \in \{0, 1, \ldots, n\}$ thus encodes the number of spikes emitted at timestep $t$. Conceptually related to the integer spike formulation in I-LIF [31], we convert integer values to binary values additio…
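Equation (3) is what makes integer burst counts hardware-friendly: a weight applied to a count of k spikes equals k additions of that weight, so no multiplier is needed. The sketch below illustrates this under stated assumptions. The neuron model is a guess consistent with the caption's definitions (a leaky membrane with thresholds spaced `v_theta` apart, output capped at burst level `n`, and a soft reset that removes one interval per emitted spike); the names `burst_spike`, `burst_neuron_run`, and `weighted_by_addition` are hypothetical, and the paper's own neuron equations are cut off in the extracted text.

```python
def burst_spike(u, v_theta, n):
    """Burst level for membrane potential u: how many of the thresholds
    v_theta, 2*v_theta, ..., n*v_theta it has reached (assumed rule)."""
    if u < v_theta:
        return 0
    return min(int(u // v_theta), n)


def burst_neuron_run(inputs, v_theta=1.0, n=4, decay=0.5):
    """Simulate one hypothetical leaky burst neuron over timesteps.
    Returns S_burst[t] in {0, 1, ..., n} for every step t."""
    u = 0.0
    spikes = []
    for x in inputs:
        u = decay * u + x              # leaky integration of the input current
        s = burst_spike(u, v_theta, n)
        u -= s * v_theta               # soft reset: remove one interval per spike (assumption)
        spikes.append(s)
    return spikes


def weighted_by_addition(w, s):
    """Eq. (3): W * S_burst[t] computed as S_burst[t] repeated additions of W."""
    acc = [0.0] * len(w)
    for _ in range(s):                 # one pass per emitted spike
        for i, wi in enumerate(w):
            acc[i] += wi
    return acc


print(burst_neuron_run([0.5, 2.5, 0.25, 5.0]))    # [0, 2, 0, 4]
print(weighted_by_addition([0.5, -1.0, 2.0], 3))  # [1.5, -3.0, 6.0]
```

At the last step the membrane potential has crossed five thresholds, but the output saturates at n = 4. Whether the excess charge is carried over, as this sketch assumes, is not recoverable from the extracted text.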

**Figure 3.** The neighbors of each patch in an image.

…neuron. This excitatory-inhibitory dynamic acts as a critical filtering mechanism that actively suppresses redundant attention scores, thereby significantly improving the signal-to-noise ratio in the aggregated attention maps. The formulations are as follows:

$$Q = \mathcal{SN}_{\mathrm{binary}}(\mathrm{BN}(XW_Q)), \qquad (12)$$

$$K = \mathcal{SN}_{\mathrm{burst}}(\mathrm{BN}(XW_K)), \qquad (13)$$

$$V^{+} = \mathcal{SN}_{\mathrm{binary}}(\mathrm{BN}(XW_{V^{+}})), \qquad (14)$$

$V^{-} = -\mathcal{SN}_{\mathrm{bin}}$…
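Equations (12)–(14) can be turned into a small forward pass to see where each arithmetic step happens. This is a minimal single-timestep sketch in pure Python, assuming the batch-normalized projections BN(XW) have already been computed, a Heaviside binary neuron with threshold 1, a burst neuron that counts how many thresholds spaced 1 apart its input reaches (capped at n), and an inhibitory channel V⁻ whose magnitude is a binary spike map (its equation is truncated in the source). The helpers `sn_binary`, `sn_burst`, `spike_matmul`, and `dbssa` are hypothetical, and any scaling factor the paper applies to the attention map is omitted.

```python
def sn_binary(x, v_th=1.0):
    # Binary spiking neuron (single step): 1 if the input reaches v_th.
    return [[1 if v >= v_th else 0 for v in row] for row in x]


def sn_burst(x, v_theta=1.0, n=4):
    # Burst spiking neuron (single step): thresholds reached, capped at n.
    return [[min(int(v // v_theta), n) if v >= v_theta else 0 for v in row] for row in x]


def transpose(m):
    return [list(col) for col in zip(*m)]


def spike_matmul(a, b):
    # a holds non-negative spike counts; each unit of a[i][k] adds row b[k]
    # into out[i] once, so the product needs additions only.
    out = [[0] * len(b[0]) for _ in range(len(a))]
    for i, a_row in enumerate(a):
        for k, count in enumerate(a_row):
            for _ in range(count):
                for j, b_kj in enumerate(b[k]):
                    out[i][j] += b_kj
    return out


def dbssa(q_pre, k_pre, vp_pre, vn_pre, mask=None, n=4):
    q = sn_binary(q_pre)                     # Eq. (12): binary queries
    k = sn_burst(k_pre, n=n)                 # Eq. (13): burst keys
    v_exc = sn_binary(vp_pre)                # Eq. (14): excitatory values V+
    v_inh = sn_binary(vn_pre)                # |V-|: inhibitory spikes, sign applied at readout
    attn = spike_matmul(q, transpose(k))     # integer attention map Q K^T
    if mask is not None:                     # optional patch-adjacency mask (1 = keep)
        attn = [[a if m else 0 for a, m in zip(ar, mr)] for ar, mr in zip(attn, mask)]
    exc = spike_matmul(attn, v_exc)          # each channel is accumulated separately
    inh = spike_matmul(attn, v_inh)
    # Single signed combination at readout: A V+ + A V-, with V- = -|V-|.
    return [[e - i for e, i in zip(er, ir)] for er, ir in zip(exc, inh)]


q_pre = [[1.0, 0.0], [0.0, 1.5]]
k_pre = [[2.0, 0.5], [1.0, 3.0]]
vp_pre = [[1.0, 0.0], [1.0, 1.0]]
vn_pre = [[0.0, 1.0], [0.0, 0.0]]
print(dbssa(q_pre, k_pre, vp_pre, vn_pre))  # [[3, -1], [3, 3]]
```

Everything up to the last line of `dbssa` accumulates non-negative counts; the only signed step is combining the two channel totals. Whether that final combination counts as a subtraction primitive depends on how the hardware represents the inhibitory stream, which is the point the referee report raises.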

Original abstract

Spiking Vision Transformers (S-ViTs) offer a promising framework for energy-efficient visual learning. However, existing designs remain limited by two fundamental issues: the restricted information capacity of binary spike coding and the dense token interactions introduced by global self-attention. To address these challenges, this work proposes BSViT, a burst spiking-driven Vision Transformer featuring a Dual-Channel Burst Spiking Self-Attention (DBSSA) mechanism. DBSSA encodes queries with binary spikes and keys with burst spikes to enhance representational capacity. The value pathway adopts dual excitatory and inhibitory binary channels, enabling signed modulation and richer spike interactions. Importantly, the entire attention operation preserves addition-only computation, ensuring compatibility with energy-efficient neuromorphic hardware. To further reduce spike activity and incorporate spatial priors, a patch adjacency masking strategy is introduced to restrict attention to local neighborhoods, resulting in structure-aware sparsity and reduced computational overhead. In addition, burst spike coding is systematically integrated across the network to increase spike-level representational capacity beyond conventional binary spiking. Extensive experiments on both static and event-based vision benchmarks demonstrate that BSViT consistently outperforms existing spiking Transformers in accuracy while maintaining competitive energy efficiency.
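The patch adjacency masking described in the abstract (and pictured in Figure 3) can be sketched as a 0/1 matrix over the patch grid. This assumes row-major patch ordering and a square neighborhood of Chebyshev radius 1 (each patch plus its eight neighbors); the exact neighborhood BSViT uses is not recoverable from the extracted text, and `patch_adjacency_mask` and `density` are hypothetical helpers.

```python
def patch_adjacency_mask(h, w, radius=1):
    """Mask over an h x w patch grid (row-major order): entry [p][q] is 1
    when patches p and q differ by at most `radius` in both row and column."""
    n = h * w
    mask = [[0] * n for _ in range(n)]
    for p in range(n):
        pr, pc = divmod(p, w)
        for q in range(n):
            qr, qc = divmod(q, w)
            if abs(pr - qr) <= radius and abs(pc - qc) <= radius:
                mask[p][q] = 1
    return mask


def density(mask):
    """Fraction of query-key pairs the mask keeps."""
    n = len(mask)
    return sum(map(sum, mask)) / (n * n)


print(density(patch_adjacency_mask(4, 4)))    # 0.390625
print(density(patch_adjacency_mask(14, 14)))  # about 0.0417
```

For example, on a 14 × 14 grid (a 224 × 224 image cut into 16 × 16 patches) this mask keeps 1,600 of 38,416 query-key pairs, about 4%, which illustrates the structure-aware sparsity the abstract refers to.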

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

BSViT adds burst keys and dual excitatory/inhibitory value channels to spiking ViTs with local masking, but the addition-only claim looks hard to square with temporal accumulation and signed modulation.


The main new pieces are the DBSSA module that splits query, key, and value encoding (binary spikes for queries, bursts for keys, dual channels for values) and the patch adjacency mask that restricts attention to local neighborhoods. Burst coding is also applied throughout the network rather than just at the input. These choices directly target the low capacity of binary spikes and the cost of global attention in earlier spiking transformers while trying to stay neuromorphic-friendly.

Referee Report

1 major / 2 minor

Summary. The paper introduces BSViT, a burst spiking Vision Transformer featuring Dual-Channel Burst Spiking Self-Attention (DBSSA). Queries use binary spikes, keys use burst spikes, and values employ dual excitatory/inhibitory binary channels to boost representational capacity and spike interactions. The design claims to preserve strictly addition-only attention computation for neuromorphic hardware compatibility, augments this with patch-adjacency masking for local sparsity, and integrates burst coding network-wide. Experiments on static and event-based vision benchmarks are said to show consistent accuracy gains over prior spiking Transformers while retaining competitive energy efficiency.

Significance. If the empirical gains and addition-only property are verified, the work would meaningfully advance energy-efficient spiking vision models by addressing binary-coding capacity limits and global-attention density without sacrificing neuromorphic compatibility. The dual-channel and burst mechanisms, together with experiments spanning both static and event-based datasets, represent a concrete step toward richer yet hardware-friendly SNN representations.

major comments (1)

[DBSSA mechanism] The claim that DBSSA preserves addition-only computation (central to the energy-efficiency and neuromorphic-compatibility assertions) requires explicit verification. Burst encoding of keys inherently requires temporal accumulation, and dual excitatory/inhibitory channels for values typically introduce signed operations. The manuscript should supply the precise spike-interaction equations or circuit mapping (e.g., in the DBSSA definition) showing that no counting, scaling, or subtraction primitives are used.
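One way to make the requested verification concrete for the query-key step is to write the product so that every operation is an accumulate, and to count those operations. A minimal sketch, assuming single-timestep binary queries (0/1) and burst keys stored as non-negative integer counts; `additive_qk` and `reference_qk` are hypothetical helpers, and the addition count is a rough proxy for synaptic operations, not the paper's energy model. It does not settle the temporal-accumulation or signed-value questions, which depend on the circuit mapping.

```python
import random


def additive_qk(q, k):
    """Q K^T for binary Q and burst-count K using additions only.
    Returns (scores, number_of_additions)."""
    scores = [[0] * len(k) for _ in range(len(q))]
    adds = 0
    for i, q_row in enumerate(q):
        active = [d for d, bit in enumerate(q_row) if bit]  # only set query bits cause work
        for j, k_row in enumerate(k):
            for d in active:
                scores[i][j] += k_row[d]   # accumulate the integer burst count
                adds += 1
    return scores, adds


def reference_qk(q, k):
    """Ordinary multiply-accumulate product, used only as a check."""
    return [[sum(a * b for a, b in zip(q_row, k_row)) for k_row in k] for q_row in q]


random.seed(0)
q = [[random.randint(0, 1) for _ in range(8)] for _ in range(5)]   # binary queries
k = [[random.randint(0, 4) for _ in range(8)] for _ in range(6)]   # burst keys, n = 4
scores, adds = additive_qk(q, k)
assert scores == reference_qk(q, k)
print(adds, "additions for", len(q) * len(k) * 8, "multiply-accumulates")
```

The operation count scales with the number of active query bits, so query sparsity (and a patch mask, which drops whole query-key pairs) directly reduces the work.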

minor comments (2)

[Abstract] The abstract states performance claims without any numerical results, baselines, or error bars; including at least headline metrics would strengthen immediate assessment.
Clarify the precise definition and temporal window used for burst spikes versus standard rate coding, and how patch-adjacency masking interacts with the attention mask in implementation.
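The capacity question in the second minor comment can be made quantitative with simple counting. Assuming every burst level 0..n is reachable at every timestep, a window of T timesteps can carry T·n + 1 distinct spike-count totals and (n + 1)^T distinct temporal patterns, versus T + 1 and 2^T for binary spikes. These are upper bounds that say nothing about how a trained network uses the extra levels; the helper names and the choice n = 4 are illustrative.

```python
import math


def count_levels(timesteps, max_burst=1):
    """Distinct spike-count totals over the window (max_burst=1 is binary rate coding)."""
    return timesteps * max_burst + 1


def pattern_count(timesteps, max_burst=1):
    """Distinct spike-count sequences over the window."""
    return (max_burst + 1) ** timesteps


def bits_per_step(max_burst=1):
    """Upper bound on information per timestep, in bits."""
    return math.log2(max_burst + 1)


T = 4
for n in (1, 4):
    print(n, count_levels(T, n), pattern_count(T, n), round(bits_per_step(n), 2))
# 1 5 16 1.0
# 4 17 625 2.32
```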

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive evaluation of BSViT and for the constructive comment on the DBSSA mechanism. We address the concern directly below and will revise the manuscript accordingly to strengthen the verification of the addition-only property.


Referee: [DBSSA mechanism] The claim that DBSSA preserves addition-only computation (central to the energy-efficiency and neuromorphic-compatibility assertions) requires explicit verification. Burst encoding of keys inherently requires temporal accumulation, and dual excitatory/inhibitory channels for values typically introduce signed operations. The manuscript should supply the precise spike-interaction equations or circuit mapping (e.g., in the DBSSA definition) showing that no counting, scaling, or subtraction primitives are used.

Authors: We appreciate this comment, which correctly identifies the need for more explicit verification to support the neuromorphic-compatibility claims. In the current manuscript, DBSSA is defined such that query-key interactions use binary spike queries and temporally accumulated burst keys, with all accumulation performed via successive additions to spike counters (no explicit counting or scaling operators). The dual excitatory/inhibitory value channels are realized as two independent binary spike streams whose contributions are summed separately before a final rate-based readout; the signed modulation emerges from the opposing spike polarities without introducing subtraction in the attention arithmetic itself. Nevertheless, we agree that the presentation would benefit from greater clarity. In the revised manuscript we will add the full set of spike-interaction equations together with a neuromorphic circuit mapping (new figure) that demonstrates every operation reduces to addition, thereby confirming the absence of counting, scaling, or subtraction primitives. revision: yes

Circularity Check

0 steps flagged

No circularity; central claims are empirical architecture proposals validated by experiments

full rationale

The paper introduces BSViT as a novel architecture with DBSSA (binary-spike queries, burst-spike keys, dual excitatory/inhibitory value channels) plus patch-adjacency masking, then reports benchmark results showing accuracy gains at competitive energy. No equations, fitted parameters, or derivations are presented that reduce by construction to the inputs; the addition-only compatibility and representational-capacity claims are architectural assertions tested empirically rather than proven via self-referential math or self-citation chains. The derivation chain is therefore self-contained as an engineering proposal plus external validation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The proposal rests on standard assumptions from the spiking-neural-network literature and introduces new architectural components whose independent validation is not provided in the abstract.

axioms (2)

domain assumption: Spiking neural networks can perform visual representation learning with substantially lower energy than conventional networks
Implicit background assumption for all S-ViT work referenced in the abstract.
domain assumption: Addition-only arithmetic is compatible with neuromorphic hardware implementations
Explicitly stated as a design goal for the attention operation.

invented entities (2)

Dual-Channel Burst Spiking Self-Attention (DBSSA): no independent evidence
purpose: To encode richer spike interactions via burst keys and signed dual-channel values while remaining addition-only
New attention block introduced by the paper
Burst spike coding integrated across the network: no independent evidence
purpose: To raise spike-level representational capacity beyond binary spikes
Systematic integration claimed as a core contribution

pith-pipeline@v0.9.0 · 5507 in / 1407 out tokens · 49782 ms · 2026-05-08T08:31:41.534091+00:00 · methodology


Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 1 internal anchor

[1] Akopyan, F., Sawada, J., Cassidy, A., Alvarez-Icaza, R., Arthur, J., Merolla, P., Imam, N., Nakamura, Y., Datta, P., Nam, G.J.: Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip. IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems 34(10), 1537–1557 (2015)

[2] Bittner, S.R., Williamson, R.C., Snyder, A.C., Litwin-Kumar, A., Doiron, B., Chase, S.M., Smith, M.A., Yu, B.M.: Population activity structure of excitatory and inhibitory neurons. PloS one 12(8), e0181773 (2017)

[3] Bu, T., Fang, W., Ding, J., DAI, P., Yu, Z., Huang, T.: Optimal ann-snn conversion for high-accuracy and ultra-low-latency spiking neural networks. In: Proc. of ICLR (2022)

[4] Chu, X., Tian, Z., Wang, Y., Zhang, B., Ren, H., Wei, X., Xia, H., Shen, C.: Twins: Revisiting the design of spatial attention in vision transformers. In: Proc. of NeurIPS. vol. 34, pp. 9355–9366 (2021)

[5] Connors, B.W., Gutnick, M.J.: Intrinsic firing patterns of diverse neocortical neurons. Trends in Neurosciences 13(3), 99–104 (1990)

[6] Cubuk, E.D., Zoph, B., Shlens, J., Le, Q.V.: Randaugment: Practical automated data augmentation with a reduced search space. In: Proc. of CVPR. pp. 702–703 (2020)

[7] Davies, M., Srinivasa, N., Lin, T.H., Chinya, G., Cao, Y., Choday, S.H., Dimou, G., Joshi, P., Imam, N., Jain, S.: Loihi: A neuromorphic manycore processor with on-chip learning. IEEE/ACM International Symposium on Microarchitecture 38(1), 82–99 (2018)

[8] Dehghani, M., Djolonga, J., Mustafa, B., Padlewski, P., Heek, J., Gilmer, J., Steiner, A.P., Caron, M., Geirhos, R., Alabdulmohsin, I., et al.: Scaling vision transformers to 22 billion parameters. In: Proc. of ICML. pp. 7480–7512. PMLR (2023)

[9] Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: Proc. of CVPR. pp. 248–255 (2009)

[10] Deng, S., Gu, S.: Optimal conversion of conventional artificial neural networks to spiking neural networks. In: Proc. of ICLR (2021)

[11] Deng, S., Li, Y., Zhang, S., Gu, S.: Temporal efficient training of spiking neural network via gradient re-weighting. In: Proc. of ICLR (2022)

[12] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

[13] Fang, W., Yu, Z., Chen, Y., Huang, T., Masquelier, T., Tian, Y.: Deep residual learning in spiking neural networks. In: Proc. of NeurIPS. vol. 34, pp. 21056–21069 (2021)

[15] Fang, W., Yu, Z., Chen, Y., Masquelier, T., Huang, T., Tian, Y.: Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In: Proc. of ICCV. pp. 2661–2671 (2021)

[16] Feng, L., Liu, Q., Tang, H., Ma, D., Pan, G.: Multi-level firing with spiking ds-resnet: Enabling better and deeper directly-trained spiking neural networks. arXiv preprint arXiv:2210.06386 (2022)

[17] Guo, Y., Chen, Y., Liu, X., Peng, W., Zhang, Y., Huang, X., Ma, Z.: Ternary Spike: Learning ternary spikes for spiking neural networks. In: Proc. of AAAI. vol. 38, pp. 12244–12252 (2024)

[18] Guo, Y., Liu, X., Chen, Y., Peng, W., Zhang, Y., Ma, Z.: Spiking transformer: Introducing accurate addition-only spiking self-attention for transformer. In: Proc. of CVPR. pp. 24398–24408 (2025)

[19] Hassani, A., Walton, S., Li, J., Li, S., Shi, H.: Neighborhood attention transformer. In: Proc. of CVPR. pp. 6185–6194 (2023)

[20] Hu, Y., Tang, H., Pan, G.: Spiking deep residual networks. IEEE Transactions on Neural Networks and Learning Systems 34(8), 5200–5205 (2021)

[21] Hu, Y., Deng, L., Wu, Y., Yao, M., Li, G.: Advancing spiking neural networks toward deep residual learning. IEEE transactions on neural networks and learning systems 36(2), 2353–2367 (2024)

[22] Izhikevich, E.M., Desai, N.S., Walcott, E.C., Hoppensteadt, F.C.: Bursts as a unit of neural information: Selective communication via resonance. Trends in Neurosciences 26(3), 161–167 (2003)

[23] Kipf, T.N., Welling, M.: Variational graph auto-encoders (2016)

[24] Krizhevsky, A., Nair, V., Hinton, G.: CIFAR-10 Dataset (2009), Canadian Institute for Advanced Research

[25] Lee, C., Sarwar, S.S., Panda, P., Srinivasan, G., Roy, K.: Enabling spike-based backpropagation for training deep neural network architectures. Frontiers in Neuroscience 14, 497482 (2020)

[26] Lee, D., Li, Y., Kim, Y., Xiao, S., Panda, P.: Spiking transformer with spatial-temporal attention. In: Proc. of CVPR. pp. 13948–13958 (2025)

[27] Li, H., Liu, H., Ji, X., Li, G., Shi, L.: CIFAR10-DVS: An Event-Stream Dataset for Object Classification. Frontiers in Neuroscience 11 (2017)

[28] Li, Y., Deng, S., Dong, X., Gong, R., Gu, S.: A free lunch from ANN: Towards efficient, accurate spiking neural networks calibration. In: Proc. of ICML. pp. 6316–6325 (2021)

[29] Li, Y., Guo, Y., Zhang, S., Deng, S., Hai, Y., Gu, S.: Differentiable spike: Rethinking gradient-descent for training spiking neural networks. In: Proc. of NeurIPS. vol. 34, pp. 23426–23439 (2021)

[30] Li, Y., Kim, Y., Park, H., Geller, T., Panda, P.: Neuromorphic data augmentation for training spiking neural networks. In: Proc. of ECCV. pp. 631–649. Springer (2022)

[31] Luo, X., Yao, M., Chou, Y., Xu, B., Li, G.: Integer-valued training and spike-driven inference spiking neural network for high-performance and energy-efficient object detection. In: Proc. of ECCV. pp. 253–272. Springer (2024)

[32] Maass, W.: Networks of spiking neurons: the third generation of neural network models. Neural networks 10(9), 1659–1671 (1997)

[33] Meng, Q., Xiao, M., Yan, S., Wang, Y., Lin, Z., Luo, Z.Q.: Training high-performance low-latency spiking neural networks by differentiation on spike representation. In: Proc. of CVPR. pp. 12444–12453 (2022)

[34] Min, E., Rong, Y., Xu, T., Bian, Y., Luo, D., Lin, K., Huang, J., Ananiadou, S., Zhao, P.: Neighbour interaction based click-through rate prediction via graph-masked transformer. In: Proc. of SIGIR. pp. 353–362 (2022)

[35] Pei, J., Deng, L., Song, S., Zhao, M., Zhang, Y., Wu, S., Wang, G., Zou, Z., Wu, Z., He, W., et al.: Towards artificial general intelligence with hybrid tianjic chip architecture. Nature 572(7767), 106–111 (2019)

[36] Rathi, N., Roy, K.: Diet-snn: Direct input encoding with leakage and threshold optimization in deep spiking neural networks (2020)

[37] Rathi, N., Srinivasan, G., Panda, P., Roy, K.: Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. In: Proc. of ICLR (2020)

[38] Roy, K., Jaiswal, A., Panda, P.: Towards spike-based machine intelligence with neuromorphic computing. Nature 575(7784), 607–617 (2019)

[39] Sengupta, A., Ye, Y., Wang, R., Liu, C., Roy, K.: Going deeper in spiking neural networks: VGG and residual architectures. Frontiers in Neuroscience 13, 95 (2019)

[40] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Proc. of NeurIPS. vol. 30 (2017)

[41] Wilson, H.R., Cowan, J.D.: Excitatory and inhibitory interactions in localized populations of model neurons. Biophysical journal 12(1), 1–24 (1972)

[42] Wu, Y., Deng, L., Li, G., Zhu, J., Shi, L.: Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in Neuroscience 12, 331 (2018)

[43] Yao, M., Hu, J., Zhou, Z., Yuan, L., Tian, Y., Xu, B., Li, G.: Spike-driven transformer. In: Proc. of NeurIPS. vol. 36, pp. 64043–64058 (2023)

[44] Yao, M., Qiu, X., Hu, T., Hu, J., Chou, Y., Tian, K., Liao, J., Leng, L., Xu, B., Li, G.: Scaling spike-driven transformer with efficient spike firing approximation training. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

[45] Zheng, H., Wu, Y., Deng, L., Hu, Y., Li, G.: Going deeper with directly-trained larger spiking neural networks. In: Proc. of AAAI. vol. 35, pp. 11062–11070 (2021)

[46] Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: Proceedings of the AAAI conference on artificial intelligence. vol. 34, pp. 13001–13008 (2020)

[47] Zhou, C., Yu, L., Zhou, Z., Ma, Z., Zhang, H., Zhou, H., Tian, Y.: Spikingformer: Spike-driven residual learning for transformer-based spiking neural network (2023)

[48] Zhou, Z., Che, K., Fang, W., Tian, K., Zhu, Y., Yan, S., Tian, Y., Yuan, L.: Spikformer v2: Join the high accuracy club on imagenet with an snn ticket (2024)

[49] Zhou, Z., Zhu, Y., He, C., Wang, Y., YAN, S., Tian, Y., Yuan, L.: Spikformer: When spiking neural network meets transformer. In: The Eleventh Proc. of ICLR (2023)

A Preliminaries

A.1 Spiking Neuron Model

Spiking neurons are the fundamental computational units in Spiking Neural Networks (SNNs), enabling event-...
