QKFormer: Hierarchical Spiking Transformer using Q-K Attention

Chenlin Zhou; Han Zhang; Huihui Zhou; Liutao Yu; Liwei Huang; Li Yuan; Xiaopeng Fan; Yonghong Tian; Zhaokun Zhou; Zhengyu Ma

arXiv: 2403.16552 · v2 · submitted 2024-03-25 · cs.NE · cs.AI · cs.CV

keywords: spiking, performance, qkformer, attention, models, hierarchical, snns, transformer

Abstract

Spiking Transformers, which integrate Spiking Neural Networks (SNNs) with Transformer architectures, have attracted significant attention due to their potential for energy efficiency and high performance. However, existing models in this domain still suffer from suboptimal performance. We introduce several innovations to improve performance: i) We propose a novel spike-form Q-K attention mechanism, tailored for SNNs, which efficiently models the importance of the token or channel dimension through binary vectors with linear complexity. ii) We incorporate a hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representations. iii) We design a versatile and powerful patch embedding module with a deformed shortcut specifically for spiking transformers. Combining these, we develop QKFormer, a hierarchical spiking transformer based on Q-K attention that is trained directly. QKFormer shows significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, at a size comparable to Spikformer (66.34 M parameters, 74.81% top-1 accuracy), QKFormer (64.96 M parameters) achieves a top-1 accuracy of 85.65% on ImageNet-1K, outperforming Spikformer by 10.84 percentage points. To the best of our knowledge, this is the first time that directly trained SNNs have exceeded 85% accuracy on ImageNet-1K. The code and models are publicly available at https://github.com/zhouchenlin2096/QKFormer
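
The token-wise Q-K attention idea from item i) can be sketched in a few lines. This is a minimal, hypothetical NumPy sketch, not the authors' implementation (the linked repository is the reference for that). It assumes a single time step, a plain threshold (Heaviside) function standing in for the spiking neuron layers, no BatchNorm, and made-up weight matrices `w_q` and `w_k`. It shows how summing binary queries per token yields a binary token mask that gates the binary key matrix, at a cost linear in the number of tokens.

```python
import numpy as np


def heaviside_spike(x, threshold=1.0):
    """Simplified stand-in for a spiking neuron layer (assumption):
    emit 1 where the input reaches the threshold, else 0.
    Single time step, no membrane leak or reset."""
    return (x >= threshold).astype(np.float32)


def qk_token_attention(x, w_q, w_k, threshold=1.0, mask_threshold=1.0):
    """Hypothetical sketch of token-wise Q-K attention on binary spikes.

    x:        (N, D) binary spike matrix, N tokens with D channels.
    w_q, w_k: (D, D) projection weights (illustrative; BatchNorm omitted).
    Returns an (N, D) binary matrix: rows of K are kept only for tokens
    whose binary token mask is 1.
    """
    q = heaviside_spike(x @ w_q, threshold)  # binary queries, (N, D)
    k = heaviside_spike(x @ w_k, threshold)  # binary keys, (N, D)
    # Sum each token's query spikes over the channel axis: one score per token, (N, 1).
    token_scores = q.sum(axis=1, keepdims=True)
    # Spike the scores into a binary token mask (a binary vector over tokens).
    token_mask = heaviside_spike(token_scores, mask_threshold)
    # Broadcast the (N, 1) mask over K's channels: O(N * D), linear in N.
    return token_mask * k


# Usage: token 1 has key spikes but no query spikes, so it is masked out.
x = np.eye(3)
w_q = np.diag([1.0, 0.0, 1.0])
w_k = np.ones((3, 3))
print(qk_token_attention(x, w_q, w_k))
```

A channel-wise variant, which the abstract also mentions, would by analogy sum `q` over the token axis (`axis=0`) and mask the columns of `k` instead; unlike standard self-attention, no N x N score matrix is ever formed.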

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Temporal-Aware Spiking Transformer Hashing Based on 3D-DWT
cs.CV 2025-01 unverdicted novelty 7.0

Spikinghash combines 3D-DWT Spiking WaveMixer, Spiking Self-Attention, and a dynamic soft similarity loss to produce energy-efficient hash codes for DVS data retrieval.

Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation
cs.CV 2026-04 unverdicted novelty 3.0

RDCNet reports state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof by combining random dilated convolutions with multi-branch and attention modules.