EventCrab: Harnessing Frame and Point Synergy for Event-based Action Recognition and Beyond

Jiachao Zhang; Jinhui Tang; Meiqi Cao; Rui Yan; Xiangbo Shu; Zechao Li

arxiv: 2411.18328 · v2 · submitted 2024-11-27 · 💻 cs.CV

Meiqi Cao, Xiangbo Shu, Jiachao Zhang, Rui Yan, Zechao Li, Jinhui Tang

Pith reviewed 2026-05-23 16:30 UTC · model grok-4.3

classification 💻 cs.CV

keywords event-based action recognition · frame-point synergy · event frames · event points · spiking-like context learner · event point encoder · joint representation space · Hilbert scan

The pith

EventCrab combines lighter frame networks for dense event data with heavier point networks for sparse points to balance accuracy and efficiency in action recognition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to fix a core mismatch in event-based action recognition: methods either convert streams to dense frames handled by heavy networks or process sparse points with light networks, and neither accommodates the data's mixed dense-temporal and sparse-spatial nature. It introduces EventCrab, a framework that pairs the two network types and adds a shared space linking frames, text, and points. Two new modules handle the point side: one extracts context from raw streams, and the other encodes long-range features along a space-filling curve. The paper reports gains on four datasets, including 5.17 percent on SeAct and 7.01 percent on HARDVS. A reader would care because event cameras produce asynchronous streams that standard pipelines waste or distort.

Core claim

EventCrab is a synergy-aware framework that integrates lighter frame-specific networks for dense event frames with heavier point-specific networks for sparse event points while establishing a joint frame-text-point representation space. It adds a Spiking-like Context Learner to pull contextualized points from raw streams and an Event Point Encoder that processes long spatiotemporal features through Hilbert scanning.
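To make "Hilbert scanning" concrete, here is a minimal sketch of ordering events along a 2D Hilbert curve so that neighbours in the resulting sequence are also neighbours on the sensor grid. It uses the standard bit-twiddling index computation for a square grid whose side is a power of two. The function names, the 2D-only scan, and the timestamp tie-break are assumptions for illustration; this page does not say how the paper's Event Point Encoder defines its scan.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n-by-n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next, finer level is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_scan(events, n):
    """Sort (x, y, t) events by Hilbert index of (x, y); ties are broken by timestamp."""
    return sorted(events, key=lambda e: (hilbert_index(n, e[0], e[1]), e[2]))


# Hypothetical toy events on a 2x2 sensor: (x, y, t).
events = [(1, 0, 0.3), (0, 0, 0.1), (1, 1, 0.2), (0, 1, 0.4)]
ordered = hilbert_scan(events, n=2)
```

The appeal of such an ordering for a sequence model is locality: consecutive cells on the curve are always grid-adjacent, which a plain row-major scan does not guarantee at row boundaries.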

What carries the argument

The synergy-aware framework that pairs frame-specific and point-specific networks, realized through the Spiking-like Context Learner, Event Point Encoder, and joint frame-text-point representation space.
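One way to picture a joint frame-text-point space is as a shared embedding space in which a sample is classified by the class text embedding closest to its fused frame and point embeddings. The sketch below assumes cosine similarity and fusion by summing unit vectors; both choices, along with the toy vectors and names, are illustrative assumptions and are not taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def classify(frame_emb, point_emb, text_embs):
    """Fuse frame and point embeddings, then pick the most similar class text embedding."""
    fused = [f + p for f, p in zip(normalize(frame_emb), normalize(point_emb))]
    scores = {label: cosine(fused, emb) for label, emb in text_embs.items()}
    return max(scores, key=scores.get), scores


# Hypothetical 3-dimensional embeddings; real encoders would produce these.
text_embs = {"wave": [1.0, 0.0, 0.0], "jump": [0.0, 1.0, 0.0]}
label, scores = classify([0.9, 0.1, 0.0], [0.7, 0.2, 0.1], text_embs)
```

Because all three modalities land in one space, the same similarity function can compare frames to text, points to text, or frames to points.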

If this is right

The joint frame-text-point space allows direct transfer between dense and sparse event representations.
The Spiking-like Context Learner and Hilbert-scan encoder together capture both local context and long-range structure in event points.
Reported accuracy lifts of 5.17 percent on SeAct and 7.01 percent on HARDVS follow directly from the balanced integration.
The same architecture applies to additional event-based tasks beyond action recognition.
Efficiency gains arise because lighter frame networks offset the cost of heavier point networks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same frame-point pairing could be tested on event-based object detection or tracking without retraining the core modules from scratch.
If the joint representation space generalizes, it might allow text prompts to guide point-feature selection during inference.
A follow-up experiment could measure whether the Hilbert-scan ordering still helps when event density varies across scenes.
The approach implicitly suggests that other asynchronous sensor streams, such as LiDAR points, might benefit from analogous dense-sparse pairing.

Load-bearing premise

The dense temporal and sparse spatial traits of asynchronous event streams can be handled by merging frame and point networks without creating new training or inference conflicts.

What would settle it

A controlled comparison on SeAct or HARDVS in which the combined EventCrab model shows no accuracy gain or efficiency improvement over the best frame-only or point-only baseline using the same backbone networks.
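The decision rule in that comparison can be written down explicitly. The sketch below is one reading of it: the claim is undermined only when the combined model is neither more accurate nor cheaper than the most accurate single-regime baseline. The `Result` fields, the use of GFLOPs as the cost measure, and all numbers are placeholders chosen for illustration, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    accuracy: float  # top-1 accuracy in percent
    gflops: float    # inference cost per sample


def settle(combined, baselines):
    """Compare the combined model against the most accurate single-regime baseline."""
    best = max(baselines, key=lambda r: r.accuracy)
    accuracy_gain = combined.accuracy - best.accuracy
    cost_ratio = combined.gflops / best.gflops
    # Undermined only if the combined model is neither more accurate nor cheaper.
    undermined = accuracy_gain <= 0 and cost_ratio >= 1
    return {
        "best_baseline": best.name,
        "accuracy_gain": accuracy_gain,
        "cost_ratio": cost_ratio,
        "undermined": undermined,
    }


# Placeholder values for illustration only; these are NOT results from the paper.
baselines = [Result("frame_only", 60.0, 100.0), Result("point_only", 55.0, 20.0)]
verdict = settle(Result("combined", 60.0, 120.0), baselines)
```

Holding the backbone networks fixed across all three rows is what makes the comparison controlled; the code only formalizes the final verdict.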

Figures

Figures reproduced from arXiv: 2411.18328 by Jiachao Zhang, Jinhui Tang, Meiqi Cao, Rui Yan, Xiangbo Shu, Zechao Li.

**Figure 2.** Framework of the proposed EventCrab. For the event-point embedding, the Spiking-like Context Learner (SCL) and the Event …

**Figure 4.** Visualization of events before/after processed by SCL …

**Figure 7.** Computational effectiveness analysis between ours …

**Figure 6.** Visualization of the Top-3 predicted results on the SeAct …

Original abstract

Event-based Action Recognition (EAR) possesses the advantages of high-temporal resolution capturing and privacy preservation compared with traditional action recognition. Current leading EAR solutions typically follow two regimes: project unconstructed event streams into dense constructed event frames and adopt powerful frame-specific networks, or employ lightweight point-specific networks to handle sparse unconstructed event points directly. However, such two regimes are blind to a fundamental issue: failing to accommodate the unique dense temporal and sparse spatial properties of asynchronous event data. In this article, we present a synergy-aware framework, i.e., EventCrab, that adeptly integrates the "lighter" frame-specific networks for dense event frames with the "heavier" point-specific networks for sparse event points, balancing accuracy and efficiency. Furthermore, we establish a joint frame-text-point representation space to bridge distinct event frames and points. In specific, to better exploit the unique spatiotemporal relationships inherent in asynchronous event points, we devise two strategies for the "heavier" point-specific embedding: i) a Spiking-like Context Learner (SCL) that extracts contextualized event points from raw event streams. ii) an Event Point Encoder (EPE) that further explores event-point long spatiotemporal features in a Hilbert-scan way. Experiments on four datasets demonstrate the significant performance of our proposed EventCrab, particularly gaining improvements of 5.17% on SeAct and 7.01% on HARDVS.
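The two regimes the abstract contrasts start from the same raw stream and differ in how they represent it. As a rough sketch, assuming each event is a tuple `(x, y, t, polarity)`, the first function bins events into dense frames and the second keeps a sparse subset of points. The binning scheme, the uniform subsampling, and the function names are assumptions for illustration, not the paper's preprocessing.

```python
def events_to_frames(events, height, width, num_bins, t_start, t_end):
    """Accumulate signed polarity into num_bins dense frames of shape height x width."""
    frames = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    span = t_end - t_start
    for x, y, t, p in events:
        # Map the timestamp to a temporal bin, clamping the last instant into the final bin.
        b = min(int((t - t_start) / span * num_bins), num_bins - 1)
        frames[b][y][x] += 1 if p > 0 else -1
    return frames


def events_to_points(events, max_points):
    """Keep evenly spaced events as a sparse set of (x, y, t) points."""
    step = max(1, len(events) // max_points)
    return [(x, y, t) for x, y, t, _ in events[::step]][:max_points]


# Hypothetical toy stream on a 2x2 sensor: (x, y, t, polarity).
events = [(0, 0, 0.00, 1), (1, 0, 0.10, -1), (1, 1, 0.55, 1), (0, 1, 0.90, 1)]
frames = events_to_frames(events, height=2, width=2, num_bins=2, t_start=0.0, t_end=1.0)
points = events_to_points(events, max_points=2)
```

Frames trade fine timing for a grid that image networks can consume; points keep exact timestamps but lose the regular layout.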

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

EventCrab mixes lighter frame networks with heavier point networks plus new SCL and EPE modules for event action recognition and reports gains on four datasets, but the abstract gives no evidence that the joint setup actually avoids the training conflicts it claims to solve.

The main point is that this paper presents EventCrab as a way to combine frame-based and point-based processing for event data, using a Spiking-like Context Learner, a Hilbert-scan Event Point Encoder, and a shared frame-text-point space. It reports gains on four datasets, including 5.17% on SeAct and 7.01% on HARDVS, which is the kind of empirical result worth noting in this niche.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that existing event-based action recognition (EAR) methods fail to accommodate the dense temporal and sparse spatial properties of asynchronous event data, and proposes EventCrab as a synergy-aware framework that integrates lighter frame-specific networks for dense event frames with heavier point-specific networks for sparse event points. It introduces a Spiking-like Context Learner (SCL) and a Hilbert-scan Event Point Encoder (EPE) for point embedding, establishes a joint frame-text-point representation space, and reports empirical gains on four datasets, including 5.17% on SeAct and 7.01% on HARDVS.

Significance. If the results hold under rigorous validation, the work is significant for offering a practical hybrid approach to EAR that balances accuracy and efficiency while introducing a multimodal joint representation space. The empirical gains on multiple datasets constitute a concrete strength for an applied framework paper; the design of SCL and EPE as targeted modules for event-point properties is a clear contribution if the synergy is shown to function as claimed.

major comments (2)

[Abstract / Method] Abstract and method description: the central claim that the proposed synergy (frame + point networks via SCL/EPE plus joint space) resolves the failure mode of prior regimes by accommodating dense-temporal/sparse-spatial properties without training conflicts is load-bearing but unsupported; no explicit analysis, constraint, or diagnostic is provided showing that joint optimization preserves distinct signals rather than allowing branch dominance or alignment artifacts.
[Experiments] Experiments section: the reported improvements (5.17% on SeAct, 7.01% on HARDVS) are presented as evidence of the framework's effectiveness, yet without ablations isolating the contribution of the joint frame-text-point space, SCL, or EPE versus baseline combinations, it remains unclear whether the gains derive from the claimed synergy or from other implementation choices.
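One simple diagnostic for the branch-dominance concern is to compare the size of the gradients reaching each branch during joint training. The sketch below assumes the gradients of each branch are available as flat lists of floats and flags a branch as dominant when its gradient norm exceeds the other's by a chosen factor. The factor of 10 and all names are arbitrary assumptions; nothing here comes from the paper.

```python
import math


def l2_norm(grads):
    """Euclidean norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def branch_balance(frame_grads, point_grads, eps=1e-12):
    """Ratio of point-branch to frame-branch gradient norm."""
    return l2_norm(point_grads) / (l2_norm(frame_grads) + eps)


def dominant_branch(ratio, tolerance=10.0):
    """Label which branch receives much larger gradients, if either does."""
    if ratio > tolerance:
        return "point"
    if ratio < 1.0 / tolerance:
        return "frame"
    return "balanced"


# Hypothetical gradient snapshots from one training step.
verdict = dominant_branch(branch_balance([3.0, 4.0], [0.03, 0.04]))
```

Logged over training, a ratio that drifts far from 1 would suggest one branch is driving the shared objective, though it would not by itself prove the other branch is ignored.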

minor comments (2)

[Abstract] The abstract refers to results on four datasets but names only SeAct and HARDVS; the full list of datasets and per-dataset breakdowns should be explicitly stated in the experiments section for completeness.
[Method] The notation and architectural details for SCL and EPE would benefit from accompanying equations or pseudocode in the method section to clarify the spiking-like context extraction and Hilbert-scan encoding mechanisms.
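As an example of the kind of pseudocode the last comment asks for, a leaky integrate-and-fire (LIF) neuron is the generic building block that "spiking-like" usually refers to. The sketch below is that textbook unit with a hard reset, not the paper's SCL, whose actual design is not described on this page; the threshold, decay, and input values are assumptions.

```python
def lif_neuron(inputs, threshold=1.0, decay=0.5):
    """Simulate one leaky integrate-and-fire neuron over a sequence of input currents."""
    membrane = 0.0
    spikes = []
    for current in inputs:
        # Leak the previous potential, then integrate the new input.
        membrane = decay * membrane + current
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


# Hypothetical input currents, e.g. event counts at one pixel per time step.
spikes = lif_neuron([0.5, 0.75, 0.25, 1.0, 0.25])
```

The neuron fires only when recent inputs accumulate faster than the leak removes them, which is how such a unit can keep temporally clustered events and suppress isolated ones.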

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that strengthening the evidence for the claimed synergy mechanism and providing more targeted ablations will improve the manuscript. We will revise accordingly by adding the requested analyses and experiments.

Referee: [Abstract / Method] Abstract and method description: the central claim that the proposed synergy (frame + point networks via SCL/EPE plus joint space) resolves the failure mode of prior regimes by accommodating dense-temporal/sparse-spatial properties without training conflicts is load-bearing but unsupported; no explicit analysis, constraint, or diagnostic is provided showing that joint optimization preserves distinct signals rather than allowing branch dominance or alignment artifacts.

Authors: We acknowledge that explicit diagnostics would better substantiate the claim that joint optimization preserves distinct frame and point signals. While the performance gains across datasets and the design of SCL and EPE are intended to address the dense-temporal/sparse-spatial properties, we will add a dedicated analysis subsection in revision. This will include t-SNE visualizations of the joint representation space, per-branch performance breakdowns, and training dynamics (e.g., gradient norms) to demonstrate that neither branch dominates and that no alignment artifacts arise. revision: yes

Referee: [Experiments] Experiments section: the reported improvements (5.17% on SeAct, 7.01% on HARDVS) are presented as evidence of the framework's effectiveness, yet without ablations isolating the contribution of the joint frame-text-point space, SCL, or EPE versus baseline combinations, it remains unclear whether the gains derive from the claimed synergy or from other implementation choices.

Authors: We agree that isolating the individual contributions of the joint frame-text-point space, SCL, and EPE is necessary to attribute gains specifically to the synergy. The manuscript already reports overall results and some module comparisons, but we will expand the experiments with new ablation tables that systematically remove or replace each component (joint space, SCL, EPE) while keeping other factors fixed. These will be added to clarify the source of the reported improvements. revision: yes
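The ablation design described in this exchange amounts to enumerating on/off settings for the three components while everything else stays fixed. A minimal sketch follows, with the component keys `joint_space`, `scl`, and `epe` chosen here as hypothetical labels: it generates both the full grid and the cheaper leave-one-out subset.

```python
from itertools import product

COMPONENTS = ("joint_space", "scl", "epe")


def ablation_configs(components=COMPONENTS):
    """Every on/off combination of the components (2**n configurations)."""
    return [dict(zip(components, flags))
            for flags in product((True, False), repeat=len(components))]


def leave_one_out(components=COMPONENTS):
    """Configurations that disable exactly one component at a time."""
    return [{c: c != dropped for c in components} for dropped in components]


full_grid = ablation_configs()
minimal_set = leave_one_out()
```

The full grid has eight rows and exposes interactions between components; leave-one-out needs only three extra runs beyond the full model but cannot separate joint effects.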

Circularity Check

0 steps flagged

No circularity; empirical architecture proposal validated on external datasets

The paper proposes EventCrab as an empirical framework that combines frame-specific and point-specific networks via SCL, EPE, and a joint frame-text-point space, with performance shown via experiments on four datasets (e.g., +5.17% on SeAct). No equations, parameter fits, or self-citations are presented that reduce any claimed result to its own inputs by construction. The design choices address stated limitations of prior regimes through new components whose efficacy is measured externally rather than assumed or renamed from prior fits.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities beyond the named modules can be identified or verified.

invented entities (2)

Spiking-like Context Learner (SCL) · no independent evidence
purpose: extracts contextualized event points from raw event streams
New component introduced to handle point context

Event Point Encoder (EPE) · no independent evidence
purpose: explores event-point long spatiotemporal features in a Hilbert-scan way
New component introduced for long-range point features

pith-pipeline@v0.9.0 · 5801 in / 1173 out tokens · 23435 ms · 2026-05-23T16:30:47.590999+00:00 · methodology

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 8 internal anchors

[1]

A low power, fully event-based gesture recognition system

Arnon Amir, Brian Taba, David Berg, Timothy Melano, Jef- frey McKinstry, Carmelo Di Nolfo, Tapan Nayak, Alexander Andreopoulos, Guillaume Garreau, Marcela Mendoza, et al. A low power, fully event-based gesture recognition system. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7243–7252, 2017. 1, 5, 6

work page 2017
[2]

Eventtransact: A video transformer-based framework for event-camera based action recognition

Tristan de Blegiers, Ishan Rajendrakumar Dave, Adeel Yousaf, and Mubarak Shah. Eventtransact: A video transformer-based framework for event-camera based action recognition. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1–7. IEEE, 2023. 2, 6

work page 2023
[3]

other contributors

Wei Fang, Yanqi Chen, Jianhao Ding, Ding Chen, Zhaofei Yu, Huihui Zhou, and Yonghong Tian. other contributors. spikingjelly, 2020. 6

work page 2020
[4]

Slowfast networks for video recognition

Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. Slowfast networks for video recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6202–6211, 2019. 6

work page 2019
[5]

Hungry hungry hippos: To- wards language modeling with state space models

Daniel Y Fu, Tri Dao, Khaled K Saab, Armin W Thomas, Atri Rudra, and Christopher R´e. Hungry hungry hippos: To- wards language modeling with state space models. arXiv preprint arXiv:2212.14052, 2022. 3

work page arXiv 2022
[6]

Event-based vision: A survey

Guillermo Gallego, Tobi Delbr ¨uck, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew J Davison, J ¨org Conradt, Kostas Daniilidis, et al. Event-based vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(1):154–180, 2020. 1

work page 2020
[7]

Action recognition and benchmark using event cameras

Yue Gao, Jiaxuan Lu, Siqi Li, Nan Ma, Shaoyi Du, Yipeng Li, and Qionghai Dai. Action recognition and benchmark using event cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023. 6

work page 2023
[8]

Bridging video-text retrieval with multiple choice questions

Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, Xi- aohu Qie, and Ping Luo. Bridging video-text retrieval with multiple choice questions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 16167–16176, 2022. 8

work page 2022
[9]

A reservoir-based convolu- tional spiking neural network for gesture recognition from dvs input

Arun M George, Dighanchal Banerjee, Sounak Dey, Arijit Mukherjee, and P Balamurali. A reservoir-based convolu- tional spiking neural network for gesture recognition from dvs input. In International Joint Conference on Neural Net- works (IJCNN), pages 1–9. IEEE, 2020. 2

work page 2020
[10]

Spiking neural networks

Samanwoy Ghosh-Dastidar and Hojjat Adeli. Spiking neural networks. International Journal of Neural Systems, 19(04): 295–308, 2009. 4, 7

work page 2009
[11]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023. 3, 5

work page internal anchor Pith review Pith/arXiv arXiv 2023
[12]

Efficiently Modeling Long Sequences with Structured State Spaces

Albert Gu, Karan Goel, and Christopher R ´e. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021. 3

work page internal anchor Pith review Pith/arXiv arXiv 2021
[13]

Stca: Spatio-temporal credit assignment with delayed feedback in deep spiking neural networks

Pengjie Gu, Rong Xiao, Gang Pan, and Huajin Tang. Stca: Spatio-temporal credit assignment with delayed feedback in deep spiking neural networks. In International Joint Confer- ence on Artificial Intelligence, pages 1366–1372, 2019. 6

work page 2019
[14]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016. 6

work page 2016
[15]

Spiking deep residual networks

Yangfan Hu, Huajin Tang, and Gang Pan. Spiking deep residual networks. IEEE Transactions on Neural Networks and Learning Systems, 34(8):5200–5205, 2021. 3

work page 2021
[16]

Temporal binary representa- tion for event-based action recognition

Simone Undri Innocenti, Federico Becattini, Federico Per- nici, and Alberto Del Bimbo. Temporal binary representa- tion for event-based action recognition. In 2020 25th Inter- national Conference on Pattern Recognition , pages 10426– 10432. IEEE, 2021. 2, 6

work page 2020
[17]

Point-voxel absorbing graph representation learning for event stream based recog- nition

Bo Jiang, Chengguo Yuan, Xiao Wang, Zhimin Bao, Lin Zhu, Yonghong Tian, and Jin Tang. Point-voxel absorbing graph representation learning for event stream based recog- nition. arXiv preprint arXiv:2306.05239, 2023. 2

work page arXiv 2023
[18]

Embodied Neuromorphic Vision with Event-Driven Random Backpropagation

Jacques Kaiser, Alexander Friedrich, J Tieck, Daniel Re- ichard, Arne Roennau, Emre Neftci, and R ¨udiger Dillmann. Embodied neuromorphic vision with event-driven random backpropagation. arXiv preprint arXiv:1904.04805 , 2019. 6

work page internal anchor Pith review Pith/arXiv arXiv 1904
[19]

Synap- tic plasticity dynamics for deep continuous local learning

Jacques Kaiser, Hesham Mostafa, and Emre Neftci. Synap- tic plasticity dynamics for deep continuous local learning. Frontiers in Neuroscience, 14:424, 2020. 6

work page 2020
[20]

Exposing and mitigating spurious correlations for cross-modal retrieval

Jae Myung Kim, A Koepke, Cordelia Schmid, and Zeynep Akata. Exposing and mitigating spurious correlations for cross-modal retrieval. In Proceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 2585–2595, 2023. 8

work page 2023
[21]

Adam: A Method for Stochastic Optimization

Diederik P Kingma. Adam: A method for stochastic opti- mization. arXiv preprint arXiv:1412.6980, 2014. 6

work page internal anchor Pith review Pith/arXiv arXiv 2014
[22]

Spikemba: Multi-modal spiking saliency mamba for temporal video grounding

Wenrui Li, Xiaopeng Hong, Ruiqin Xiong, and Xi- aopeng Fan. Spikemba: Multi-modal spiking saliency mamba for temporal video grounding. arXiv preprint arXiv:2404.01174, 2024. 3

work page arXiv 2024
[23]

Pointmamba: A simple state space model for point cloud analysis

Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xiao Tan, and Xiang Bai. Pointmamba: A simple state space model for point cloud analysis. arXiv preprint arXiv:2402.10739, 2024. 3

work page arXiv 2024
[24]

Tsm: Temporal shift module for efficient video understanding

Ji Lin, Chuang Gan, and Song Han. Tsm: Temporal shift module for efficient video understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 7083–7093, 2019. 6

work page 2019
[25]

Event-based action recognition using motion informa- tion and spiking neural networks

Qianhui Liu, Dong Xing, Huajin Tang, De Ma, and Gang Pan. Event-based action recognition using motion informa- tion and spiking neural networks. InInternational Joint Con- ference on Artificial Intelligence, pages 1743–1749, 2021. 6

work page 2021
[26]

VMamba: Visual State Space Model

Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, and Yunfan Liu. Vmamba: Visual state space model. arXiv preprint arXiv:2401.10166,

work page internal anchor Pith review Pith/arXiv arXiv
[27]

Tam: Temporal adaptive module for video recog- nition

Zhaoyang Liu, Limin Wang, Wayne Wu, Chen Qian, and Tong Lu. Tam: Temporal adaptive module for video recog- nition. In Proceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 13708–13718, 2021. 6

work page 2021
[28]

Video swin transformer

Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, and Han Hu. Video swin transformer. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3202–3211, 2022. 6

work page 2022
[29]

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

Jun Ma, Feifei Li, and Bo Wang. U-mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv preprint arXiv:2401.04722, 2024. 3

work page internal anchor Pith review Pith/arXiv arXiv 2024
[30]

Event-based gesture recognition with dynamic background suppression using smartphone computational capabilities

Jean-Matthieu Maro, Sio-Hoi Ieng, and Ryad Benosman. Event-based gesture recognition with dynamic background suppression using smartphone computational capabilities. Frontiers in Neuroscience, 14:275, 2020. 6

work page 2020
[31]

Neuromorphic vision datasets for pedestrian detection, action recognition, and fall detection

Shu Miao, Guang Chen, Xiangyu Ning, Yang Zi, Kejia Ren, Zhenshan Bing, and Alois Knoll. Neuromorphic vision datasets for pedestrian detection, action recognition, and fall detection. Frontiers in Neurorobotics, 13:38, 2019. 5, 6

work page 2019
[32]

Learn- ing transferable visual models from natural language super- vision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learn- ing transferable visual models from natural language super- vision. In International Conference on Machine Learning , pages 8748–8763, 2021. 6

work page 2021
[33]

Exploring neuromorphic computing based on spiking neural networks: Algorithms to hardware

Nitin Rathi, Indranil Chakraborty, Adarsh Kosta, Abhronil Sengupta, Aayush Ankit, Priyadarshini Panda, and Kaushik Roy. Exploring neuromorphic computing based on spiking neural networks: Algorithms to hardware. ACM Computing Surveys, 55(12):1–49, 2023. 2

work page 2023
[34]

Events-to-video: Bringing modern computer vision to event cameras

Henri Rebecq, Ren ´e Ranftl, Vladlen Koltun, and Davide Scaramuzza. Events-to-video: Bringing modern computer vision to event cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 3857–3866, 2019. 1

work page 2019
[35]

High speed and high dynamic range video with an event camera

Henri Rebecq, Ren ´e Ranftl, Vladlen Koltun, and Davide Scaramuzza. High speed and high dynamic range video with an event camera. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(6):1964–1980, 2019. 1

work page 1964
[36]

Ttpoint: A tensorized point cloud network for lightweight action recognition with event cam- eras

Hongwei Ren, Yue Zhou, Haotian Fu, Yulong Huang, Ren- jing Xu, and Bojun Cheng. Ttpoint: A tensorized point cloud network for lightweight action recognition with event cam- eras. In Proceedings of the 31st ACM International Confer- ence on Multimedia, pages 8026–8034, 2023. 2

work page 2023
[37]

Spikepoint: An efficient point-based spiking neural network for event cam- eras action recognition

Hongwei Ren, Yue Zhou, Yulong Huang, Haotian Fu, Xi- aopeng Lin, Jie Song, and Bojun Cheng. Spikepoint: An efficient point-based spiking neural network for event cam- eras action recognition. arXiv preprint arXiv:2310.07189 ,

work page arXiv
[38]

Event transformer

Alberto Sabater, Luis Montesano, and Ana C Murillo. Event transformer. a sparse-aware solution for efficient event data processing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 2677– 2686, 2022. 6

work page 2022
[39]

Spikingres- former: Bridging resnet and vision transformer in spiking neural networks

Xinyu Shi, Zecheng Hao, and Zhaofei Yu. Spikingres- former: Bridging resnet and vision transformer in spiking neural networks. In Proceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition , pages 5610–5619, 2024. 2

work page 2024
[40]

Slayer: Spike layer error reassignment in time

Sumit B Shrestha and Garrick Orchard. Slayer: Spike layer error reassignment in time. Advances in Neural Information Processing Systems, 31, 2018. 6

work page 2018
[41]

Hierarchical long short-term concurrent memory for human interaction recognition

Xiangbo Shu, Jinhui Tang, Guo-Jun Qi, Wei Liu, and Jian Yang. Hierarchical long short-term concurrent memory for human interaction recognition. IEEE Transactions on Pat- tern Analysis and Machine Intelligence , 43(3):1110–1118,

work page
[42]

Spatiotemporal co-attention recurrent neural networks for human-skeleton motion prediction

Xiangbo Shu, Liyan Zhang, Guo-Jun Qi, Wei Liu, and Jinhui Tang. Spatiotemporal co-attention recurrent neural networks for human-skeleton motion prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence , 44(6):3300– 3315, 2021. 1

work page 2021
[43]

Simplified State Space Layers for Sequence Modeling

Jimmy TH Smith, Andrew Warrington, and Scott W Linder- man. Simplified state space layers for sequence modeling. arXiv preprint arXiv:2208.04933, 2022. 3

work page internal anchor Pith review Pith/arXiv arXiv 2022
[44]

Learning spatiotemporal features with 3d convolutional networks

Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3d convolutional networks. InProceedings of the IEEE Inter- national Conference on Computer Vision, pages 4489–4497,

work page
[45]

A closer look at spatiotemporal convolutions for action recognition

Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. A closer look at spatiotemporal convolutions for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recogni- tion, pages 6450–6459, 2018. 6

work page 2018
[46]

Unleashing the power of cnn and transformer for balanced rgb-event video recog- nition

Xiao Wang, Yao Rong, Shiao Wang, Yuan Chen, Zhe Wu, Bo Jiang, Yonghong Tian, and Jin Tang. Unleashing the power of cnn and transformer for balanced rgb-event video recog- nition. arXiv preprint arXiv:2312.11128, 2023. 6

work page arXiv 2023
[47]

Sstformer: bridging spiking neural network and memory support transformer for frame- event based recognition

Xiao Wang, Zongzhen Wu, Yao Rong, Lin Zhu, Bo Jiang, Jin Tang, and Yonghong Tian. Sstformer: bridging spiking neural network and memory support transformer for frame- event based recognition. arXiv preprint arXiv:2308.04369,

work page arXiv
[48]

Hardvs: Re- visiting human activity recognition with dynamic vision sen- sors

Xiao Wang, Zongzhen Wu, Bo Jiang, Zhimin Bao, Lin Zhu, Guoqi Li, Yaowei Wang, and Yonghong Tian. Hardvs: Re- visiting human activity recognition with dynamic vision sen- sors. In Association for the Advancement of Artificial Intel- ligence, pages 5615–5623, 2024. 5, 6

work page 2024
[49]

Action-net: Multipath excitation for action recognition

Zhengwei Wang, Qi She, and Aljosa Smolic. Action-net: Multipath excitation for action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 13214–13223, 2021. 6

work page 2021
[50]

Masked spiking trans- former

Ziqing Wang, Yuetong Fang, Jiahang Cao, Qiang Zhang, Zhongrui Wang, and Renjing Xu. Masked spiking trans- former. In Proceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 1761–1771, 2023. 6

work page 2023
[51]

Eas-snn: End-to-end adaptive sampling and representation for event-based detec- tion with recurrent spiking neural networks

Ziming Wang, Ziling Wang, Huaning Li, Lang Qin, Run- hao Jiang, De Ma, and Huajin Tang. Eas-snn: End-to-end adaptive sampling and representation for event-based detec- tion with recurrent spiking neural networks. arXiv preprint arXiv:2403.12574, 2024. 4

work page arXiv 2024
[52]

An event-driven categorization model for aer im- age sensors using multispike encoding and learning

Rong Xiao, Huajin Tang, Yuhao Ma, Rui Yan, and Garrick Orchard. An event-driven categorization model for aer im- age sensors using multispike encoding and learning. IEEE Transactions on Neural Networks and Learning Systems, 31 (9):3649–3657, 2019. 6

work page 2019
[53]

Spiking neural networks and their applications: A review

Kashu Yamazaki, Viet-Khoa V o-Ho, Darshan Bulsara, and Ngan Le. Spiking neural networks and their applications: A review. Brain Sciences, 12(7):863, 2022. 2

work page 2022
[54]

Temporal-wise at- tention spiking neural networks for event streams classifica- tion

Man Yao, Huanhuan Gao, Guangshe Zhao, Dingheng Wang, Yihan Lin, Zhaoxu Yang, and Guoqi Li. Temporal-wise at- tention spiking neural networks for event streams classifica- tion. In Proceedings of the IEEE/CVF International Confer- ence on Computer Vision, pages 10221–10230, 2021. 4

work page 2021
[55]

Eventdance: Unsupervised source- free cross-modal adaptation for event-based object recogni- tion

Xu Zheng and Lin Wang. Eventdance: Unsupervised source- free cross-modal adaptation for event-based object recogni- tion. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition , pages 17448–17458,

work page
[56]

Deep learning for event-based vision: A comprehensive survey and bench- marks

Xu Zheng, Yexin Liu, Yunfan Lu, Tongyan Hua, Tianbo Pan, Weiming Zhang, Dacheng Tao, and Lin Wang. Deep learning for event-based vision: A comprehensive survey and bench- marks. arXiv preprint arXiv:2302.08890, 2023. 1

work page arXiv 2023
[57]

E- clip: Towards label-efficient event-based open-world under- standing by clip

Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu, and Lin Wang. E- clip: Towards label-efficient event-based open-world under- standing by clip. arXiv preprint arXiv:2308.03135, 2023. 6

work page arXiv 2023
[58] Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu, and Lin Wang. Exact: Language-guided conceptual reasoning and uncertainty estimation for event-based action recognition and more. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18633–18643, 2024.
[59] Zhaokun Zhou, Yuesheng Zhu, Chao He, Yaowei Wang, Shuicheng Yan, Yonghong Tian, and Li Yuan. Spikformer: When spiking neural network meets transformer. arXiv preprint arXiv:2209.15425, 2022.
[60] Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, and Xinggang Wang. Vision mamba: Efficient visual representation learning with bidirectional state space model. arXiv preprint arXiv:2401.09417, 2024.

