EventCrab: Harnessing Frame and Point Synergy for Event-based Action Recognition and Beyond
Pith reviewed 2026-05-23 16:30 UTC · model grok-4.3
The pith
EventCrab combines lighter frame networks for dense event data with heavier point networks for sparse points to balance accuracy and efficiency in action recognition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EventCrab is a synergy-aware framework that integrates lighter frame-specific networks for dense event frames with heavier point-specific networks for sparse event points while establishing a joint frame-text-point representation space. It adds a Spiking-like Context Learner to pull contextualized points from raw streams and an Event Point Encoder that processes long spatiotemporal features through Hilbert scanning.
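To make the phrase "Hilbert scanning" concrete, here is a minimal sketch of how sparse event points could be serialized along a Hilbert curve so that spatially nearby points stay close together in the resulting sequence. It is an illustration under assumptions, not the paper's Event Point Encoder: the function names `hilbert_index` and `hilbert_scan` are hypothetical, events are assumed to be `(x, y, t, p)` tuples, and the curve is taken over the 2D pixel grid only (the paper may scan a different space, for example one that includes time).

```python
# Illustrative sketch only: a 2D Hilbert ordering of event points.
# hilbert_index and hilbert_scan are hypothetical names, and the
# (x, y, t, p) event layout is an assumption, not the paper's format.

def hilbert_index(order, x, y):
    """Distance of cell (x, y) along a Hilbert curve on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_scan(events, order):
    """Sort (x, y, t, p) events by Hilbert index, breaking ties by timestamp."""
    return sorted(events, key=lambda e: (hilbert_index(order, e[0], e[1]), e[2]))
```

The design point is that consecutive positions on the curve are always neighbouring pixels, so a sequence model reading the sorted list sees locally coherent points, unlike a raster scan that jumps at the end of every row.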
What carries the argument
The synergy-aware framework that pairs frame-specific and point-specific networks, realized through the Spiking-like Context Learner, Event Point Encoder, and joint frame-text-point representation space.
If this is right
- The joint frame-text-point space allows direct transfer between dense and sparse event representations.
- The Spiking-like Context Learner and Hilbert-scan encoder together capture both local context and long-range structure in event points.
- Reported accuracy lifts of 5.17 percent on SeAct and 7.01 percent on HARDVS follow directly from the balanced integration.
- The same architecture applies to additional event-based tasks beyond action recognition.
- Efficiency gains arise because lighter frame networks offset the cost of heavier point networks.
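One common way to build a shared space across modalities is CLIP-style contrastive alignment. The sketch below shows what a pairwise frame-text-point objective could look like under that assumption. The source does not state that EventCrab uses this loss; `info_nce` and `joint_alignment_loss` are hypothetical names, the temperature value is arbitrary, and embeddings are assumed to be non-zero lists of floats with one row per sample.

```python
import math

# Illustrative sketch only: contrastive alignment is an ASSUMED mechanism
# for a joint frame-text-point space, not the paper's stated objective.

def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def info_nce(anchors, targets, temperature=0.1):
    """Mean cross-entropy of matching anchors[i] to targets[i] within the batch."""
    a = [l2_normalize(v) for v in anchors]
    b = [l2_normalize(v) for v in targets]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [sum(p * q for p, q in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]
    return loss / len(a)


def joint_alignment_loss(frame, text, point, temperature=0.1):
    """Average symmetric InfoNCE over the three modality pairs."""
    pairs = [(frame, text), (point, text), (frame, point)]
    total = 0.0
    for u, v in pairs:
        total += 0.5 * (info_nce(u, v, temperature) + info_nce(v, u, temperature))
    return total / len(pairs)
```

Under this reading, "direct transfer" between dense and sparse representations would mean that a frame embedding and a point embedding of the same sample land near the same text embedding.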
Where Pith is reading between the lines
- The same frame-point pairing could be tested on event-based object detection or tracking without retraining the core modules from scratch.
- If the joint representation space generalizes, it might allow text prompts to guide point-feature selection during inference.
- A follow-up experiment could measure whether the Hilbert-scan ordering still helps when event density varies across scenes.
- The approach implicitly suggests that other asynchronous sensor streams, such as LiDAR points, might benefit from analogous dense-sparse pairing.
Load-bearing premise
The dense temporal and sparse spatial traits of asynchronous event streams can be handled by merging frame and point networks without creating new training or inference conflicts.
What would settle it
A controlled comparison on SeAct or HARDVS in which the combined EventCrab model shows no accuracy gain or efficiency improvement over the best frame-only or point-only baseline using the same backbone networks.
Abstract
Event-based Action Recognition (EAR) offers the advantages of high-temporal-resolution capture and privacy preservation compared with traditional action recognition. Current leading EAR solutions typically follow one of two regimes: project unconstructed event streams into dense constructed event frames and adopt powerful frame-specific networks, or employ lightweight point-specific networks to handle sparse unconstructed event points directly. However, both regimes are blind to a fundamental issue: they fail to accommodate the unique dense temporal and sparse spatial properties of asynchronous event data. In this article, we present a synergy-aware framework, i.e., EventCrab, that adeptly integrates the "lighter" frame-specific networks for dense event frames with the "heavier" point-specific networks for sparse event points, balancing accuracy and efficiency. Furthermore, we establish a joint frame-text-point representation space to bridge distinct event frames and points. Specifically, to better exploit the unique spatiotemporal relationships inherent in asynchronous event points, we devise two strategies for the "heavier" point-specific embedding: (i) a Spiking-like Context Learner (SCL) that extracts contextualized event points from raw event streams, and (ii) an Event Point Encoder (EPE) that further explores long spatiotemporal features of event points via Hilbert scanning. Experiments on four datasets demonstrate the strong performance of EventCrab, notably improvements of 5.17% on SeAct and 7.01% on HARDVS.
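The first regime in the abstract, projecting an event stream into dense frames, can be sketched as a simple time-binned accumulation. This is a generic illustration, not the paper's preprocessing: `events_to_frames` is a hypothetical name, events are assumed to be `(x, y, t, p)` tuples with polarity `p` in `{0, 1}`, and each frame simply counts events per pixel and polarity.

```python
# Illustrative sketch only: one generic way to turn an asynchronous event
# stream into dense frames. The event layout (x, y, t, p) with p in {0, 1}
# and the count-based encoding are assumptions.

def events_to_frames(events, height, width, num_bins):
    """Accumulate events into frames[bin][polarity][y][x] count grids."""
    frames = [[[[0] * width for _ in range(height)] for _ in range(2)]
              for _ in range(num_bins)]
    if not events:
        return frames
    t0 = min(e[2] for e in events)
    t1 = max(e[2] for e in events)
    span = (t1 - t0) or 1.0  # avoid division by zero when all timestamps match
    for x, y, t, p in events:
        # Map the timestamp to a bin; the latest event falls in the last bin.
        b = min(int((t - t0) / span * num_bins), num_bins - 1)
        frames[b][p][y][x] += 1
    return frames
```

The second regime skips this step and feeds the `(x, y, t, p)` tuples to a point network directly, which keeps the fine timing but yields a long, sparse input.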
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that existing event-based action recognition (EAR) methods fail to accommodate the dense temporal and sparse spatial properties of asynchronous event data, and proposes EventCrab as a synergy-aware framework that integrates lighter frame-specific networks for dense event frames with heavier point-specific networks for sparse event points. It introduces a Spiking-like Context Learner (SCL) and Hilbert-scan Event Point Encoder (EPE) for point embedding, establishes a joint frame-text-point representation space, and reports empirical gains including 5.17% on SeAct and 7.01% on HARDVS across four datasets.
Significance. If the results hold under rigorous validation, the work is significant for offering a practical hybrid approach to EAR that balances accuracy and efficiency while introducing a multimodal joint representation space. The empirical gains on multiple datasets constitute a concrete strength for an applied framework paper; the design of SCL and EPE as targeted modules for event-point properties is a clear contribution if the synergy is shown to function as claimed.
major comments (2)
- [Abstract / Method] Abstract and method description: the central claim that the proposed synergy (frame + point networks via SCL/EPE plus joint space) resolves the failure mode of prior regimes by accommodating dense-temporal/sparse-spatial properties without training conflicts is load-bearing but unsupported; no explicit analysis, constraint, or diagnostic is provided showing that joint optimization preserves distinct signals rather than allowing branch dominance or alignment artifacts.
- [Experiments] Experiments section: the reported improvements (5.17% on SeAct, 7.01% on HARDVS) are presented as evidence of the framework's effectiveness, yet without ablations isolating the contribution of the joint frame-text-point space, SCL, or EPE versus baseline combinations, it remains unclear whether the gains derive from the claimed synergy or from other implementation choices.
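The ablation design requested in the second major comment amounts to toggling each component while holding everything else fixed. A minimal sketch of enumerating such a grid follows; the component keys and the function name `ablation_grid` are hypothetical labels chosen for illustration, and no results are implied.

```python
from itertools import product

# Illustrative sketch only: enumerate on/off settings for the three
# components named in the report. The keys are hypothetical labels.
COMPONENTS = ("joint_space", "scl", "epe")


def ablation_grid(components=COMPONENTS):
    """Return every on/off combination of the given components as dicts."""
    return [dict(zip(components, flags))
            for flags in product((False, True), repeat=len(components))]


def leave_one_out(grid):
    """Select the configurations that disable exactly one component."""
    return [cfg for cfg in grid if list(cfg.values()).count(False) == 1]
```

With three components the full grid has eight configurations; the three leave-one-out rows plus the full model are the minimum needed to attribute a gain to any single component.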
minor comments (2)
- [Abstract] The abstract refers to results on four datasets but names only SeAct and HARDVS; the full list of datasets and per-dataset breakdowns should be explicitly stated in the experiments section for completeness.
- [Method] The notation and architectural details for SCL and EPE would benefit from accompanying equations or pseudocode in the method section to clarify the spiking-like context extraction and Hilbert-scan encoding mechanisms.
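As an example of the kind of pseudocode the second minor comment asks for, the sketch below shows a generic leaky integrate-and-fire (LIF) neuron, the textbook building block of spiking networks. It is not the paper's Spiking-like Context Learner, whose exact update rule is not given here; `lif_spikes` is a hypothetical name, and the time constant, threshold, and hard reset are assumed values.

```python
# Illustrative sketch only: a generic leaky integrate-and-fire neuron.
# This is NOT the paper's SCL; tau, v_threshold, and the hard reset are
# assumptions chosen to keep the example small.

def lif_spikes(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Return a 0/1 spike train for a sequence of input currents."""
    v = v_reset
    spikes = []
    for x in inputs:
        # Leaky integration: move the membrane potential toward the input.
        v = v + (x - (v - v_reset)) / tau
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after firing
        else:
            spikes.append(0)
    return spikes
```

For a constant input of 1.5 with the defaults, the potential reaches 0.75 and then 1.125, so the neuron fires on every second step; a zero input never fires.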
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We agree that strengthening the evidence for the claimed synergy mechanism and providing more targeted ablations will improve the manuscript. We will revise accordingly by adding the requested analyses and experiments.
- Referee: [Abstract / Method] Abstract and method description: the central claim that the proposed synergy (frame + point networks via SCL/EPE plus joint space) resolves the failure mode of prior regimes by accommodating dense-temporal/sparse-spatial properties without training conflicts is load-bearing but unsupported; no explicit analysis, constraint, or diagnostic is provided showing that joint optimization preserves distinct signals rather than allowing branch dominance or alignment artifacts.
  Authors: We acknowledge that explicit diagnostics would better substantiate the claim that joint optimization preserves distinct frame and point signals. While the performance gains across datasets and the design of SCL and EPE are intended to address the dense-temporal/sparse-spatial properties, we will add a dedicated analysis subsection in the revision. This will include t-SNE visualizations of the joint representation space, per-branch performance breakdowns, and training dynamics (e.g., gradient norms) to demonstrate that neither branch dominates and that no alignment artifacts arise.
  Revision: yes
- Referee: [Experiments] Experiments section: the reported improvements (5.17% on SeAct, 7.01% on HARDVS) are presented as evidence of the framework's effectiveness, yet without ablations isolating the contribution of the joint frame-text-point space, SCL, or EPE versus baseline combinations, it remains unclear whether the gains derive from the claimed synergy or from other implementation choices.
  Authors: We agree that isolating the individual contributions of the joint frame-text-point space, SCL, and EPE is necessary to attribute gains specifically to the synergy. The manuscript already reports overall results and some module comparisons, but we will expand the experiments with new ablation tables that systematically remove or replace each component (joint space, SCL, EPE) while keeping other factors fixed. These will be added to clarify the source of the reported improvements.
  Revision: yes
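The training-dynamics diagnostic mentioned in the first response (per-branch gradient norms) could be computed along the following lines. This is a minimal sketch under assumptions: gradients are taken to be already flattened into lists of floats per branch, and the names `branch_grad_norms` and `dominance_ratio` are hypothetical. What ratio counts as "dominance" is a judgment call that the source does not specify.

```python
import math

# Illustrative sketch only: a simple branch-dominance diagnostic.
# Input format (flattened gradient lists keyed by branch name) is assumed.

def branch_grad_norms(grads_by_branch):
    """L2 norm of the flattened gradient for each branch."""
    return {name: math.sqrt(sum(g * g for g in grads))
            for name, grads in grads_by_branch.items()}


def dominance_ratio(norms):
    """Largest branch norm divided by the smallest (inf if the smallest is zero)."""
    hi, lo = max(norms.values()), min(norms.values())
    return float("inf") if lo == 0 else hi / lo
```

Logging this ratio over training steps would show whether one branch consistently receives much larger updates than the other, which is one symptom of the branch dominance the referee is concerned about.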
Circularity Check
No circularity; empirical architecture proposal validated on external datasets
The paper proposes EventCrab as an empirical framework that combines frame-specific and point-specific networks via SCL, EPE, and a joint frame-text-point space, with performance shown via experiments on four datasets (e.g., +5.17% on SeAct). No equations, parameter fits, or self-citations are presented that reduce any claimed result to its own inputs by construction. The design choices address stated limitations of prior regimes through new components whose efficacy is measured externally rather than assumed or renamed from prior fits.
Axiom & Free-Parameter Ledger
invented entities (2)
- Spiking-like Context Learner (SCL): no independent evidence
- Event Point Encoder (EPE): no independent evidence