DarkShake-DVS: Event-based Human Action Recognition under Low-light andShaking Camera Conditions
Pith reviewed 2026-05-21 05:34 UTC · model grok-4.3
The pith
Event cameras paired with IMU motion compensation enable reliable human action recognition in low-light and shaking conditions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that an Event-IMU Stabilized HAR (EIS-HAR) system, built around a non-linear warping function derived from synchronized IMU measurements to produce motion-compensated event frames and a four-stage hybrid network to extract spatiotemporal features, achieves consistent gains over state-of-the-art methods on the newly introduced DarkShake-DVS benchmark and two other datasets.
What carries the argument
The EIS module, which derives a non-linear warping function from IMU data to reconstruct motion-compensated event frames for input to the downstream HAR network.
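As a rough illustration of what IMU-driven compensation involves, the sketch below warps each event to a common reference time using gyroscope angular velocity alone. It is not the paper's non-linear warping function, which the abstract does not specify. It assumes a pinhole camera (hypothetical intrinsics `fx`, `fy`, `cx`, `cy`), a constant angular velocity `omega` over the window, and pure rotation with no translation. All function names are illustrative.

```python
import math

def rodrigues(theta):
    """Rotation matrix for an axis-angle vector theta (3-tuple, radians)."""
    tx, ty, tz = theta
    angle = math.sqrt(tx * tx + ty * ty + tz * tz)
    if angle < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = tx / angle, ty / angle, tz / angle
    c, s = math.cos(angle), math.sin(angle)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def warp_events(events, omega, t_ref, fx, fy, cx, cy):
    """Warp (x, y, t, polarity) events to time t_ref under pure rotation.

    Assumption: omega is the gyro reading (rad/s) in the camera frame and
    is constant over the window. A static point then satisfies
    X(t_ref) = exp([omega]x * (t - t_ref)) X(t).
    """
    warped = []
    for x, y, t, p in events:
        bearing = ((x - cx) / fx, (y - cy) / fy, 1.0)
        R = rodrigues(tuple(w * (t - t_ref) for w in omega))
        X = [sum(R[r][c] * bearing[c] for c in range(3)) for r in range(3)]
        if X[2] <= 0.0:
            continue  # point rotated behind the camera; drop the event
        warped.append((fx * X[0] / X[2] + cx, fy * X[1] / X[2] + cy, t, p))
    return warped

def accumulate_frame(warped, width, height):
    """Sum signed polarities into a frame, using nearest-pixel rounding."""
    frame = [[0] * width for _ in range(height)]
    for x, y, _, p in warped:
        col, row = int(round(x)), int(round(y))
        if 0 <= col < width and 0 <= row < height:
            frame[row][col] += 1 if p > 0 else -1
    return frame
```

Under these assumptions, events generated by a static scene point collapse onto a single pixel after warping, which is the sharpening effect that motion compensation aims for. A real pipeline would also need to handle time-varying angular velocity, IMU-camera extrinsics, and lens distortion.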
If this is right
- Action recognition systems can now be deployed in low-light environments with unconstrained handheld or vehicle-mounted cameras.
- The DarkShake-DVS dataset becomes a standard testbed for evaluating event-based methods under combined darkness and 6-DoF motion.
- The four-stage hybrid architecture provides an efficient way to process the high-temporal-resolution data produced by compensated event streams.
- Synchronized IMU data becomes a standard auxiliary input for any event-camera pipeline that must handle camera ego-motion.
Where Pith is reading between the lines
- The same warping technique could be tested on other event-based tasks such as gesture spotting or object tracking from moving cameras.
- Combining the compensated events with conventional RGB frames might further improve performance in mixed lighting without requiring full sensor replacement.
- If the non-linear model proves robust, similar compensation could be applied to event data in robotics or autonomous vehicles where IMU readings are already available.
Load-bearing premise
The non-linear warping function derived from IMU measurements produces motion-compensated event frames whose spatiotemporal statistics stay close enough to the original events that the four-stage network can still extract reliable action features.
What would settle it
Running the four-stage network on raw unwarped event frames from the DarkShake-DVS dataset and obtaining equal or higher accuracy than the full EIS-HAR pipeline would falsify the claim that the IMU-based compensation step is what drives the reported gains.
Original abstract
Human Action Recognition (HAR) is a fundamental computer vision task with diverse real-world applications. Practical deployments often involve low-light environments and unconstrained 6-DoF camera motion, conditions that degrade visual quality, disrupt temporal coherence, and compromise reliability of existing methods. Event cameras, with high low-light sensitivity and microsecond-level temporal resolution, paired with an inertial measurement unit (IMU), present a promising solution. However, current research faces two key challenges: absence of a benchmark integrating low-light conditions, 6-DoF motion, and synchronized IMU data; and lack of effective motion compensation techniques. To address these, we propose Event-IMU Stabilized HAR (EIS-HAR), with two modules. The first is an EIS module that reduces motion blur via a non-linear warping function to reconstruct a motion-compensated input. The second is a HAR module with a four-stage hybrid architecture to efficiently extract spatiotemporal features for accurate action recognition. To alleviate data scarcity, we introduce DarkShake-DVS, the first large-scale event-based HAR benchmark that includes 18,041 real-world clips captured in low light and intense 6-DoF motion, supplemented by synchronized IMU data. Extensive experiments on three datasets demonstrate consistent superiority of EIS-HAR over state-of-the-art methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DarkShake-DVS, a new large-scale event-based HAR benchmark with 18,041 real-world clips recorded under low-light and 6-DoF shaking conditions with synchronized IMU data. It proposes EIS-HAR, which consists of an EIS module that applies a non-linear warping function derived from IMU measurements to produce motion-compensated event frames, followed by a four-stage hybrid HAR network for spatiotemporal feature extraction. The central claim is that extensive experiments on three datasets demonstrate consistent superiority of EIS-HAR over state-of-the-art methods.
Significance. If the superiority claim holds after addressing validation gaps, the work would be significant for practical event-based vision: it supplies the first benchmark integrating low-light, intense 6-DoF motion, and IMU synchronization, and demonstrates a concrete pipeline that combines IMU-driven compensation with a hybrid network. The purely empirical nature of the work (no free parameters or closed-form derivations) is offset by the introduction of a reproducible dataset and the potential for real-world deployment in robotics or surveillance.
major comments (2)
- [EIS module] EIS module description: the superiority claim requires that the non-linear IMU warping produces compensated frames whose polarity, timing, and spatial density remain sufficiently close to clean low-light events for the downstream four-stage HAR network to extract reliable features. No quantitative check (e.g., event-rate histograms, polarity distribution statistics, or optical-flow consistency before/after warping) is reported to confirm this invariance; without it the performance gains could arise from network exploitation of warping-induced artifacts rather than true motion compensation.
- [Experimental results] Experimental results (abstract and § on datasets): the abstract states consistent outperformance on three datasets but supplies no error bars, statistical significance tests, or ablation isolating the warping function from the network architecture. This omission leaves the central claim plausible yet incompletely supported, as the contribution of each module cannot be quantified.
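The kind of before/after check requested in the first major comment could be sketched as follows. This is an illustrative outline, not an analysis from the paper: it assumes events are `(x, y, t, polarity)` tuples with non-negative timestamps, and the function names and the choice of total variation distance as the comparison metric are assumptions made here.

```python
from collections import Counter

def event_rate_histogram(events, bin_width):
    """Count events per time bin (timestamps assumed non-negative)."""
    counts = Counter(int(t // bin_width) for _, _, t, _ in events)
    if not counts:
        return []
    return [counts.get(i, 0) for i in range(max(counts) + 1)]

def positive_polarity_fraction(events):
    """Fraction of events with positive polarity."""
    if not events:
        return 0.0
    return sum(1 for _, _, _, p in events if p > 0) / len(events)

def pixel_occupancy(events, width, height):
    """Fraction of sensor pixels that receive at least one event."""
    active = set()
    for x, y, _, _ in events:
        col, row = int(round(x)), int(round(y))
        if 0 <= col < width and 0 <= row < height:
            active.add((col, row))
    return len(active) / (width * height)

def total_variation(h1, h2):
    """Total variation distance between two histograms after normalization."""
    n = max(len(h1), len(h2))
    a = list(h1) + [0] * (n - len(h1))
    b = list(h2) + [0] * (n - len(h2))
    sa, sb = sum(a), sum(b)
    if sa == 0 or sb == 0:
        raise ValueError("histograms must be non-empty")
    return 0.5 * sum(abs(x / sa - y / sb) for x, y in zip(a, b))
```

Computing these quantities on the raw and the warped stream of the same clip would show whether warping mainly changes spatial density (expected, since events are re-positioned) while leaving polarity balance and event rate largely intact, or whether it discards or distorts a substantial share of events.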
minor comments (2)
- [Title] The title contains a missing space: 'Low-light andShaking'.
- [HAR module] Clarify the exact four-stage architecture of the HAR module (e.g., which layers are convolutional vs. recurrent) and provide a diagram or pseudocode for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and indicate the revisions we will incorporate to strengthen the manuscript.
- Referee: [EIS module] EIS module description: the superiority claim requires that the non-linear IMU warping produces compensated frames whose polarity, timing, and spatial density remain sufficiently close to clean low-light events for the downstream four-stage HAR network to extract reliable features. No quantitative check (e.g., event-rate histograms, polarity distribution statistics, or optical-flow consistency before/after warping) is reported to confirm this invariance; without it the performance gains could arise from network exploitation of warping-induced artifacts rather than true motion compensation.
  Authors: We agree that explicit quantitative validation of the warping step would more directly support the claim that performance gains derive from motion compensation. In the revised manuscript we will add event-rate histograms, polarity distribution statistics, and optical-flow consistency metrics computed before and after the non-linear IMU warping on representative sequences from DarkShake-DVS. These analyses will be placed in the EIS-module subsection to demonstrate that the compensated frames retain the essential statistical properties of the original low-light events.
  Revision: yes
- Referee: [Experimental results] Experimental results (abstract and § on datasets): the abstract states consistent outperformance on three datasets but supplies no error bars, statistical significance tests, or ablation isolating the warping function from the network architecture. This omission leaves the central claim plausible yet incompletely supported, as the contribution of each module cannot be quantified.
  Authors: We acknowledge that the current presentation would benefit from greater statistical rigor and component-wise quantification. In the revision we will (i) report mean accuracy together with standard deviation across multiple random seeds in all tables, (ii) add paired statistical significance tests (e.g., t-tests) against the strongest baselines, and (iii) include a dedicated ablation that compares the full EIS-HAR pipeline against an identical four-stage network trained on unwarped event frames. The abstract will be updated to reference these additional controls.
  Revision: yes
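The statistical reporting promised in the second response (mean and standard deviation across seeds, plus a paired test against a baseline) can be sketched with the Python standard library. The helper names are hypothetical, and the sketch returns only the t statistic and its degrees of freedom; turning that into a p-value requires a Student-t table or a statistics package.

```python
import math
import statistics

def mean_std(accuracies):
    """Mean and sample standard deviation of per-seed accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

def paired_t_statistic(method_acc, baseline_acc):
    """Paired t statistic and degrees of freedom for matched seeds.

    Both lists must hold one accuracy per seed, in the same seed order.
    """
    if len(method_acc) != len(baseline_acc) or len(method_acc) < 2:
        raise ValueError("need at least two matched runs")
    diffs = [m - b for m, b in zip(method_acc, baseline_acc)]
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

Pairing by seed matters here: the ablation compares the same four-stage network with and without warping, so per-seed differences remove variance that is shared between the two runs.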
Circularity Check
Empirical pipeline validated on external benchmarks with no derivation reducing to self-inputs
The paper describes an empirical method: an EIS module applying non-linear IMU-based warping for motion compensation, followed by a four-stage hybrid HAR network for feature extraction. A new dataset DarkShake-DVS is introduced, and performance is measured via experiments on three datasets against prior SOTA baselines. No equations, fitted parameters, or self-citations are shown to reduce the reported accuracy gains to quantities defined inside the paper by construction. The central claims rest on external dataset results rather than internal redefinitions or self-referential premises.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: IMU measurements provide sufficiently accurate 6-DoF pose to drive a non-linear warping that reduces motion blur in event data.