SpikeDet: Better Firing Patterns for Accurate and Energy-Efficient Object Detection with Spiking Neural Networks
Pith reviewed 2026-05-23 04:58 UTC · model grok-4.3
The pith
SpikeDet improves spiking neural network object detection by adjusting membrane inputs and multi-direction fusion to reduce local firing saturation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By redesigning the spiking backbone to adjust membrane synaptic input distributions layer by layer and adding multi-direction fusion in the neck, SpikeDet produces firing patterns with less local saturation, which directly raises feature quality for detection while lowering total spike rates and therefore energy use.
What carries the argument
MDSNet, which adjusts membrane synaptic input distribution at each layer to improve neuron firing patterns, together with the Spiking Multi-direction Fusion Module that performs multi-direction spiking feature fusion.
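This minimal sketch makes "firing pattern" and "saturation" concrete. It shows how the distribution of membrane (synaptic) input alone can push leaky integrate-and-fire (LIF) neurons to their maximum firing rate. It assumes direct coding (the same input at every timestep), a hard reset to 0, a time constant of 2, and a threshold of 1. The function name and all parameters are illustrative choices. Shifting the input mean is only a stand-in for MDSNet's layer-wise adjustment, which this summary does not describe.

```python
import numpy as np


def lif_firing_rates(inputs, tau=2.0, v_th=1.0):
    """Simulate hard-reset LIF neurons and return per-neuron firing rates.

    inputs: array of shape (T, N) holding the synaptic input of N neurons
    over T timesteps. Returns an array of N rates in [0, 1].
    """
    num_steps, num_neurons = inputs.shape
    v = np.zeros(num_neurons)
    spike_count = np.zeros(num_neurons)
    for t in range(num_steps):
        # Leaky integration: the potential moves 1/tau of the way toward the input.
        v = v + (inputs[t] - v) / tau
        fired = v >= v_th
        spike_count += fired
        # Hard reset: neurons that spiked restart from 0.
        v = np.where(fired, 0.0, v)
    return spike_count / num_steps


rng = np.random.default_rng(0)
T, N = 4, 10_000
# Direct coding: every neuron sees the same input at each of the T timesteps.
high_mean_input = np.tile(rng.normal(3.0, 1.0, N), (T, 1))
shifted_input = np.tile(rng.normal(0.5, 1.0, N), (T, 1))

saturated_high = np.mean(lif_firing_rates(high_mean_input) == 1.0)
saturated_shifted = np.mean(lif_firing_rates(shifted_input) == 1.0)
print(f"neurons at max rate: {saturated_high:.2f} (high-mean input) "
      f"vs {saturated_shifted:.2f} (shifted input)")
```

With these settings, a neuron fires at every timestep exactly when its input is at least 2. The share of saturated neurons therefore tracks how much of the input distribution lies above that level.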
If this is right
- The same firing-pattern changes produce top results on event-based GEN1, underwater URPC 2019, low-light ExDARK, and dense CrowdHuman detection.
- Energy use drops by roughly half while AP rises 3.3 points over prior SNN detectors.
- LFSI provides a numeric way to track and target saturation during training or architecture search.
- Multi-scale detection improves because high-quality backbone features are preserved rather than lost to uniform high firing rates.
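The operation-counting convention used in many SNN detection papers makes the link between firing rate and energy in the second bullet explicit. Under that convention, each spike-driven synaptic operation (SOP) costs one accumulate (AC), using the widely quoted 45 nm figures of 0.9 pJ per AC and 4.6 pJ per multiply-accumulate (MAC). This summary does not state SpikeDet's own energy protocol, so the constants, function names, and layer numbers below are assumptions for illustration only.

```python
E_AC_PJ = 0.9   # energy per 32-bit accumulate at 45 nm (commonly cited constant)
E_MAC_PJ = 4.6  # energy per 32-bit multiply-accumulate at 45 nm


def snn_energy_mj(layer_flops, firing_rates, timesteps):
    """Estimate SNN energy in mJ: SOPs = FLOPs x input firing rate x timesteps, one AC each."""
    sops = sum(flops * rate * timesteps for flops, rate in zip(layer_flops, firing_rates))
    return sops * E_AC_PJ * 1e-9  # 1 pJ = 1e-9 mJ


def ann_energy_mj(layer_flops):
    """Energy of the equivalent ANN, where every FLOP is treated as one MAC."""
    return sum(layer_flops) * E_MAC_PJ * 1e-9


# Hypothetical two-layer network: 1 GFLOP and 2 GFLOPs, 4 timesteps.
flops = [1e9, 2e9]
baseline = snn_energy_mj(flops, firing_rates=[0.30, 0.20], timesteps=4)
halved = snn_energy_mj(flops, firing_rates=[0.15, 0.10], timesteps=4)
print(f"SNN: {baseline:.2f} mJ -> {halved:.2f} mJ; ANN: {ann_energy_mj(flops):.2f} mJ")
```

Energy is linear in the average firing rate at a fixed architecture and number of timesteps, so halving spike rates halves the SOP energy. The paper's "half the energy" comparison against prior detectors may also reflect architectural differences, which this sketch does not model.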
Where Pith is reading between the lines
- If membrane-input redistribution is the effective lever, the same principle could be tested on spiking versions of other vision backbones or on non-detection tasks.
- LFSI might become a standard training regularizer or early-stopping signal for any SNN model that shows saturation.
- Hardware implementations could exploit the lower average spike rates to reduce buffer sizes or clock speeds beyond what the paper measures.
- The approach leaves open whether similar saturation occurs in deeper or recurrent SNNs and whether the same modules would scale without retraining.
Load-bearing premise
Local firing saturation is the dominant limiter of accuracy and efficiency in current SNN detectors, and the MDSNet and SMFM changes fix it without creating new offsetting problems.
What would settle it
An ablation that removes the membrane-input adjustments from MDSNet or the multi-direction fusion from SMFM and shows detection AP and energy returning to the levels of previous SNN detectors on COCO.
Original abstract
Spiking Neural Networks (SNNs) are the third generation of neural networks. They have gained widespread attention in object detection due to their low energy consumption and biological interpretability. However, existing SNN-based object detection methods suffer from local firing saturation, where adjacent neurons concurrently reach maximum firing rates, especially in object-centric regions. This abnormal neuron firing pattern reduces the feature discrimination capability and detection accuracy, while also increasing the firing rates that prevent SNNs from achieving their potential energy efficiency. To address this problem, we propose SpikeDet, a novel spiking object detector that optimizes firing patterns for accurate and energy-efficient detection. Specifically, we design a spiking backbone network, MDSNet, which effectively adjusts the membrane synaptic input distribution at each layer, achieving better neuron firing patterns during spiking feature extraction. For the neck, to better utilize and preserve these high-quality backbone features, we introduce the Spiking Multi-direction Fusion Module (SMFM), which realizes multi-direction fusion of spiking features, enhancing the multi-scale detection capability of the model. Furthermore, we propose the Local Firing Saturation Index (LFSI) to quantitatively measure local firing saturation. Experimental results validate the effectiveness of our method. On the COCO 2017 dataset, it achieves 52.2% AP, outperforming previous SNN-based methods by 3.3% AP while requiring only half the energy consumption. On object detection sub-tasks, including event-based GEN1, underwater URPC 2019, low-light ExDARK, and dense scene CrowdHuman datasets, SpikeDet also achieves the best performance.
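The abstract defines local firing saturation as adjacent neurons concurrently reaching their maximum firing rate, but this summary does not give the LFSI formula. The sketch below is a hypothetical stand-in built only from that verbal definition. It reports the fraction of k×k neighborhoods in a firing-rate map where every neuron is at the maximum rate. The function name, the window size, and the all-saturated criterion are assumptions and should not be read as the paper's LFSI.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_saturation_fraction(rate_map, window=3, max_rate=1.0):
    """Hypothetical local-saturation score (not the paper's LFSI definition).

    rate_map: 2-D array of per-neuron firing rates for one channel.
    Returns the fraction of window x window neighborhoods in which every
    neuron fires at max_rate.
    """
    saturated = rate_map >= max_rate
    patches = sliding_window_view(saturated, (window, window))
    fully_saturated = patches.all(axis=(-2, -1))
    return float(fully_saturated.mean())


# Two 8x8 rate maps with the same 16 saturated neurons (same mean firing rate).
clustered = np.zeros((8, 8))
clustered[2:6, 2:6] = 1.0   # one contiguous saturated blob, as on an object
scattered = np.zeros((8, 8))
scattered[::2, ::2] = 1.0   # saturated neurons are never adjacent

print(local_saturation_fraction(clustered), local_saturation_fraction(scattered))
```

Both maps have the same average firing rate, yet only the clustered one scores above zero. This matches the property the abstract attributes to local firing saturation: it depends on spatial arrangement, not just on the overall spike count.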
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SpikeDet, an SNN-based object detector addressing local firing saturation in existing methods. It introduces MDSNet as a spiking backbone that adjusts membrane synaptic input distributions per layer for improved firing patterns, SMFM for multi-directional spiking feature fusion in the neck to enhance multi-scale detection, and LFSI as a metric to quantify local firing saturation. On COCO 2017 it reports 52.2% AP (3.3% above prior SNN detectors) at half the energy; it also claims state-of-the-art results on GEN1, URPC 2019, ExDARK, and CrowdHuman.
Significance. If the experimental claims hold after verification, the work would be a meaningful advance for energy-efficient object detection with SNNs. The LFSI metric supplies a concrete, quantitative handle on firing-pattern quality that could be adopted more broadly, and the reported accuracy-energy trade-off on COCO plus generalization to event-based, low-light, and dense scenes would strengthen the case for SNNs in practical vision pipelines.
major comments (1)
- The central attribution—that MDSNet and SMFM directly mitigate local firing saturation and thereby produce the 3.3% AP gain and halved energy—rests on the experimental results. The abstract states that experiments validate this link, yet the provided text supplies no table or section reference to the required ablation (e.g., SpikeDet minus MDSNet, minus SMFM) or to the exact prior SNN baselines and energy-measurement protocol. Without those controls the causal claim cannot be assessed from the manuscript alone.
minor comments (2)
- Notation for membrane potential and synaptic input in the MDSNet description should be defined once with consistent symbols before any equations are used.
- Figure captions for the firing-pattern visualizations should explicitly state the layer and input image used so readers can reproduce the LFSI calculation.
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive suggestion. The major comment correctly identifies that the causal link between our proposed components and the reported gains requires explicit experimental support. We address this below and will revise the manuscript to strengthen the presentation.
Point-by-point responses
Referee: The central attribution, namely that MDSNet and SMFM directly mitigate local firing saturation and thereby produce the 3.3% AP gain and halved energy, rests on the experimental results. The abstract states that experiments validate this link, yet the provided text supplies no table or section reference to the required ablation (e.g., SpikeDet minus MDSNet, minus SMFM) or to the exact prior SNN baselines and energy-measurement protocol. Without those controls the causal claim cannot be assessed from the manuscript alone.
Authors: We agree that the manuscript must supply explicit ablations and protocol details to allow readers to evaluate the contribution of each component. The current version contains ablation experiments (Section 4.3) and energy-measurement details (Section 3.4 and Appendix B), but these are not cross-referenced from the abstract or introduction. In the revision we will (1) add a new Table 3 that reports AP and energy for SpikeDet, SpikeDet w/o MDSNet, and SpikeDet w/o SMFM on COCO 2017, (2) expand the baseline comparison paragraph in Section 4.2 to list the exact prior SNN detectors and their reported metrics, and (3) move the energy protocol description into the main text with a clear reference from the abstract. These changes will make the causal attribution verifiable without altering any experimental results.
Revision: yes.
Circularity Check
No significant circularity
The paper introduces architectural components (MDSNet backbone, SMFM neck module) and a diagnostic metric (LFSI) for spiking object detection, then reports empirical gains on COCO and other datasets. No derivation chain, first-principles predictions, or fitted parameters appear; performance claims rest on experimental outcomes rather than any quantity being redefined or forced by construction from its own inputs. No self-citation load-bearing steps or ansatz smuggling are present in the supplied text.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: standard SNN training assumptions, such as surrogate-gradient methods for backpropagation through spikes, hold.
invented entities (3)
- MDSNet: no independent evidence
- SMFM: no independent evidence
- LFSI: no independent evidence
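The single axiom in the ledger, that surrogate-gradient training works for these networks, refers to a specific substitution during backpropagation. The spike function's derivative is zero almost everywhere, so training replaces it with a smooth stand-in. This minimal sketch assumes a sigmoid-shaped surrogate with slope parameter alpha = 4. This summary does not state which surrogate SpikeDet uses, and the function names are illustrative.

```python
import numpy as np


def spike_forward(v, v_th=1.0):
    """Forward pass: Heaviside step, 1.0 where the membrane potential reaches threshold."""
    return (v >= v_th).astype(float)


def spike_surrogate_grad(v, v_th=1.0, alpha=4.0):
    """Backward pass: derivative of sigmoid(alpha * (v - v_th)), used in place of the step's derivative."""
    s = 1.0 / (1.0 + np.exp(-alpha * (v - v_th)))
    return alpha * s * (1.0 - s)


v = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
print(spike_forward(v))         # spikes only at and above threshold
print(spike_surrogate_grad(v))  # largest at threshold, decaying symmetrically on both sides
```

The forward pass stays binary, so inference remains spike-driven. Only the gradient used for weight updates is smoothed.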
Reference graph
Works this paper leans on
[1] Z. Huang, J. Ding, Z. Pan, H. Li, Y. Fang, Z. Yu, and J. K. Liu, “Converting high-performance and low-latency snns through explicit modeling of residual error in anns,” IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 9, pp. 16788–16802, 2025.
[2] W. Maass, “Networks of spiking neurons: The third generation of neural network models,” Neural Netw., vol. 10, no. 9, pp. 1659–1671, 1997.
[3] R. Zhang, L. Leng, K. Che, H. Zhang, J. Cheng, Q. Guo, J. Liao, and R. Cheng, “Accurate and efficient event-based semantic segmentation using adaptive spiking encoder–decoder network,” IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 5, pp. 9326–9340, 2025.
[4] N. Skatchkovsky, O. Simeone, and H. Jang, “Learning to time-decode in spiking neural networks through the information bottleneck,” in Proc. Adv. Neural Inf. Process. Syst., Dec. 2021, pp. 17049–17059.
[5] C. Cao, X. Fu, Y. Zhu, Z. Sun, and Z.-J. Zha, “Event-driven video restoration with spiking-convolutional architecture,” IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 1, pp. 866–880, 2025.
[6] L. Cordone, B. Miramond, and P. Thierion, “Object detection with spiking neural networks on automotive event data,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), 2022, pp. 1–8.
[7] S. Barchid, J. Mennesson, J. Eshraghian, C. Djéraba, and M. Bennamoun, “Spiking neural networks for frame-based and event-based single object localization,” Neurocomputing, vol. 559, Nov. 2023, Art. no. 126805.
[8] A. Safa, F. Corradi, L. Keuninckx, I. Ocket, A. Bourdoux, F. Catthoor, and G. G. Gielen, “Improving the accuracy of spiking neural networks for radar gesture recognition through preprocessing,” IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 6, pp. 2869–2881, 2021.
[9] M. Yao, J. Hu, T. Hu, Y. Xu, Z. Zhou, Y. Tian, B. Xu, and G. Li, “Spike-driven transformer v2: Meta spiking neural network architecture inspiring the design of next-generation neuromorphic chips,” in Proc. Int. Conf. Learn. Represent., May 2024, pp. 1–23.
[10] M. Yao, X. Qiu, T. Hu, J. Hu, Y. Chou, K. Tian, J. Liao, L. Leng, B. Xu, and G. Li, “Scaling spike-driven transformer with efficient spike firing approximation training,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 47, no. 4, pp. 2973–2990, Jan. 2025.
[11] S. Liu, D. Huang, and Y. Wang, “Pay attention to them: Deep reinforcement learning-based cascade object detection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 7, pp. 2544–2556, 2019.
[12] Q. Zhou, H. Shi, W. Xiang, B. Kang, and L. J. Latecki, “Dpnet: Dual-path network for real-time object detection with lightweight attention,” IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 3, pp. 4504–4518, 2024.
[13] S. Kim, S. Park, B. Na, and S. Yoon, “Spiking-yolo: Spiking neural network for energy-efficient object detection,” in Proc. AAAI Conf. Artif. Intell., vol. 34, Apr. 2020, pp. 11270–11277.
[14] Q. Su, Y. Chou, Y. Hu, J. Li, S. Mei, Z. Zhang, and G. Li, “Deep directly-trained spiking neural networks for object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2023, pp. 6555–6565.
[15] L. Yu, H. Chen, Z. Wang, S. Zhan, J. Shao, Q. Liu, and S. Xu, “Spikingvit: A multiscale spiking vision transformer model for event-based object detection,” IEEE Trans. Cogn. Develop. Syst., vol. 17, no. 1, pp. 130–146, Jul. 2024.
[16] Y. Fan, W. Zhang, C. Liu, M. Li, and W. Lu, “Sfod: Spiking fusion object detector,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2024, pp. 17191–17200.
[17] J. Qu, Z. Gao, T. Zhang, Y. Lu, H. Tang, and H. Qiao, “Spiking neural network for ultralow-latency and high-accurate object detection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 3, pp. 4934–4946, Mar. 2024.
[18] Z. Wang, Z. Wang, H. Li, L. Qin, R. Jiang, D. Ma, and H. Tang, “Eas-snn: End-to-end adaptive sampling and representation for event-based detection with recurrent spiking neural networks,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2024, pp. 310–328.
[19] X. Luo, M. Yao, Y. Chou, B. Xu, and G. Li, “Integer-valued training and spike-driven inference spiking neural network for high-performance and energy-efficient object detection,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2024, pp. 253–272.
[20] Z. Tian, C. Shen, H. Chen, and T. He, “Fcos: Fully convolutional one-stage object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 9627–9636.
[21] S. Zhang, C. Chi, Y. Yao, Z. Lei, and S. Z. Li, “Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9759–9768.
[22] Y. Hu, L. Deng, Y. Wu, M. Yao, and G. Li, “Advancing spiking neural networks toward deep residual learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 2, pp. 2353–2367, Feb. 2024.
[23] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Proc. Eur. Conf. Comput. Vis. (ECCV), vol. 8693, 2014, pp. 740–755.
[24] H. Zheng, Y. Wu, L. Deng, Y. Hu, and G. Li, “Going deeper with directly-trained larger spiking neural networks,” in Proc. AAAI Conf. Artif. Intell., vol. 35, Feb. 2021, pp. 11062–11070.
[25] W. Fang, Z. Yu, Y. Chen, T. Huang, T. Masquelier, and Y. Tian, “Deep residual learning in spiking neural networks,” Proc. Adv. Neural Inf. Process. Syst., vol. 34, pp. 21056–21069, Dec. 2021.
[26] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), vol. 1, Dec. 2001, pp. I–I.
[27] J. Yin, T. Chen, G. Pei, H. Liu, Y. Yao, L. Nie, and X. Hua, “Semi-supervised semantic segmentation with multi-constraint consistency learning,” IEEE Trans. Multimedia, vol. 27, pp. 6449–6461, 2025.
[28] J. Yin, Y. Chen, Z. Zheng, J. Zhou, and Y. Gu, “Uncertainty-participation context consistency learning for semi-supervised semantic segmentation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP). IEEE, Apr. 2025, pp. 1–5.
[29] B. Li, C. Liu, M. Shi, X. Chen, X. Ji, and Q. Ye, “Proposal distribution calibration for few-shot object detection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 1, pp. 1911–1918, 2025.
[30] G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics yolov8,” 2023. [Online]. Available: https://github.com/ultralytics/ultralytics
[31] C. Wang, W. He, Y. Nie, J. Guo, C. Liu, Y. Wang, and K. Han, “Gold-yolo: Efficient object detector via gather-and-distribute mechanism,” Proc. Adv. Neural Inf. Process. Syst., vol. 36, pp. 51094–51112, Dec. 2023.
[32] G. Jocher and J. Qiu, “Ultralytics yolo11,” 2024. [Online]. Available: https://github.com/ultralytics/ultralytics
[33] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Aug. 2020, vol. 12346, pp. 213–229.
[34] S. Kim, S. Park, B. Na, J. Kim, and S. Yoon, “Towards fast and accurate object detection in bio-inspired spiking neural networks through bayesian optimization,” IEEE Access, vol. 9, pp. 2633–2643, Nov. 2020.
[35] A. L. Hodgkin and A. F. Huxley, “A quantitative description of membrane current and its application to conduction and excitation in nerve,” J. Physiol., vol. 117, no. 4, p. 500, 1952.
[36] W. Gerstner, W. M. Kistler, R. Naud, and L. Paninski, Neuronal dynamics: From single neurons to networks and models of cognition. Cambridge University Press, 2014.
[37] L. F. Abbott, “Lapicque’s introduction of the integrate-and-fire model neuron (1907),” Brain Res. Bull., vol. 50, no. 5-6, pp. 303–304, 1999.
[38] Y. Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y. Liu, and J. Chen, “Detrs beat yolos on real-time object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2024, pp. 16965–16974.
[39] Y. Chen, X. Yuan, J. Wang, R. Wu, X. Li, Q. Hou, and M.-M. Cheng, “Yolo-ms: Rethinking multi-scale representation learning for real-time object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 47, no. 6, pp. 4240–4252, Feb. 2025.
[40] Y. Kim, H. Park, A. Moitra, A. Bhattacharjee, Y. Venkatesha, and P. Panda, “Rate coding or direct coding: Which one is better for accurate, robust, and energy-efficient spiking neural networks?” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., May 2022, pp. 71–75.
[41] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1026–1034.
[42] Z. Chen, L. Deng, B. Wang, G. Li, and Y. Xie, “A comprehensive and modularized statistical framework for gradient norm equality in deep neural networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 13–31, Jul. 2020.
[43] B. Poole, S. Lahiri, M. Raghu, J. Sohl-Dickstein, and S. Ganguli, “Exponential expressivity in deep neural networks through transient chaos,” Proc. Adv. Neural Inf. Process. Syst., vol. 29, Dec. 2016.
[44] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, “Distance-iou loss: Faster and better learning for bounding box regression,” in Proc. AAAI Conf. Artif. Intell., vol. 34, Apr. 2020, pp. 12993–13000.
[45] X. Li, W. Wang, L. Wu, S. Chen, X. Hu, J. Li, J. Tang, and J. Yang, “Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection,” in Proc. Adv. Neural Inf. Process. Syst., vol. 33, Dec. 2020, pp. 21002–21012.
[46] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2980–2988.
[47] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, “Centernet++ for object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 5, pp. 3509–3521, Dec. 2023.
[48] C. Lyu, W. Zhang, H. Huang, Y. Zhou, Y. Wang, Y. Liu, S. Zhang, and K. Chen, “Rtmdet: An empirical study of designing real-time object detectors,” 2022, arXiv:2212.07784.
[49] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “Mixup: Beyond empirical risk minimization,” 2018, arXiv:1710.09412.
[50] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2117–2125.
[51] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for instance segmentation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 8759–8768.
[52] M. Tan, R. Pang, and Q. V. Le, “Efficientdet: Scalable and efficient object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 10781–10790.
[53] C. Feng, Y. Zhong, Y. Gao, M. R. Scott, and W. Huang, “Tood: Task-aligned one-stage object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 3490–3499.
[54] Y. Quan, D. Zhang, and J. Tang, “Generalized concordant vision transformer with masked image tokens for object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 35, no. 11, pp. 10616–10631, 2025.
[55] Y. Tian, Q. Ye, and D. Doermann, “Yolov12: Attention-centric real-time object detectors,” 2025, arXiv:2502.12524.
[56] L. Fan, H. Shen, X. Lian, Y. Li, M. Yao, G. Li, and D. Hu, “A multisynaptic spiking neuron for simultaneously encoding spatiotemporal dynamics,” Nature Commun., vol. 16, no. 1, 2025, Art. no. 7155.
[57] P. de Tournemire, D. Nitti, E. Perot, D. Migliore, and A. Sironi, “A large scale event-based detection dataset for automotive,” 2020, arXiv:2001.08499.
[58] L. Wang, X. Ye, S. Wang, and P. Li, “Ulo: An underwater light-weight object detector for edge computing,” Machines, vol. 10, no. 8, Jul. 2022, Art. no. 629.
[59] Y. P. Loh and C. S. Chan, “Getting to know low-light images with the exclusively dark dataset,” Comput. Vis. Image Underst., vol. 178, pp. 30–42, 2019.
[60] S. Shao, Z. Zhao, B. Li, T. Xiao, G. Yu, X. Zhang, and J. Sun, “Crowdhuman: A benchmark for detecting human in a crowd,” 2018, arXiv:1805.00123.
[61] E. Perot, P. De Tournemire, D. Nitti, J. Masci, and A. Sironi, “Learning to detect objects with a 1 megapixel event camera,” Proc. Adv. Neural Inf. Process. Syst., vol. 33, pp. 16639–16652, Dec. 2020.
[62] M. Gehrig and D. Scaramuzza, “Recurrent vision transformers for object detection with event cameras,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023, pp. 13884–13893.
[63] N. Zubic, M. Gehrig, and D. Scaramuzza, “State space models for event cameras,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2024, pp. 5819–5828.
[64] W. Pan, J. Chen, B. Lv, and L. Peng, “Optimization and application of improved yolov9s-ui for underwater object detection,” Appl. Sci., vol. 14, no. 16, Aug. 2024, Art. no. 7162.
[65] C. Li, W. Liu, G. Gong, X. Ding, and X. Zhong, “Su-yolo: Spiking neural network for efficient underwater object detection,” Neurocomputing, vol. 644, Sep. 2025, Art. no. 130310.
[66] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. Int. Conf. Artif. Intell. Stat. (AISTATS). JMLR Workshop and Conference Proceedings, 2010, pp. 249–256.
[67] M. Taki, “Deep residual networks and weight initialization,” 2017, arXiv:1709.02956.
APPENDIX A: PROOF OF THE PROPOSITIONS
A. Proof of Proposition 1
Proposition 1. For $d$ stacked LCB layers ($d \geq 1$) with arbitrary kernel sizes, under zero-mean convolution weight and zero bias initialization, the output at the $(n+d)$-th layer, $y_{t,n+d}$, is uncorrelated with the input ...
Datasets Introduction: The GEN1 dataset [57] represents the initial large-scale collection for object detection using event cameras. It comprises car footage spanning over 39 hours, captured by the GEN1 device with a spatial resolution of 304×240. The dataset includes bounding box annotations for vehicles and pedestrians, provided at rates of 1 to 4 Hz. T...
More Implementation Details: The MDSNet structure is detailed in Table X. For SpikeDet-L, we increase the channel dimensions of MDSNet104 by a factor of 1.25 to improve model performance. On the GEN1 dataset, we employ the zoom-in and zoom-out augmentation strategies from [62]. The model is trained for 100 epochs with a batch size of ...
For the URPC 2019, ExDARK, and CrowdHuman datasets, we employ mosaic augmentation. Specifically, following [58], [64], [65], we resize images to 320×320 for URPC 2019 and to 640×640 for the other datasets. All models are trained for 300 epochs with a batch size of 64.
APPENDIX C: MORE VISUALIZATION
In this section, we present the visualization results of o...