pith.

arxiv: 2504.05679 · v2 · submitted 2025-04-08 · 💻 cs.CV

Event-based Civil Infrastructure Visual Defect Detection: ev-CIVIL Dataset and Benchmark

Pith reviewed 2026-05-22 20:48 UTC · model grok-4.3

classification 💻 cs.CV
keywords: event-based vision, civil infrastructure, defect detection, dynamic vision sensor, crack detection, spalling, UAV inspection, dataset

The pith

Event-based cameras support reliable detection of cracks and spalling on civil structures even under rapidly changing light.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates the first dedicated dataset of event streams from dynamic vision sensors for finding cracks and spalling on infrastructure. Data were gathered with a DAVIS346 camera, which records event streams and simultaneous grayscale frames, across 680 recording sequences (318 field, 362 laboratory). Standard real-time object detectors trained on these streams achieve usable performance in conditions where conventional cameras lose detail to motion blur or saturation. A sympathetic reader would care because UAV inspections of bridges and buildings currently lose effectiveness whenever lighting shifts, and event sensors are designed to avoid that loss. If the results hold, maintenance teams could shift to lower-power, more reliable sensors without changing the rest of their detection pipeline.

Core claim

The central claim is that dynamic vision sensors produce event streams sufficient for real-time object detection of civil defects; the ev-CIVIL dataset supplies 680 recording sequences containing 678 cracks and 429 spalling instances, each captured simultaneously as events and APS frames, and four detection models trained on the event data demonstrate applicability under the same lighting conditions that degrade frame-based methods.
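
The combined totals above follow directly from the per-environment counts in the abstract (318 + 362 sequences, 458 + 220 cracks, 121 + 308 spalling instances). A trivial check of that arithmetic; the dictionary layout is an illustrative choice, not part of the dataset:

```python
# Per-environment counts as reported in the ev-CIVIL abstract.
FIELD = {"sequences": 318, "cracks": 458, "spalling": 121}
LAB = {"sequences": 362, "cracks": 220, "spalling": 308}


def combined(*splits):
    """Sum each count across dataset splits (keys taken from the first split)."""
    keys = splits[0].keys()
    return {k: sum(s[k] for s in splits) for k in keys}


totals = combined(FIELD, LAB)
print(totals)  # {'sequences': 680, 'cracks': 678, 'spalling': 429}
```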

What carries the argument

The ev-CIVIL dataset of paired event streams and intensity frames recorded with the DAVIS346 camera, focused on cracks and spalling in field and laboratory settings.

If this is right

  • DVS data, once accumulated into 2D event histograms, can be fed to existing real-time detectors without new hardware beyond the sensor itself.
  • Inspections remain possible during dawn, dusk, or under moving shadows where frame cameras lose contrast.
  • Power consumption per inspection can decrease because event cameras transmit data only on change.
  • Separate training on event streams and on APS frames allows direct comparison of the two modalities on identical scenes.
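
The power and bandwidth point rests on how a DVS pixel works: it reports brightness changes rather than absolute brightness. The sketch below uses the idealized contrast-threshold model common in the event-camera literature (see the survey by Gallego et al., reference [5]). It is a simplification, not the DAVIS346 circuit or the authors' code, and the threshold of 0.2 log units is a hypothetical value. Polarity follows the ev-CIVIL convention of 1 for an intensity increase and 0 for a decrease.

```python
def dvs_events(log_intensity, timestamps_us, threshold=0.2):
    """Idealized single-pixel DVS model (simplified; threshold is hypothetical).

    An event fires each time the log intensity moves a full `threshold`
    away from the level memorized at the previous event.
    Returns a list of (timestamp_us, polarity) with 1 = brighter, 0 = darker.
    """
    events = []
    ref = log_intensity[0]  # memorized log level after the last reset
    for t, value in zip(timestamps_us[1:], log_intensity[1:]):
        delta = value - ref
        # A large change can cross the threshold several times in one sample.
        while abs(delta) >= threshold:
            polarity = 1 if delta > 0 else 0
            events.append((int(t), polarity))
            ref += threshold if delta > 0 else -threshold
            delta = value - ref
    return events
```

A constant signal yields an empty list: a static scene produces no events, which is the sense in which event cameras transmit data only on change.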

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same event streams could support tracking of defect growth over repeated flights without storing full video.
  • Combining event and frame data in one model might improve robustness beyond either modality alone.
  • Extension to additional defect types such as corrosion or joint separation would require only new labels on similar recordings.

Load-bearing premise

The specific sequences collected with one camera model and the defects labeled in them represent the range of real-world civil infrastructure surfaces and lighting variations.

What would settle it

A new set of recordings, on previously unseen structures and under lighting conditions outside the collected range, in which event-based detectors drop below usable precision while frame-based detectors remain usable.

Figures

Figures reproduced from arXiv: 2504.05679 by Cesar Cadena, Luca Zanatta, Matteo Fumagalli, Silvia Tolu, T Delbruck, Udayanga G.W.K.N. Gamage, Xuanni Huo.

Figure 1: DAVIS346 event volume formation.
Figure 2: Field and laboratory data examples.
…structures, including roads, pavements, tunnels, buildings, and walls, containing 458 unique crack instances and 121 spalling instances. Examples of these data samples are visualized in Fig. 2a, which displays grayscale image frames captured by the APS sensor, denoted ‘fr’ in the corresponding columns. Data captured by the DVS sensor are represented as 2D event his…
Figure 3: Number of sequences collected under different lighting conditions.
Figure 4(a): Template outlining the composition of files: events.h5 stores (timestamp, x, y, p) per event; frames.h5 stores (timestamp, image_array) per frame; label.npy stores (timestamp, <class_id>, bbox_xlow, bbox_ylow, bbox_width, bbox_height).
Figure 4: Structure of a recording sequence in the ev-CIVIL Dataset: each event is characterized by a timestamp (in µs), pixel coordinates x and y within the 346×260 DAVIS346 spatial resolution, and a polarity value p. The polarity indicates the type of event: 1 for an increase in pixel intensity and 0 for a decrease.
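
Figure 4 lists the fields stored per event and per label. A minimal in-memory representation with NumPy structured arrays might look as follows. The dtypes, the helper names, and the reading of x < 346 and y < 260 as column and row bounds are illustrative assumptions, not the dataset's documented schema; the HDF5 dataset keys inside events.h5 are not given here, so no file loader is shown.

```python
import numpy as np

WIDTH, HEIGHT = 346, 260  # DAVIS346 pixel array

# Hypothetical layouts mirroring the fields listed for events.h5 and label.npy.
EVENT_DTYPE = np.dtype([("timestamp", np.int64), ("x", np.int16),
                        ("y", np.int16), ("p", np.int8)])
LABEL_DTYPE = np.dtype([("timestamp", np.int64), ("class_id", np.int32),
                        ("bbox_xlow", np.float32), ("bbox_ylow", np.float32),
                        ("bbox_width", np.float32), ("bbox_height", np.float32)])


def validate_events(events):
    """True if coordinates lie on the sensor, polarity is 0/1, and time is sorted."""
    in_x = (events["x"] >= 0) & (events["x"] < WIDTH)
    in_y = (events["y"] >= 0) & (events["y"] < HEIGHT)
    binary_p = np.isin(events["p"], (0, 1))
    sorted_t = bool(np.all(np.diff(events["timestamp"]) >= 0))
    return bool(np.all(in_x & in_y & binary_p)) and sorted_t


def polarity_counts(events):
    """Count ON (p = 1, brighter) and OFF (p = 0, darker) events."""
    on = int(np.count_nonzero(events["p"] == 1))
    return on, len(events) - on


def label_corners(label):
    """Convert one label record from (x_low, y_low, width, height) to corners."""
    x1, y1 = float(label["bbox_xlow"]), float(label["bbox_ylow"])
    return x1, y1, x1 + float(label["bbox_width"]), y1 + float(label["bbox_height"])
```

The records in the usage below would be illustrative values, not data from the dataset.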
Figure 5: Data collection setup.
Figure 6 plot panels: (a) Z-shaped camera trajectory with smooth, flowing bends (3D trajectory; X, Y, Z in m); (b) instant velocity magnitude throughout the trajectory (velocity magnitude in m/s vs. time in s); …
Figure 6: An illustration of the DAVIS346 camera’s ‘Z’-shaped trajectory with smooth, flowing bends is shown in (a), depicting the trajectory’s two horizontal segments at different distances from the object, connected by a diagonal segment. The trajectory spans a range of 0.2 to 0.7 m. Additionally, the corresponding instantaneous velocity magnitude profile (b), the variation in c…
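
Figure 6(b) plots the instantaneous velocity magnitude along the Z-shaped trajectory. How the authors computed it is not stated here; one standard way to derive such a profile from timestamped 3D positions is finite differencing, sketched below with `np.gradient` (the function name is hypothetical).

```python
import numpy as np


def speed_profile(positions_m, times_s):
    """Instantaneous speed |v(t)| in m/s from sampled positions (N x 3, metres).

    np.gradient uses central differences in the interior and one-sided
    differences at the ends; passing `times_s` handles uneven sampling.
    """
    positions = np.asarray(positions_m, dtype=float)
    t = np.asarray(times_s, dtype=float)
    velocity = np.gradient(positions, t, axis=0)  # (N, 3) velocity estimates
    return np.linalg.norm(velocity, axis=1)
```

Noisy tracking data would normally be smoothed before differencing, since differentiation amplifies noise.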
Figure 7: Overview of the data collection process, consisting of preparing the DAVIS346 camera, capturing grayscale images and DVS events simultaneously, and transferring the data to a PC after each recording sequence.
…to capture any meaningful information due to insufficient ambient lighting. To configure the DAVIS346 for data collection, we adjusted the bias values, optimized focus settings, and performed calibrati…
Figure 8: The process of preparing and integrating an IR laser as an external illuminator with the DAVIS346 camera for scenarios requiring external illumination. (a) The IR laser projector of the Intel RealSense D435 is covered with a thin plastic strip containing tiny holes, where the hole diameter is smaller than that of the structured dot patterns. The covered laser projector is then attached to the handheld moun…
Figure 9: Comparison of fixed-time-length-based 2D event histogram formation with our 2D event histogram formation method illustrated in Algorithm 1. Defect areas (first row: “crack” defect; second row: “spalling” defect) are localized by drawing bounding boxes.
…extracted feature maps to detect objects efficiently in real-time applications. YOLOv6 Architecture Variants: In our evaluations, we used the YOLOv6m¹⁶ and…
Figure 10: SSD300 architecture.
…various object shapes and sizes, particularly in real-time applications where computational efficiency is crucial. SSD300 Architecture and Model Variants: In SSD300-ResNet50, as shown in Fig. 10, the backbone uses ResNet50¹⁹, a deep residual network known for its ability to capture high-level semantic features. ResNet50 generates feature maps at different layers, which are then utiliz…
Figure 11: YOLOv6 architecture⁷¹.
…$AP(c) = \frac{1}{n}\sum_{i=1}^{n} P(\mathrm{IoU}_i)$ (3), where $n$ is the number of IoU thresholds (for COCO AP@0.5:0.95, the IoU thresholds range from 0.5 to 0.95 with a step size of 0.05; for COCO AP@0.5, $n = 1$, since it is calculated at the single IoU threshold 0.5) and $P(\mathrm{IoU}_i)$ is the precision calculated at IoU threshold $\mathrm{IoU}_i$ as in eq. (5). $F1_{\mathrm{IoU}0.5}$ score: the $F1_{\mathrm{IoU}0.5}$ metric, which combines precision and …
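
Equation (3) averages per-threshold precision over $n$ IoU thresholds. The sketch below follows eq. (3) literally for boxes in the label.npy layout (x_low, y_low, width, height). Eq. (5) and the paper's matching rule are not visible in this excerpt, so greedy one-to-one matching in descending score order is an assumption, and F1 is taken as the usual harmonic mean of precision and recall at IoU 0.5. The official COCO toolkit instead averages interpolated precision over recall levels at each threshold, so its numbers will generally differ from this literal reading.

```python
import numpy as np


def iou_xywh(a, b):
    """IoU of two boxes given as (x_low, y_low, width, height)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def match_counts(preds, gts, thr):
    """Greedy one-to-one matching (assumed rule); returns (TP, FP, FN) at IoU >= thr.

    `preds` is a list of (box, score); `gts` is a list of boxes.
    """
    unmatched = list(range(len(gts)))
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best, best_iou = None, thr
        for j in unmatched:
            iou = iou_xywh(box, gts[j])
            if iou >= best_iou:
                best, best_iou = j, iou
        if best is not None:
            unmatched.remove(best)
            tp += 1
    return tp, len(preds) - tp, len(unmatched)


def precision_at(preds, gts, thr):
    """P(IoU_i): fraction of predictions matched at threshold thr."""
    tp, fp, _ = match_counts(preds, gts, thr)
    return tp / (tp + fp) if tp + fp else 0.0


def ap_eq3(preds, gts, thresholds=np.linspace(0.5, 0.95, 10)):
    """AP(c) = (1/n) * sum_i P(IoU_i), read literally from eq. (3)."""
    return float(np.mean([precision_at(preds, gts, t) for t in thresholds]))


def f1_at_05(preds, gts):
    """Harmonic mean of precision and recall at IoU threshold 0.5."""
    tp, fp, fn = match_counts(preds, gts, 0.5)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```

For one 10-pixel ground-truth box and a prediction shifted 2 pixels sideways (IoU = 80/120 = 2/3), precision is 1 at the thresholds 0.5 to 0.65 and 0 above, so this sketch returns AP = 4/10 = 0.4.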
Figure 12: Extraction of grayscale images and event-based data from the ev-CIVIL dataset for benchmarking crack and spalling detection. First, 10-15 samples are obtained from each recording sequence. For each sample, 2D event histograms are generated from the corresponding events. The extracted grayscale image frames are then preprocessed. These preprocessed grayscale images, 2D event histograms, and extracted annota…
Figure 13: Qualitative visualization of event-based and frame-based crack and spalling detection results of four detection models on the Adequately-Illuminated Test Set.
Figure 14: Qualitative visualization of the event-based and image-based crack and spalling detection results for the YOLOv6m and YOLOv6lite-s models on the Low-Light Test Set. “Other scenarios” refers to conditions involving saturated and dynamic lighting, in addition to dimly lit environments.
…formation method outperformed the fixed-temporal-length-based formulations when fixed temporal lengths were set to 25 ms, 3…
Figure 15: Percentage reduction of the mAP@0.5 and F1iou0.5 metrics of defect detections with (a) YOLOv6m, (b) YOLOv6lite-s, (c) SSD300 with ResNet backbone, and (d) SSD300 with MobileNetV2 backbone, when those models were trained without Laboratory data. Results are shown for the two test sets: the adequate-lighting test set and the low/dynamic-lighting test set.
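
Figure 15 reports percentage reductions when Laboratory data are removed from training, but the caption does not spell out the formula. Assuming the usual relative-change definition, with the score obtained when training with Laboratory data as the baseline:

```python
def percent_reduction(with_lab, without_lab):
    """Relative drop (%) of a metric such as mAP@0.5 or F1 at IoU 0.5.

    Assumed definition: 100 * (baseline - ablated) / baseline, where the
    baseline is the score of the model trained with Laboratory data.
    """
    if with_lab == 0:
        raise ValueError("baseline metric must be non-zero")
    return 100.0 * (with_lab - without_lab) / with_lab
```

For example, a hypothetical mAP@0.5 falling from 0.80 to 0.60 would be a 25% reduction; a negative result would mean the model improved without Laboratory data.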
Figure 16: Comparison of our 2-channel histogram method explained in Algorithm 1 (Ours) with the fixed-temporal-length-based histogram formation method. For adequately illuminated data, the fixed temporal lengths are 10 ms (T_10), 15 ms (T_15), 20 ms (T_20), and 25 ms (T_25). For low-illuminated data, the fixed temporal lengths are 25 ms (T_25), 30 ms (T_30), 35 ms (T_35), and 40 ms (T_40). Performance is evaluated in t…
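
Figure 16 compares the authors' 2-channel histogram method (Algorithm 1, not reproduced in this excerpt) against fixed-temporal-length histograms. The fixed-window baseline can be sketched as follows, assuming one channel per polarity and events held as an (N, 4) integer array of (timestamp in µs, x, y, p); the function name and array layout are hypothetical.

```python
import numpy as np

WIDTH, HEIGHT = 346, 260  # DAVIS346 resolution


def fixed_window_histogram(events, t_start_us, window_ms, width=WIDTH, height=HEIGHT):
    """2-channel event histogram over [t_start, t_start + window).

    Channel 0 counts OFF events (p == 0), channel 1 counts ON events (p == 1);
    this channel assignment is an assumption for illustration.
    """
    t_end = t_start_us + int(window_ms * 1000)
    sel = (events[:, 0] >= t_start_us) & (events[:, 0] < t_end)
    _t, x, y, p = events[sel].T
    hist = np.zeros((2, height, width), dtype=np.int32)
    # np.add.at accumulates repeated (p, y, x) indices instead of overwriting.
    np.add.at(hist, (p, y, x), 1)
    return hist
```

Sweeping `window_ms` over 10, 15, 20, 25 (adequate lighting) or 25, 30, 35, 40 (low light) gives the T_10 to T_40 settings named in the caption. Longer windows collect more events per histogram, which helps when fewer events fire in dim scenes, but they also smear moving edges.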
Figure 17: Variation in classification accuracy among ResNet34, VGG16, and MobileNetV2 models for frame-based and event-based classification tasks across different input spatial resolutions (32×32, 64×64, 128×128, 224×224) in both the Adequately-Illuminated and Low-Light test datasets.
…128×128. For the same Low-Light Test Set, the highest event-based classification accuracy of 93% was also achieved with the EfficientNet-…
Figure 18: Image-based and event-based detection errors.
Figure 19: Issues involved with event-based detections: (a) partial detections due to the directionality of the DVS, (b) blur in nighttime event-based 2D histograms as camera movement increases.
Original abstract

Small unmanned aerial vehicle (UAV)-based visual inspections are a more efficient alternative to manual methods for examining civil structural defects, offering safe access to hazardous areas and significant cost savings by reducing labor requirements. However, traditional frame-based cameras, widely used in UAV-based inspections, often struggle to capture defects under low or dynamic lighting conditions. In contrast, dynamic vision sensors (DVS), or event-based cameras, excel in such scenarios by minimizing motion blur, enhancing power efficiency, and maintaining high-quality imaging across diverse lighting conditions without saturation or information loss. Despite these advantages, existing research lacks studies exploring the feasibility of using DVS for detecting civil structural defects. Moreover, there is no dedicated event-based dataset tailored for this purpose. Addressing this gap, this study introduces the first event-based civil infrastructure defect detection dataset, capturing defective surfaces as a spatio-temporal event stream using DVS. In addition to event-based data, the dataset includes grayscale intensity image frames captured simultaneously using an active pixel sensor (APS). Both data types were collected using the DAVIS346 camera, which integrates DVS and APS sensors. The dataset focuses on two types of defects: cracks and spalling, and includes data from both field and laboratory environments. The field dataset comprises 318 recording sequences, documenting 458 distinct cracks and 121 distinct spalling instances. The laboratory dataset includes 362 recording sequences, covering 220 distinct cracks and 308 spalling instances. We evaluated the dataset using four real-time object detection models. The results demonstrate the applicability of DVS cameras for robust detection of civil infrastructure defects under challenging lighting conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces the ev-CIVIL dataset, the first event-based dataset for civil infrastructure visual defect detection. It collects spatio-temporal event streams and simultaneous APS grayscale frames using a DAVIS346 camera across 318 field sequences (458 cracks, 121 spalling) and 362 laboratory sequences (220 cracks, 308 spalling). Four real-time object detection models are evaluated on the data, with the central claim that the results demonstrate the applicability of DVS cameras for robust detection of cracks and spalling under challenging lighting conditions.

Significance. If the dataset collection protocols and model evaluations establish a clear performance advantage for event data over frame-based imaging specifically in low or dynamic lighting, the work would be significant as the first dedicated benchmark in this application domain. The dual field/lab collection and inclusion of both DVS and APS modalities provide a useful resource for future UAV-based inspection research.

major comments (3)
  1. [Abstract and Dataset section] The claim that DVS enables 'robust detection ... under challenging lighting conditions' is not supported by any reported lux ranges, dynamic lighting protocols, or quantitative APS-vs-DVS performance differentials. Without these, the central robustness claim cannot be evaluated.
  2. [Experiments section] No quantitative performance numbers (mAP, precision-recall, or error analysis) are supplied for the four object detection models, nor any baseline comparison against frame-based methods on the same sequences. This leaves the 'results demonstrate' statement without empirical grounding.
  3. [Dataset section] The representativeness of the 680 total sequences for real-world civil infrastructure under conditions where frame-based cameras fail is asserted but not demonstrated; no details on lighting variability, motion speeds, or failure cases of APS are provided to substantiate the weakest assumption.
minor comments (2)
  1. [Dataset section] Clarify the exact train/validation/test splits and labeling protocol (e.g., how event streams were annotated) to improve reproducibility.
  2. [Experiments section] Add a table summarizing the four models, their input modalities (events vs. APS), and key hyperparameters.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript introducing the ev-CIVIL dataset. We have reviewed each major comment and provide our responses below, along with plans for revision.

Point-by-point responses
  1. Referee: [Abstract and Dataset section] The claim that DVS enables 'robust detection ... under challenging lighting conditions' is not supported by any reported lux ranges, dynamic lighting protocols, or quantitative APS-vs-DVS performance differentials. Without these, the central robustness claim cannot be evaluated.

    Authors: We acknowledge this point and agree that additional details are needed to support the claim. In the revised manuscript, we will include measured lux ranges for the field and laboratory sequences, describe the dynamic lighting protocols used during collection, and provide quantitative performance comparisons between the DVS event data and the simultaneous APS frames for the detection models. This will be incorporated into both the Dataset and Experiments sections. revision: yes

  2. Referee: [Experiments section] No quantitative performance numbers (mAP, precision-recall, or error analysis) are supplied for the four object detection models, nor any baseline comparison against frame-based methods on the same sequences. This leaves the 'results demonstrate' statement without empirical grounding.

    Authors: We agree that the manuscript lacks sufficient quantitative details. In the revision, we will supply the mAP, precision-recall, and error analysis numbers for the four models, as well as include baseline comparisons against frame-based methods using the APS data on the same sequences to provide empirical grounding for the results. revision: yes

  3. Referee: [Dataset section] The representativeness of the 680 total sequences for real-world civil infrastructure under conditions where frame-based cameras fail is asserted but not demonstrated; no details on lighting variability, motion speeds, or failure cases of APS are provided to substantiate the weakest assumption.

    Authors: To address this, we will expand the Dataset section with specific details on lighting variability across the sequences, typical UAV motion speeds during recording, and documented cases where APS frames exhibited failures such as motion blur or saturation, while the corresponding event streams enabled successful defect capture. This will better demonstrate the real-world applicability. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical dataset collection and standard model benchmarking

Full rationale

The paper introduces the ev-CIVIL dataset collected with a DAVIS346 camera and evaluates four off-the-shelf real-time object detection models on event streams and APS frames for crack and spalling detection. No derivations, equations, or parameter-fitting steps appear in the provided text. The central claim rests on new field and laboratory recordings plus standard benchmark results rather than any self-referential reduction of outputs to inputs defined by the authors. Self-citations, if present, are not load-bearing for the empirical demonstration.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies primarily on standard practices in event camera data collection and off-the-shelf object detection models rather than introducing new free parameters, axioms, or invented entities.

axioms (1)
  • standard math: Standard assumptions about event generation and calibration in DAVIS346 DVS/APS sensors hold for the collected sequences.
    Implicit in the use of the integrated camera for simultaneous event and intensity data capture.

pith-pipeline@v0.9.0 · 5847 in / 1304 out tokens · 52982 ms · 2026-05-22T20:48:51.018306+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Real-Time Frame- and Event-based Object Detection with Spiking Neural Networks on Edge Neuromorphic Hardware: Design, Deployment and Benchmark

    cs.CV 2026-04 unverdicted novelty 4.0

    SNNs deployed on Loihi 2 achieve real-time object detection with the lowest dynamic energy per inference and recover 87-100% of ANN accuracy via distillation-aware training.

Reference graph

Works this paper leans on

80 extracted references · 80 canonical work pages · cited by 1 Pith paper · 4 internal anchors

  1. [1]

    Drone-based non-destructive inspection of industrial sites: A review and case studies

    Nooralishahi P, Ibarra-Castanedo C, Deane S et al. Drone-based non-destructive inspection of industrial sites: A review and case studies. Drones 2021; 5(4). DOI:10.3390/drones5040106. URL https://www.mdpi.com/2504-446X/5/4/106

  2. [2]

    Pilot visual detection of small unmanned aircraft systems (suas) equipped with strobe lighting

    Wallace R, Loffi J, Vance S et al. Pilot visual detection of small unmanned aircraft systems (suas) equipped with strobe lighting. Journal of Aviation Technology and Engineering 2018; 7. DOI:10.7771/2159-6670.1177

  3. [3]

    Drone spotlights, 2024

    Unmanned Systems Technology. Drone spotlights, 2024. URL https://www.unmannedsystemstechnology.com/expo/drone-spotlights/

  4. [4]

    Event-based human intrusion detection in uas using deep learning

    Pérez-Cutiño M, Eguíluz AG, Dios JMd et al. Event-based human intrusion detection in uas using deep learning. In 2021 International Conference on Unmanned Aircraft Systems (ICUAS). pp. 91–100. DOI:10.1109/ICUAS51884.2021.9476677

  5. [5]

    Event-based vision: A survey

    Gallego G, Delbruck T, Orchard G et al. Event-based vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44(1): 154–180. DOI:10.1109/TPAMI.2020.3008413

  6. [6]

    Design and experimental evaluation of an aerial solution for visual inspection of tunnel-like infrastructures

    Bendris B and Cayero Becerra J. Design and experimental evaluation of an aerial solution for visual inspection of tunnel-like infrastructures. Remote Sensing 2022; 14(1). DOI:10.3390/rs14010195. URL https://www.mdpi.com/2072-4292/14/1/195

  7. [7]

    Basler ace, Accessed 2024

    Basler. Basler ace, Accessed 2024. URL https://www.baslerweb.com/en/shop/aca4112-20um/

  8. [8]

    Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types

    Cha YJ, Choi W, Suh G et al. Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Computer-Aided Civil and Infrastructure Engineering 2018; 33(9): 731–747

  9. [9]

    A novel hybrid approach for crack detection

    Fang F, Li L, Gu Y et al. A novel hybrid approach for crack detection. Pattern Recognition 2020; 107: 107474

  10. [10]

    A survey and evaluation of promising approaches for automatic image-based defect detection of bridge structures

    Jahanshahi MR, Kelly JS, Masri SF et al. A survey and evaluation of promising approaches for automatic image-based defect detection of bridge structures. Structure and Infrastructure Engineering 2009; 5(6): 455–486

  11. [11]

    A review of computer vision-based structural health monitoring at local and global levels

    Dong CZ and Catbas N. A review of computer vision-based structural health monitoring at local and global levels. Structural Health Monitoring 2020; 20: 692–

  12. [12]

    URL https://api.semanticscholar.org/CorpusID:225627479

  13. [13]

    What is an Edge? | Edge Detection

    First Principles of Computer Vision, What is an Edge? | Edge Detection, YouTube, 2024, https://www.youtube.com/watch?v=G8yp6f9V_6c. Accessed: Dec. 2, 2024

  14. [14]

    Support Vector Machine (SVM) in 7 minutes

    Augmented AI, Support Vector Machine (SVM) in 7 minutes, YouTube, 2024, https://www.youtube.com/watch?v=Y6RRHw9uN9o. Accessed: Dec. 2, 2024

  15. [15]

    MIT 6.S191: Convolutional Neural Networks

    Alexander Amini, MIT 6.S191: Convolutional Neural Networks, YouTube, May 2024, https://www.youtube.com/watch?v=2xqkSUhmmXU. Accessed: Dec. 2, 2024

  16. [16]

    MIT Introduction to Deep Learning | 6.S191

    Alexander Amini, MIT Introduction to Deep Learning | 6.S191, YouTube, May 2024, https://www.youtube.com/watch?v=ErnWZxJovaM&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI. Accessed: Dec. 2, 2024

  17. [17]

    Meituan. YOLOv6. https://github.com/meituan/YOLOv6. Accessed: March 7, 2024

  18. [18]

    Ssd: Single shot multibox detector

    Liu W, Anguelov D, Erhan D et al. Ssd: Single shot multibox detector. In Leibe B, Matas J, Sebe N et al. (eds.) Computer Vision – ECCV 2016. Cham: Springer International Publishing, pp. 21–37

  19. [19]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Simonyan K and Zisserman A. Very deep convolutional networks for large-scale image recognition. CoRR 2014; abs/1409.1556. URL https://api.semanticscholar.org/CorpusID:14124313

  20. [20]

    Deep residual learning for image recognition

    He K, Zhang X, Ren S et al. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015; 770–778. URL https://api.semanticscholar.org/CorpusID:206594692

  21. [21]

    A comprehensive review of deep learning-based crack detection approaches

    Hamishebahar Y, Guan H, So S et al. A comprehensive review of deep learning-based crack detection approaches. Applied Sciences 2022; 12(3): 1374

  22. [22]

    Artificial intelligence assisted infrastructure assessment using mixed reality systems

    Karaaslan E, Bagci U and Catbas FN. Artificial intelligence assisted infrastructure assessment using mixed reality systems. Transportation Research Record 2018; 2673: 413–424. URL https://api.semanticscholar.org/CorpusID:55702295

  23. [23]

    Event-based classification of defects in civil infrastructures with artificial and spiking neural networks

    Gamage UKNGW, Zanatta L, Fumagalli M et al. Event-based classification of defects in civil infrastructures with artificial and spiking neural networks. In Rojas I, Joya G and Catala A (eds.) Advances in Computational Intelligence. Cham: Springer Nature Switzerland. ISBN 978-3-031-43078-7, pp. 629–640

  24. [24]

    A 128×128 120 dB 15 µs latency asynchronous temporal contrast vision sensor

    Lichtsteiner P, Posch C and Delbruck T. A 128×128 120 dB 15 µs latency asynchronous temporal contrast vision sensor. IEEE Journal of Solid-State Circuits 2008; 43(2): 566–576. DOI:10.1109/JSSC.2007.914337

  25. [25]

    Measuring diameters and velocities of artificial raindrops with a neuromorphic dynamic vision sensor disdrometer

    Steiner JG, Micev K, Aydin A et al. Measuring diameters and velocities of artificial raindrops with a neuromorphic dynamic vision sensor disdrometer. URL https://api.semanticscholar.org/CorpusID:253707979

  26. [26]

    URL https://inivation.com/

    Inivation. URL https://inivation.com/

  27. [27]

    A large scale event-based detection dataset for automotive

    de Tournemire P, Nitti DO, Perot E et al. A large scale event-based detection dataset for automotive. ArXiv 2020; abs/2001.08499. URL https://api.semanticscholar.org/CorpusID:210860813

  28. [28]

    Learning to detect objects with a 1 megapixel event camera

    Perot E, de Tournemire P, Nitti D et al. Learning to detect objects with a 1 megapixel event camera. In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20, Red Hook, NY, USA: Curran Associates Inc. ISBN 9781713829546

  29. [29]

    Pushing the limits of asynchronous graph-based object detection with event cameras

    Gehrig D and Scaramuzza D. Pushing the limits of asynchronous graph-based object detection with event cameras. arXiv 2022

  30. [30]

    High-temporal-resolution object detection and tracking using images and events

    El Shair Z and Rawashdeh SA. High-temporal-resolution object detection and tracking using images and events. Journal of Imaging 2022; 8(8): 210. DOI:10.3390/jimaging8080210. URL https://www.mdpi.com/2313-433X/8/8/210

  31. [31]

    Pedro: an event-based dataset for person detection in robotics

    Boretti C, Bich P, Pareschi F et al. Pedro: an event-based dataset for person detection in robotics. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

  32. [32]

    HATS: Histograms of averaged time surfaces for robust event-based object classification

    Sironi A, Brambilla M, Bourdis N et al. HATS: Histograms of averaged time surfaces for robust event-based object classification. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, pp. 1731–1740. DOI:10.1109/CVPR.2018.00186. URL https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00186

  33. [33]

    Dsec: A stereo event camera dataset for driving scenarios

    Gehrig M, Aarents W, Gehrig D et al. Dsec: A stereo event camera dataset for driving scenarios. IEEE Robotics and Automation Letters 2021; DOI:10.1109/LRA.2021.3068942

  34. [34]

    v2e: From video frames to realistic DVS events

    Hu Y, Liu S and Delbruck T. v2e: From video frames to realistic DVS events. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). pp. 1312–1321. DOI:10.1109/CVPRW53098.2021.00144. URL https://doi.ieeecomputersociety.org/10.1109/CVPRW53098.2021.00144

  35. [35]

    M3ed: Multi-robot, multi-sensor, multi-environment event dataset

    Chaney K, Cladera F, Wang Z et al. M3ed: Multi-robot, multi-sensor, multi-environment event dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. pp. 4015–4022

  36. [36]

    Converting static image datasets to spiking neuromorphic datasets using saccades

    Orchard G, Jayawant A, Cohen G et al. Converting static image datasets to spiking neuromorphic datasets using saccades. Frontiers in Neuroscience 2015; 9. URL https://api.semanticscholar.org/CorpusID:940928

  37. [37]

    Cifar10-dvs: An event-stream dataset for object classification

    Li H, Liu H, Ji X et al. Cifar10-dvs: An event-stream dataset for object classification. Frontiers in Neuroscience 2017; 11. URL https://api.semanticscholar.org/CorpusID:2406565

  38. [38]

    Intel RealSense D400 Series Product Family Datasheet

    Intel, “Intel RealSense D400 Series Product Family Datasheet,” Available: https://www.intel.com/content/www/us/en/content-details/841984/intel-realsense-d400-series-product-family-datasheet.html. [Accessed: 31-Jan-2025]

  39. [39]

    The application of IR laser illuminator in the drones

    The application of ir laser illuminator in the drones. URL https://www.sz3km.cn/index.php/content/71

  40. [40]

    Introduction to the Physics and Techniques of Remote Sensing

    Elachi C and van Zyl J. Introduction to the Physics and Techniques of Remote Sensing . John Wiley & Sons,

  41. [41]

    URL https://onlinelibrary.wiley.com/doi/book/10.1002/0471783390

  42. [42]

    Intel realsense sdk api how-to: Controlling the laser, Year

    Intel Corporation. Intel realsense sdk api how-to: Controlling the laser, Year. URL https://github.com/IntelRealSense/librealsense/wiki/API-How-To#controlling-the-laser

  43. [43]

    Meta-learning convolutional neural architectures for multi-target concrete defect classification with the concrete defect bridge image dataset

    Mundt M, Majumder S, Murali S et al. Meta-learning convolutional neural architectures for multi-target concrete defect classification with the concrete defect bridge image dataset. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 11188–11197. DOI: 10.1109/CVPR.2019.01145

  44. [44]

    Deep concrete inspection using unmanned aerial vehicle towards CSSC database

    Yang L, Li B, Li W et al. Deep concrete inspection using unmanned aerial vehicle towards CSSC database

  45. [45]

    Detecting cracks and spalling automatically in extreme events by end-to-end deep learning frameworks

    Bai Y, Sezen H and Yilmaz A. Detecting cracks and spalling automatically in extreme events by end-to-end deep learning frameworks. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2021; V-2-2021: 161–168. DOI:10.5194/isprs-annals-V-2-2021-161-2021

  46. [46]

    jAER

    SensorsINI. jAER. https://github.com/SensorsINI/jaer. Accessed: Dec 7, 2024

  47. [47]

    Labelme: Image annotation and labeling tool

    LabelMe Development Team. Labelme: Image annotation and labeling tool. https://github.com/labelmeai/labelme, ongoing. Accessed: March 7, 2024

  48. [48]

    Microsoft COCO: Common Objects in Context

    Lin T, Maire M, Belongie SJ et al. Microsoft COCO: common objects in context. CoRR 2014; abs/1405.0312. URL http://arxiv.org/abs/1405.0312

  49. [49]

    Event-based vision meets deep learning on steering prediction for self-driving cars

    Maqueda AMI, Loquercio A, Gallego G et al. Event-based vision meets deep learning on steering prediction for self-driving cars. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018; 5419–5427. URL https://api.semanticscholar.org/CorpusID:4610262

  50. [50]

    cv::CLAHE Class Reference

    OpenCV. cv::CLAHE Class Reference, n.d. URL https://docs.opencv.org/4.x/d6/db6/classcv_1_1CLAHE.html

  51. [51]

    MobileNetV2: Inverted residuals and linear bottlenecks

    Sandler M, Howard AG, Zhu M et al. MobileNetV2: Inverted residuals and linear bottlenecks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018; 4510–4520. URL https://api.semanticscholar.org/CorpusID:4555207

  52. [52]

    PyTorch: an imperative style, high-performance deep learning library

    Paszke A, Gross S, Massa F et al. PyTorch: an imperative style, high-performance deep learning library. Red Hook, NY, USA: Curran Associates Inc., 2019

  53. [53]

    SGDR: Stochastic Gradient Descent with Warm Restarts

    Loshchilov I and Hutter F. SGDR: Stochastic gradient descent with warm restarts. ArXiv 2016; abs/1608.03983. URL https://api.semanticscholar.org/CorpusID:15884797

  54. [54]

    EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

    Tan M and Le QV. EfficientNet: Rethinking model scaling for convolutional neural networks. ArXiv 2019; abs/1905.11946. URL https://api.semanticscholar.org/CorpusID:167217261

  55. [55]

    Microsoft COCO: Common objects in context

    Lin TY, Maire M, Belongie S et al. Microsoft COCO: Common objects in context. In Fleet D, Pajdla T, Schiele B et al. (eds.) Computer Vision – ECCV 2014. Cham: Springer International Publishing, pp. 740–755

  56. [56]

    pycocotools: Python API for Microsoft COCO

    Wei W, Mercurial T and Consortium C. pycocotools: Python API for Microsoft COCO. https://github.com/cocodataset/cocoapi, 2017

  57. [57]

    Prophesee EVK4 event camera

    Prophesee. Prophesee EVK4 event camera, 2024. URL https://www.prophesee.ai/event-camera-evk4/

  58. [58]

    ROS package for DVS data processing and applications

    University of Zurich, Robotics and Perception Group. ROS package for DVS data processing and applications. Available at https://github.com/uzh-rpg/rpg_dvs_ros, Accessed December 9, 2024

  59. [59]

    Lecture 3: Camera Models & Camera Calibration

    Silvio Savarese, Lecture 3: Camera Models & Camera Calibration, Computational Vision and Geometry Lab, 2014. https://cvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture3_camera_calibration_notes.pdf. Accessed: 2024-12-09

  60. [60]

    Biasing Dynamic Sensors

    iniVation AG, “Biasing Dynamic Sensors,” 2024. [Online]. Available: https://docs.inivation.com/hardware/hardware-advanced-usage/biasing.html. [Accessed: Dec. 9, 2024]

  61. [61]

    Light Meter LM-3000 4+

    Apps Studio, “Light Meter LM-3000 4+,” 2024. [Online]. Available: https://apps.apple.com/us/app/light-meter-lm-3000/id1554264761. [Accessed: Dec. 9, 2024]

  62. [62]

    S120C Photodiode Power Sensor

    Thorlabs, Inc., “S120C Photodiode Power Sensor,” 2024. [Online]. Available: https://www.thorlabs.com/thorproduct.cfm?partnumber=S120C. [Accessed: Dec. 9, 2024]

  63. [63]

    Multiple Object Tracking in Real-Time

    OpenCV, “Multiple Object Tracking in Real-Time,” OpenCV Blog. Available: https://opencv.org/blog/multiple-object-tracking-in-realtime/. [Accessed: 5-Jan-2025]

  64. [64]

    Event-Driven Sensing for Efficient Perception: Vision and Audition Algorithms

    Shih-Chii Liu, Bodo Rueckauer, Enea Ceolini, Adrian Huber, and Tobi Delbruck, “Event-Driven Sensing for Efficient Perception: Vision and Audition Algorithms,” IEEE Signal Processing Magazine , vol. 36, no. 6, pp. 29-37, 2019, doi: 10.1109/MSP.2019.2928127

  65. [65]

    Adaptive Time-Slice Block-Matching Optical Flow Algorithm for Dynamic Vision Sensors

    Min Liu and Tobi Delbrück, “Adaptive Time-Slice Block-Matching Optical Flow Algorithm for Dynamic Vision Sensors,” in British Machine Vision Conference, 2018. Available at: https://api.semanticscholar.org/CorpusID:52283776

  66. [66]

    RepVGG: Making VGG-style ConvNets Great Again

    Xiaohan Ding, X. Zhang, Ningning Ma, Jungong Han, Guiguang Ding, and Jian Sun, “RepVGG: Making VGG-style ConvNets Great Again,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13728-13737, 2021. Available at: https://api.semanticscholar.org/CorpusID:231572790

  67. [67]

    EfficientRep: An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design

    Kaiheng Weng, Xiangxiang Chu, Xiaoming Xu, Junshi Huang, and Xiaoming Wei, “EfficientRep: An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design,” arXiv preprint arXiv:2302.00386, 2023, submitted on 1 Feb 2023. Available at: https://arxiv.org/abs/2302.00386

  69. [69]

    CSPNet: A New Backbone that can Enhance Learning Capability of CNN

    Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, Ping-Yang Chen, and Jun-Wei Hsieh, “CSPNet: A New Backbone that can Enhance Learning Capability of CNN,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1571-1580. Available at: https://api.semanticscholar.org/CorpusID:208310312

  71. [71]

    GhostNet: More Features From Cheap Operations

    Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, and Chang Xu, “GhostNet: More Features From Cheap Operations,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1577-1586, 2020. Available at: https://api.semanticscholar.org/CorpusID:208310058

  72. [72]

    ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

    Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun, “ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6848-6856. Available at: https://api.semanticscholar.org/CorpusID:24982157

  74. [74]

    Feature Pyramid Networks for Object Detection

    Tsung-Yi Lin, Piotr Dollár, Ross B. Girshick, Kaiming He, Bharath Hariharan, and Serge J. Belongie, “Feature Pyramid Networks for Object Detection,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936-944, 2017. Available at: https://api.semanticscholar.org/CorpusID:10716717

  75. [75]

    Path Aggregation Network for Instance Segmentation

    Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia, “Path Aggregation Network for Instance Segmentation,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8759-8768, 2018. Available at: https://api.semanticscholar.org/CorpusID:3698141

  76. [76]

    RETRACTED: Highway forecasting for weather factor and traffic flow interaction scenarios

    Ning Tao, Deng Shengteng, Jia Xiangkun, et al., “RETRACTED: Highway forecasting for weather factor and traffic flow interaction scenarios,” PREPRINT, 11 October 2023, Version 1. Available at: https://doi.org/10.21203/rs.3.rs-3418469/v1

  77. [77]

    Object Detection with Spiking Neural Networks on Automotive Event Data

    L. Cordone, B. Miramond, and P. Thierion, “Object Detection with Spiking Neural Networks on Automotive Event Data,” in Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), 2022, pp. 1-8. doi: 10.1109/IJCNN55064.2022.9892618

  78. [78]

    From Chaos Comes Order: Ordering Event Representations for Object Recognition and Detection

    N. Zubic, D. Gehrig, M. Gehrig, and D. Scaramuzza, “From Chaos Comes Order: Ordering Event Representations for Object Recognition and Detection,” in Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 12800-12810. Available: Semantic Scholar

  79. [79]

    RGB-Event Fusion for Moving Object Detection in Autonomous Driving

    Z. Zhou, Z. Wu, R. Boutteau, F. Yang, C. Demonceaux, and D. Ginhac, “RGB-Event Fusion for Moving Object Detection in Autonomous Driving,” in Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 7808-7815. Available: Semantic Scholar

  80. [80]

    Concurrent Detection of Known Defects and Out-of-Distribution Instances in Building Inspections: Advancements in Deep Classification

    C. M. Torrejón, U. G. W. K. N. Gamage, and S. Tolu, “Concurrent Detection of Known Defects and Out-of-Distribution Instances in Building Inspections: Advancements in Deep Classification,” in Proceedings of the 2023 IEEE International Conference on Imaging Systems and Techniques (IST), 2023, pp. 1-6. doi: 10.1109/IST59124.2023.10355664