NERVE: A Neuromorphic Vision and Radar Ensemble for Multi-Sensor Fusion Research

Amirreza Yousefzadeh; Ethan Milon; Guangzhi Tang; Manolis Sifalakis; Omar Mansour; Pietro Martinello; YingFu Xu

arxiv: 2605.16414 · v1 · pith:CUG77OX4 · submitted 2026-05-13 · 💻 cs.CV

Omar Mansour, Pietro Martinello, Ethan Milon, YingFu Xu, Manolis Sifalakis, Guangzhi Tang, Amirreza Yousefzadeh

Pith reviewed 2026-05-20 20:44 UTC · model grok-4.3

classification 💻 cs.CV

keywords multi-sensor fusion, neuromorphic vision, DVS, radar, human detection, distance estimation, dataset, recurrent models

The pith

Combining DVS with 77 GHz radar improves human detection, with recurrent models reaching up to 47.5% mAP, and radar distance estimates stay below 1.8 m mean absolute error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the NERVE dataset: 257 minutes of synchronized recordings from two Dynamic Vision Sensors, an RGB-D camera, and 24 GHz and 77 GHz radar units, captured over twelve measurement days in office environments. The full dataset holds roughly 914,000 frames with about 9.6 million COCO-style annotations; from it the authors build a DVS-plus-radar subset to test multi-modal fusion specifically for human detection and ranging. Baseline runs with feed-forward and recurrent detectors show that adding 77 GHz radar data to DVS inputs raises detection scores, with recurrent networks reaching up to 47.5 percent mean average precision. The same experiments report radar distance estimates whose mean absolute error stays under 1.8 meters when checked against LiDAR references. The work therefore supplies a concrete testbed for examining how event-based vision and radar measurements can be combined.
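The ranging number is a mean absolute error (MAE): the average of |estimate − reference| over matched pairs. A minimal sketch, assuming each radar range estimate has already been paired with one LiDAR reference distance in metres (the pairing protocol and the sample values below are hypothetical, not taken from the paper):

```python
from typing import Sequence


def mean_absolute_error(estimates: Sequence[float], references: Sequence[float]) -> float:
    """Average absolute difference between paired estimates and references (metres)."""
    if len(estimates) != len(references):
        raise ValueError("estimates and references must be paired one-to-one")
    if not estimates:
        raise ValueError("at least one pair is required")
    return sum(abs(e - r) for e, r in zip(estimates, references)) / len(estimates)


# Hypothetical radar ranges vs. hypothetical LiDAR reference ranges for four detections.
radar_m = [2.1, 3.8, 5.0, 7.4]
lidar_m = [2.0, 4.5, 4.6, 6.0]
print(round(mean_absolute_error(radar_m, lidar_m), 3))
```

Because MAE weights every error linearly, a few large misses (for example, a radar return associated with the wrong person) move it less than they would move a root-mean-square error; the reference's own error also enters every term, which is why its characterization matters.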

Core claim

The central claim is that fusing Dynamic Vision Sensor streams with 77 GHz radar data in the NERVE dataset consistently raises human-detection performance, with recurrent models attaining up to 47.5 percent mean average precision while radar-derived distance estimates remain below 1.8 m mean absolute error against LiDAR ground truth.
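The 47.5% figure is a mean average precision, and this summary does not say which protocol produced it. COCO-style mAP averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05 (and over classes); the sketch below computes the simpler building block, single-class AP at one IoU threshold with all-point interpolation. The detection and box layouts are assumptions for illustration, not the paper's evaluation code:

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners in pixels
Detection = Tuple[str, float, Box]       # hypothetical (frame_id, confidence, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned corner boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Detection],
                      ground_truth: Dict[str, List[Box]],
                      iou_threshold: float = 0.5) -> float:
    """Single-class AP at one IoU threshold, all-point interpolated."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    used = {frame: [False] * len(boxes) for frame, boxes in ground_truth.items()}
    hits: List[int] = []
    # Rank detections from most to least confident; each GT box can match once.
    for frame, _conf, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(frame, [])):
            overlap = iou(box, gt)
            if not used[frame][j] and overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_threshold:
            used[frame][best_j] = True
            hits.append(1)
        else:
            hits.append(0)
    # Precision and recall after each ranked detection.
    precision, recall, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precision.append(tp / k)
        recall.append(tp / n_gt)
    # Precision envelope: make it non-increasing from right to left.
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])
    # Area under the stepwise precision-recall curve.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


# Hypothetical example: two people in two frames, one false positive in between.
gt = {"f1": [(0, 0, 10, 10)], "f2": [(20, 20, 30, 30)]}
dets = [("f1", 0.9, (0, 0, 10, 10)), ("f2", 0.8, (50, 50, 60, 60)),
        ("f2", 0.7, (20, 20, 30, 30))]
print(average_precision(dets, gt))
```

The matching here is greedy to the best still-unmatched ground-truth box, as COCO does; COCO additionally samples the envelope at 101 recall points rather than integrating over every recall step, so numbers from this sketch will differ slightly from official tooling.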

What carries the argument

The DVS-plus-77 GHz radar subset processed by feed-forward and recurrent detectors, which isolates the contribution of each modality to detection and ranging.
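Frame-based detectors cannot consume the asynchronous DVS stream directly, so events are usually binned into tensors first. One common representation, not necessarily the one NERVE's baselines use, is a per-polarity event-count histogram over fixed time windows: a feed-forward detector sees each window on its own, while a recurrent detector carries hidden state from one window to the next. A minimal standard-library sketch, assuming events arrive as time-sorted (timestamp_us, x, y, polarity) tuples, which is a hypothetical layout rather than the dataset's documented format:

```python
from typing import List, Tuple

# Hypothetical event layout: (timestamp in microseconds, x, y, polarity in {0, 1}).
Event = Tuple[int, int, int, int]


def events_to_histograms(events: List[Event], width: int, height: int,
                         window_us: int) -> List[List[List[List[int]]]]:
    """Bin a time-sorted event stream into per-window, per-polarity count images.

    Returns frames[k][polarity][y][x] for consecutive windows of length window_us,
    starting at the first event's timestamp.
    """
    if not events:
        return []
    t0 = events[0][0]
    n_frames = (events[-1][0] - t0) // window_us + 1
    frames = [[[[0] * width for _ in range(height)] for _ in range(2)]
              for _ in range(n_frames)]
    for t, x, y, p in events:
        k = (t - t0) // window_us  # index of the time window this event falls in
        frames[k][p][y][x] += 1
    return frames


# Three hypothetical events on a 2x2 sensor, binned into 50 ms windows.
stream = [(0, 1, 1, 1), (10, 1, 1, 1), (60_000, 0, 0, 0)]
hist = events_to_histograms(stream, width=2, height=2, window_us=50_000)
print(len(hist), hist[0][1][1][1], hist[1][0][0][0])
```

The window length is the key design knob: short windows keep latency low but yield sparse, noisy frames when the scene is static, which is one reason recurrent models that accumulate evidence across windows tend to suit event data.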

If this is right

Recurrent detectors make better use of the temporal structure in DVS and radar streams than feed-forward detectors.
77 GHz radar supplies a stronger complementary signal for detection than 24 GHz radar when paired with DVS.
The full dataset with its 16 object categories supports extension of the same fusion evaluation beyond the human-detection task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The reported gains may shrink when models trained on office data encounter outdoor motion or lighting changes.
Including the RGB-D camera already present in the recordings could further tighten distance estimates or raise detection scores.
The scale of the synchronized recordings invites direct comparison of early versus late fusion architectures on the same data.
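To make the early-versus-late distinction concrete: early fusion merges sensor data or features before detection, while late fusion combines each sensor's outputs afterwards. A minimal late-fusion sketch, assuming (hypothetically) that the camera and radar are co-located with aligned boresights, that the radar reports a target list of (range_m, azimuth_deg) with positive azimuth toward image-right, and that the camera follows a pinhole model with a known horizontal field of view. Each camera detection takes the range of the radar target closest in azimuth. This is an illustration, not NERVE's fusion method:

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
RadarTarget = Tuple[float, float]        # hypothetical (range_m, azimuth_deg); 0 deg = boresight


def box_azimuth_deg(box: Box, image_width: int, hfov_deg: float) -> float:
    """Horizontal angle of a box centre under a pinhole camera model."""
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    cx = (box[0] + box[2]) / 2
    return math.degrees(math.atan((cx - image_width / 2) / focal_px))


def attach_radar_range(boxes: List[Box], targets: List[RadarTarget],
                       image_width: int, hfov_deg: float,
                       max_gap_deg: float = 5.0) -> List[Optional[float]]:
    """Late fusion: give each detection the range of the nearest radar target in azimuth.

    Returns None for a detection when no target lies within max_gap_deg.
    """
    ranges: List[Optional[float]] = []
    for box in boxes:
        az = box_azimuth_deg(box, image_width, hfov_deg)
        best = min(targets, key=lambda t: abs(t[1] - az), default=None)
        if best is not None and abs(best[1] - az) <= max_gap_deg:
            ranges.append(best[0])
        else:
            ranges.append(None)
    return ranges


# Hypothetical 640-px-wide image with a 90-degree horizontal field of view.
boxes = [(300, 0, 340, 100), (620, 0, 660, 100), (460, 0, 500, 100)]
targets = [(3.0, 0.5), (6.0, 44.0)]
print(attach_radar_range(boxes, targets, image_width=640, hfov_deg=90.0))
```

An early-fusion counterpart would instead project radar returns into an image-aligned map and stack it with the DVS histogram channels before a single detector; comparing the two on identical recordings is exactly the experiment the bullet above suggests.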

Load-bearing premise

Recordings made in office settings with standard COCO annotations supply a representative ground truth for multi-sensor human detection and distance estimation in general conditions.

What would settle it

Running the same fusion models on recordings from non-office environments, or scoring distance against an independently characterized ranging reference, and checking whether mAP and distance errors stay comparable would test the claim.

Figures

Figures reproduced from arXiv: 2605.16414 by Amirreza Yousefzadeh, Ethan Milon, Guangzhi Tang, Manolis Sifalakis, Omar Mansour, Pietro Martinello, YingFu Xu.

**Figure 1.** Overview of the NERVE dataset: (a) the multi-sensor acquisition setup; (b) an example fused frame with annotations.

**Figure 2.** Block diagram of the automatic annotation pipeline:

**Figure 3.** NERVE dataset distributions: (a) spatial distribution of bounding box centers for class

Original abstract

We present NERVE (Neuromorphic Vision and Radar Ensemble), a multi-sensor dataset comprising 257 minutes of synchronized recordings from five sensors: two Dynamic Vision Sensors (DVS), an RGB-D camera, and two Radar units (24GHz and 77GHz). Captured across 12 measurement days in office environments, NERVE contains around 600GB of uncompressed temporally aligned data with around 914,000 frames and around 9.6 million RGB COCO-formatted annotations covering 16 relevant object categories. To evaluate multi-modal fusion, we construct a DVS+Radar subset for human detection and distance estimation. Baseline experiments using feed-forward and recurrent detectors show that combining DVS with 77GHz Radar consistently improves detection, with recurrent models achieving up to 47.5% mAP and mean absolute Radar distance errors below 1.8m against LiDAR ground truth.
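"COCO-formatted" refers to the annotation layout of the COCO detection dataset, where each object record carries an `image_id`, a `category_id`, and a `bbox` given as `[x_min, y_min, width, height]` in pixels rather than as two corners. A small sketch of such a record and the corner conversion most IoU code expects; the ids and values are hypothetical, and NERVE's category mapping is not described here:

```python
from typing import Sequence, Tuple


def coco_bbox_to_corners(bbox: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert a COCO [x_min, y_min, width, height] box to (x1, y1, x2, y2) corners."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


# Hypothetical COCO-style annotation record (fields such as "segmentation" omitted).
annotation = {
    "id": 1,
    "image_id": 42,
    "category_id": 1,
    "bbox": [100.0, 50.0, 40.0, 120.0],
    "area": 40.0 * 120.0,
    "iscrowd": 0,
}
print(coco_bbox_to_corners(annotation["bbox"]))
```

Mixing up the width/height and corner conventions is a frequent source of silently wrong IoU and mAP values when reusing COCO annotations with other tooling.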

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a straightforward dataset paper releasing synchronized DVS and dual-radar recordings with COCO annotations, plus simple fusion baselines, but the LiDAR ground truth for the distance numbers is not among the listed sensors.

The main thing to know is that NERVE supplies a sizable collection of temporally aligned data from two DVS units, an RGB-D camera, and 24 GHz plus 77 GHz radars, all recorded in offices across 12 days with nearly 10 million COCO-style annotations. That combination at this scale has not appeared before in the cited literature, so the release itself is the core new piece. The baseline fusion results are also new: adding the 77 GHz radar to DVS improves detection, with recurrent models hitting 47.5% mAP and sub-1.8 m mean absolute distance error on the reported task. Those numbers look internally consistent for what they are, and the paper earns credit for making the raw synchronized streams available rather than just claiming improvements in the abstract. The work is scoped to office scenes and human detection plus ranging, which keeps the claims modest and focused.

The soft spot is the distance-error claim. The abstract measures radar errors against LiDAR ground truth, yet the five-sensor list never includes a LiDAR. If that reference was captured separately, post-processed, or derived, the numerical part of the fusion result rests on an uncharacterized source whose own error profile is not quantified. That is a real gap for anyone who wants to reproduce or extend the ranging numbers. No model architectures, training details, or ablation controls are described in the abstract either, though the full text may fill some of that in.

This paper is for groups that need public multi-modal benchmarks mixing event vision and radar for perception work. A reader building fusion pipelines or looking for fresh synchronized data would get practical value from the recordings themselves. It deserves a serious referee because the dataset contribution is concrete and the baseline trends are plausible, even if the ground-truth details require clarification before publication.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces the NERVE multi-sensor dataset featuring synchronized recordings from two Dynamic Vision Sensors (DVS), an RGB-D camera, and two radar units operating at 24GHz and 77GHz. The dataset includes 257 minutes of data from office environments, approximately 914,000 frames, and 9.6 million COCO-formatted annotations for 16 object categories. Baseline experiments on DVS and 77GHz radar fusion for human detection and distance estimation demonstrate improved performance, with recurrent models reaching 47.5% mAP and mean absolute distance errors below 1.8 m using LiDAR as ground truth.

Significance. Should the experimental details be clarified and the ground truth methodology validated, the NERVE dataset could serve as a significant contribution to multi-sensor fusion research in computer vision, particularly for neuromorphic and radar modalities. The large scale and temporal alignment of the data, along with the provision of baseline results, offer a foundation for developing and evaluating fusion algorithms. The work highlights potential benefits of combining event-based vision with radar for detection tasks in indoor settings.

major comments (3)

[Abstract] Abstract: The claim of mean absolute Radar distance errors below 1.8m is made against LiDAR ground truth, but the described sensor suite consists only of two DVS, one RGB-D camera, and two Radar units. The manuscript should detail the acquisition, synchronization, and error characteristics of the LiDAR data used for ranging evaluation, as this reference is central to validating the distance estimation results.
[Experiments] Experiments section: The baseline experiments report improvements from DVS+Radar fusion but omit model architectures, training details, hyperparameters, loss functions, and any ablation studies or statistical tests. These omissions make it difficult to assess the robustness of the 47.5% mAP and sub-1.8m error claims.
[Dataset] Dataset description: While the dataset size and annotation format are specified, additional information on annotation process, inter-annotator agreement, and handling of sensor-specific challenges (e.g., DVS event noise, radar clutter) would strengthen the resource's utility.

minor comments (2)

[Abstract] Abstract: The repeated use of 'around' for quantities (e.g., around 600GB, around 914,000 frames) could be replaced with more precise figures or ranges if exact counts are available.
[Throughout] Throughout: Ensure consistent terminology for sensors, such as specifying '77GHz Radar' clearly in all references to avoid ambiguity with the 24GHz unit.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments and suggestions. We address each of the major comments below and will revise the manuscript to incorporate the requested clarifications and additional details.

Referee: [Abstract] Abstract: The claim of mean absolute Radar distance errors below 1.8m is made against LiDAR ground truth, but the described sensor suite consists only of two DVS, one RGB-D camera, and two Radar units. The manuscript should detail the acquisition, synchronization, and error characteristics of the LiDAR data used for ranging evaluation, as this reference is central to validating the distance estimation results.

Authors: We thank the referee for pointing this out. The LiDAR was used solely to provide ground truth for the distance estimation evaluation and is not included in the released dataset. In the revised manuscript, we will add a new subsection in the Dataset or Experiments section describing the LiDAR sensor model, its acquisition setup, synchronization with the other sensors using hardware triggers, and error characteristics derived from calibration and manufacturer data. revision: yes
Referee: [Experiments] Experiments section: The baseline experiments report improvements from DVS+Radar fusion but omit model architectures, training details, hyperparameters, loss functions, and any ablation studies or statistical tests. These omissions make it difficult to assess the robustness of the 47.5% mAP and sub-1.8m error claims.

Authors: We agree that these details are essential for reproducibility and assessing the claims. The revised manuscript will include comprehensive descriptions of the model architectures for both feed-forward and recurrent detectors, the training protocols, specific hyperparameters, the loss functions employed, ablation studies on the fusion components, and statistical tests (e.g., paired t-tests) to validate the significance of the performance improvements. revision: yes
Referee: [Dataset] Dataset description: While the dataset size and annotation format are specified, additional information on annotation process, inter-annotator agreement, and handling of sensor-specific challenges (e.g., DVS event noise, radar clutter) would strengthen the resource's utility.

Authors: We appreciate this recommendation to enhance the dataset's documentation. We will expand the Dataset section to detail the annotation process, including the software tools used, the number of annotators involved, and the guidelines followed. We will also report inter-annotator agreement using appropriate metrics. Furthermore, we will describe the methods used to handle DVS event noise and radar clutter during the annotation and data preparation stages. revision: yes
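The second response promises paired t-tests. A natural pairing, assumed here rather than stated in the rebuttal, is per-recording scores of a DVS-only model and a DVS+radar model evaluated on the same sequences; the test statistic is then the mean difference divided by its standard error, with n − 1 degrees of freedom. A minimal standard-library sketch with hypothetical scores:

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic for paired samples: mean(a - b) / (sd(a - b) / sqrt(n))."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical per-sequence mAP for the same four test recordings.
dvs_only = [0.41, 0.38, 0.44, 0.40]
dvs_radar = [0.45, 0.41, 0.47, 0.42]
print(paired_t_statistic(dvs_radar, dvs_only))
```

Pairing by sequence removes between-recording variation (crowded versus empty offices, for instance) from the comparison. For a p-value, SciPy's `scipy.stats.ttest_rel` computes the same statistic and looks it up in the t distribution.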

Circularity Check

0 steps flagged

No circularity: empirical dataset paper with direct measurements only

The manuscript presents a multi-sensor dataset (DVS, RGB-D, Radar) and reports baseline empirical results for human detection and ranging. No mathematical derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text or abstract. Results such as 47.5% mAP and <1.8 m errors are stated as direct measurements on collected data against an external reference, with no reduction by construction to self-defined quantities or ansatzes. The LiDAR ground-truth reference, while potentially raising separate questions of sensor enumeration, does not create a self-referential loop in any derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The contribution rests on standard data-collection assumptions rather than new theoretical constructs; no free parameters or invented entities are introduced.

axioms (2)

domain assumption Temporal synchronization across all five sensors is accurate enough for frame-level fusion
Required for the DVS+Radar subset experiments described in the abstract
domain assumption COCO-formatted annotations constitute reliable ground truth for the 16 object categories
Used to compute mAP and distance error metrics
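The first axiom can be checked directly. One common way to pair streams for frame-level fusion is nearest-timestamp matching with a tolerance: each DVS frame is paired with the radar frame closest in time, and pairs farther apart than the tolerance are dropped. A minimal sketch, assuming both streams are already on a shared clock in microseconds (the rebuttal mentions hardware triggers, but this summary does not describe NERVE's actual alignment procedure, and the tolerance value is hypothetical):

```python
import bisect
from typing import List, Optional, Sequence


def match_nearest(reference_ts: Sequence[int], other_ts: Sequence[int],
                  tolerance_us: int) -> List[Optional[int]]:
    """For each reference timestamp, return the index of the nearest other timestamp.

    other_ts must be sorted ascending. Returns None where the nearest sample is
    farther than tolerance_us away.
    """
    matches: List[Optional[int]] = []
    for t in reference_ts:
        i = bisect.bisect_left(other_ts, t)
        # The nearest sample is either just before or at/after the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_ts)]
        if not candidates:
            matches.append(None)
            continue
        best = min(candidates, key=lambda j: abs(other_ts[j] - t))
        matches.append(best if abs(other_ts[best] - t) <= tolerance_us else None)
    return matches


# Hypothetical DVS frame times and radar frame times (microseconds).
dvs_frames = [0, 50_000, 100_000]
radar_frames = [1_000, 49_000, 130_000]
print(match_nearest(dvs_frames, radar_frames, tolerance_us=5_000))
```

The distribution of residual offsets between matched pairs is a simple, reportable check on whether synchronization is "accurate enough": if it approaches the DVS window length, frame-level fusion starts pairing different moments of motion.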

pith-pipeline@v0.9.0 · 5705 in / 1328 out tokens · 81657 ms · 2026-05-20T20:44:46.389340+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

Baseline experiments using feed-forward and recurrent detectors show that combining DVS with 77GHz Radar consistently improves detection, with recurrent models achieving up to 47.5% mAP and mean absolute Radar distance errors below 1.8m against LiDAR ground truth.
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

We present NERVE ... five sensors: two Dynamic Vision Sensors (DVS), an RGB-D camera, and two Radar units (24GHz and 77GHz).

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 2 internal anchors

[1] G. Gallego, T. Delbrück, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. J. Davison, J. Conradt, K. Daniilidis et al., “Event-based vision: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 154–180, 2020.

[2] C. Mead, “Neuromorphic electronic systems,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1629–1636, 1990.

[3] B. Chandrasekaran, S. Gangadhar, and J. M. Conrad, “A survey of multisensor fusion techniques, architectures and methodologies,” in SoutheastCon 2017. IEEE, 2017, pp. 1–8.

[4] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.

[5] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11621–11631.

[6] P. De Tournemire, D. Nitti, E. Perot, D. Migliore, and A. Sironi, “A large scale event-based detection dataset for automotive,” arXiv preprint arXiv:2001.08499, 2020.

[7] E. Perot, P. De Tournemire, D. Nitti, J. Masci, and A. Sironi, “Learning to detect objects with a 1 megapixel event camera,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 16639–16652.

[8] L. Müller, M. Sifalakis, S. Eissa, S. Afshar, A. van Schaik, and A. Yousefzadeh, “Aircraft marshalling signals dataset of radar and event-based camera for sensor fusion,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.

[9] Z. Wei, F. Zhang, S. Chang, Y. Liu, H. Wu, and Z. Feng, “Mmwave radar and vision fusion for object detection in autonomous driving: A review,” Sensors, vol. 22, no. 7, p. 2542, 2022.

[10] A. Safa, T. Verbelen, I. Ocket, A. Bourdoux, H. Sahli, F. Catthoor, and G. Gielen, “Fusing event-based camera and radar for SLAM using spiking neural networks with continual STDP learning,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 2782–2788.

[11] H. Wang, J. Xu, X. Luo, X. Chen, T. Zhang, R. Duan, Y. Liu, and X. Chen, “Ultra-high-frequency harmony: mmwave radar and event camera orchestrate accurate drone landing,” in Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems (SenSys). ACM, 2025, pp. 15–29.

[12] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, P. E. Bourne et al., “The FAIR guiding principles for scientific data management and stewardship,” Scientific Data, vol. 3, no. 1, p. 160018, 2016.

[13] G. Orchard, A. Jayawant, G. K. Cohen, and N. Thakor, “Converting static image datasets to spiking neuromorphic datasets using saccades,” Frontiers in Neuroscience, vol. 9, p. 437, 2015.

[14] H. Rebecq, D. Gehrig, and D. Scaramuzza, “ESIM: An open event camera simulator,” in Conference on Robot Learning. PMLR, 2018, pp. 969–982.

[15] D. Gehrig, M. Gehrig, J. Hidalgo-Carrió, and D. Scaramuzza, “Video to events: Recycling video datasets for event cameras,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 3586–3595.

[16] J. Binas, D. Neil, S.-C. Liu, and T. Delbruck, “DDD17: End-to-end DAVIS driving dataset,” arXiv preprint arXiv:1711.01458, 2017.

[17] Y. Hu, J. Binas, D. Neil, S.-C. Liu, and T. Delbruck, “DDD20 end-to-end event camera driving dataset: Fusing frames and events with deep learning for improved steering prediction,” in IEEE International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020, pp. 1–6.

[18] S. Miao, G. Chen, X. Ning, Y. Zi, K. Ren, Z. Bing, and A. Knoll, “Neuromorphic vision datasets for pedestrian detection, action recognition, and fall detection,” Frontiers in Neurorobotics, vol. 13, p. 38, 2019.

[19] C. Boretti, P. Bich, F. Pareschi, L. Prono, R. Rovatti, and G. Setti, “Pedro: An event-based dataset for person detection in robotics,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 4065–4074.

[20] A. Z. Zhu, D. Thakur, T. Özaslan, B. Pfrommer, V. Kumar, and K. Daniilidis, “The multivehicle stereo event camera dataset: An event camera dataset for 3D perception,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 2032–2039, 2018.

[21] Intel Corporation, “Intel RealSense LiDAR Camera L515 Datasheet,” https://docs.rs-online.com/f31c/A700000006942953.pdf, 2020, revision 002, June 2020.

[22] G.-H. An, S. Lee, M.-W. Seo, K. Yun, W.-S. Cheong, and S.-J. Kang, “Charuco board-based omnidirectional camera calibration method,” Electronics, vol. 7, no. 12, p. 421, 2018.

[23] G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics YOLOv8,” 2023. [Online]. Available: https://github.com/ultralytics/ultralytics

[24] M. Gehrig and D. Scaramuzza, “Recurrent vision transformers for object detection with event cameras,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13884–13893.

[25] D. A. Silva, S. Ahmed, K. Siddique, M. Iacono, P. Morerio, L. Marcenaro, C. Regazzoni, L. Martino, J. Caba, K. Abualsaud, D. Thomas, and P. Vandergheynst, “A recurrent YOLOv8-based framework for event-based object detection,” Frontiers in Neuroscience, vol. 18, p. 1477979, 2024.

[26] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, “YOLOX: Exceeding YOLO series in 2021,” arXiv preprint arXiv:2107.08430, 2021.
