NERVE: A Neuromorphic Vision and Radar Ensemble for Multi-Sensor Fusion Research
Pith reviewed 2026-05-20 20:44 UTC · model grok-4.3
The pith
Combining DVS with 77 GHz radar raises human detection to as much as 47.5% mAP and keeps mean absolute distance errors below 1.8 m.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that fusing Dynamic Vision Sensor streams with 77 GHz radar data in the NERVE dataset consistently raises human-detection performance, with recurrent models attaining up to 47.5 percent mean average precision while radar-derived distance estimates remain below 1.8 m mean absolute error against LiDAR ground truth.
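The ranging half of this claim rests on a mean absolute error between radar distance estimates and LiDAR references. A minimal sketch of that metric follows, assuming paired per-detection distances in metres; the function name and the sample values are hypothetical and are not taken from NERVE.

```python
from statistics import mean


def mean_absolute_error(estimates_m, references_m):
    """Mean absolute error between paired distance estimates and references (metres)."""
    if len(estimates_m) != len(references_m):
        raise ValueError("estimate and reference lists must have the same length")
    if not estimates_m:
        raise ValueError("at least one pair is required")
    return mean(abs(e - r) for e, r in zip(estimates_m, references_m))


# Hypothetical paired distances, for illustration only.
radar_m = [2.0, 3.5, 5.0, 7.5]
lidar_m = [2.5, 3.0, 6.0, 7.5]
mae = mean_absolute_error(radar_m, lidar_m)
print(f"MAE: {mae} m")
```

A reported figure "below 1.8 m" would correspond to this quantity computed over the evaluation subset; how detections are paired with LiDAR references is not stated in the abstract.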
What carries the argument
The DVS-plus-77 GHz radar subset processed by feed-forward and recurrent detectors, which isolates the contribution of each modality to detection and ranging.
If this is right
- Recurrent detectors make better use of the temporal structure in DVS and radar streams than feed-forward detectors.
- 77 GHz radar supplies a stronger complementary signal for detection than 24 GHz radar when paired with DVS.
- The full dataset with its 16 object categories supports extension of the same fusion evaluation beyond the human-detection task.
Where Pith is reading between the lines
- The reported gains may shrink when models trained on office data encounter outdoor motion or lighting changes.
- Including the RGB-D camera already present in the recordings could further tighten distance estimates or raise detection scores.
- The scale of the synchronized recordings invites direct comparison of early versus late fusion architectures on the same data.
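The early-versus-late distinction in the last point can be sketched as follows. This is an illustration under stated assumptions, not the paper's architecture: the abstract does not say how the baselines fuse modalities, and the function names, the channel placeholders, and the weighting scheme are all hypothetical.

```python
def early_fusion(dvs_channels, radar_channels):
    """Early fusion: stack per-modality input channels so one detector sees both."""
    # Concatenation along the channel axis; each element stands for one 2-D map.
    return list(dvs_channels) + list(radar_channels)


def late_fusion(dvs_score, radar_score, dvs_weight=0.5):
    """Late fusion: combine per-modality detection confidences after separate detectors."""
    if not 0.0 <= dvs_weight <= 1.0:
        raise ValueError("dvs_weight must lie in [0, 1]")
    return dvs_weight * dvs_score + (1.0 - dvs_weight) * radar_score


# Hypothetical placeholders: two DVS polarity maps and one radar range-Doppler map.
fused_input = early_fusion(["dvs_on", "dvs_off"], ["radar_range_doppler"])
fused_score = late_fusion(0.8, 0.4)
```

In the early variant a single model learns cross-modal features from the stacked input; in the late variant each modality keeps its own detector and only the outputs are merged, which makes per-modality ablation simpler.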
Load-bearing premise
Recordings made in office settings with standard COCO annotations supply a representative ground truth for multi-sensor human detection and distance estimation in general conditions.
What would settle it
The claim would be tested by running the identical fusion models on recordings from non-office environments, or against independent ranging ground truth, and checking whether mAP and distance errors remain comparable.
Original abstract
We present NERVE (Neuromorphic Vision and Radar Ensemble), a multi-sensor dataset comprising 257 minutes of synchronized recordings from five sensors: two Dynamic Vision Sensors (DVS), an RGB-D camera, and two Radar units (24GHz and 77GHz). Captured across 12 measurement days in office environments, NERVE contains around 600GB of uncompressed temporally aligned data with around 914,000 frames and around 9.6 million RGB COCO-formatted annotations covering 16 relevant object categories. To evaluate multi-modal fusion, we construct a DVS+Radar subset for human detection and distance estimation. Baseline experiments using feed-forward and recurrent detectors show that combining DVS with 77GHz Radar consistently improves detection, with recurrent models achieving up to 47.5% mAP and mean absolute Radar distance errors below 1.8m against LiDAR ground truth.
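Because the annotations are COCO-formatted, per-category counts can be read with standard JSON tooling. The sketch below assumes the usual COCO detection layout (`images`, `categories`, `annotations` with `bbox` as `[x, y, width, height]`); the file contents, category names, and function name are hypothetical and do not reflect the actual NERVE release.

```python
import json
from collections import Counter

# Hypothetical miniature annotation file in COCO detection layout.
coco_text = json.dumps({
    "images": [{"id": 1, "file_name": "frame_000001.png", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "chair"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 50, 80, 200]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [300, 60, 70, 190]},
        {"id": 12, "image_id": 1, "category_id": 2, "bbox": [400, 250, 90, 120]},
    ],
})


def count_annotations_per_category(coco):
    """Return a Counter mapping category name to number of annotations."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(names[a["category_id"]] for a in coco["annotations"])


counts = count_annotations_per_category(json.loads(coco_text))
print(counts.most_common())
```

The same count, run over the released files, would show how the roughly 9.6 million annotations are distributed across the 16 categories.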
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the NERVE multi-sensor dataset featuring synchronized recordings from two Dynamic Vision Sensors (DVS), an RGB-D camera, and two radar units operating at 24GHz and 77GHz. The dataset includes 257 minutes of data from office environments, approximately 914,000 frames, and 9.6 million COCO-formatted annotations for 16 object categories. Baseline experiments on DVS and 77GHz radar fusion for human detection and distance estimation demonstrate improved performance, with recurrent models reaching 47.5% mAP and mean absolute distance errors below 1.8 m using LiDAR as ground truth.
Significance. Should the experimental details be clarified and the ground truth methodology validated, the NERVE dataset could serve as a significant contribution to multi-sensor fusion research in computer vision, particularly for neuromorphic and radar modalities. The large scale and temporal alignment of the data, along with the provision of baseline results, offer a foundation for developing and evaluating fusion algorithms. The work highlights potential benefits of combining event-based vision with radar for detection tasks in indoor settings.
major comments (3)
- [Abstract] Abstract: The claim of mean absolute Radar distance errors below 1.8m is made against LiDAR ground truth, but the described sensor suite consists only of two DVS, one RGB-D camera, and two Radar units. The manuscript should detail the acquisition, synchronization, and error characteristics of the LiDAR data used for ranging evaluation, as this reference is central to validating the distance estimation results.
- [Experiments] Experiments section: The baseline experiments report improvements from DVS+Radar fusion but omit model architectures, training details, hyperparameters, loss functions, and any ablation studies or statistical tests. These omissions make it difficult to assess the robustness of the 47.5% mAP and sub-1.8m error claims.
- [Dataset] Dataset description: While the dataset size and annotation format are specified, additional information on annotation process, inter-annotator agreement, and handling of sensor-specific challenges (e.g., DVS event noise, radar clutter) would strengthen the resource's utility.
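One common way to quantify the inter-annotator agreement requested in the third comment is box overlap between two annotators on the same frame. The sketch below is one possible metric, not the one the authors will report; it assumes COCO-style `[x, y, width, height]` boxes, and the function names and sample boxes are hypothetical.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as [x, y, width, height]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def mean_best_iou(boxes_a, boxes_b):
    """Average, over annotator A's boxes, of the best IoU with any box from annotator B."""
    if not boxes_a:
        raise ValueError("annotator A must have at least one box")
    best = [max((iou_xywh(a, b) for b in boxes_b), default=0.0) for a in boxes_a]
    return sum(best) / len(best)


# Hypothetical boxes from two annotators on the same frame.
annotator_a = [[0, 0, 10, 10], [20, 20, 10, 10]]
annotator_b = [[5, 0, 10, 10], [20, 20, 10, 10]]
agreement = mean_best_iou(annotator_a, annotator_b)
```

This score is asymmetric (it is computed from annotator A's point of view), so a full report would typically average both directions or use a one-to-one matching.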
minor comments (2)
- [Abstract] Abstract: The repeated use of 'around' for quantities (e.g., around 600GB, around 914,000 frames) could be replaced with more precise figures or ranges if exact counts are available.
- [Throughout] Throughout: Ensure consistent terminology for sensors, such as specifying '77GHz Radar' clearly in all references to avoid ambiguity with the 24GHz unit.
Simulated Author's Rebuttal
We thank the referee for their constructive comments and suggestions. We address each of the major comments below and will revise the manuscript to incorporate the requested clarifications and additional details.
- Referee: [Abstract] Abstract: The claim of mean absolute Radar distance errors below 1.8m is made against LiDAR ground truth, but the described sensor suite consists only of two DVS, one RGB-D camera, and two Radar units. The manuscript should detail the acquisition, synchronization, and error characteristics of the LiDAR data used for ranging evaluation, as this reference is central to validating the distance estimation results.
  Authors: We thank the referee for pointing this out. The LiDAR was used solely to provide ground truth for the distance estimation evaluation and is not included in the released dataset. In the revised manuscript, we will add a new subsection in the Dataset or Experiments section describing the LiDAR sensor model, its acquisition setup, synchronization with the other sensors using hardware triggers, and error characteristics derived from calibration and manufacturer data.
  Revision: yes
- Referee: [Experiments] Experiments section: The baseline experiments report improvements from DVS+Radar fusion but omit model architectures, training details, hyperparameters, loss functions, and any ablation studies or statistical tests. These omissions make it difficult to assess the robustness of the 47.5% mAP and sub-1.8m error claims.
  Authors: We agree that these details are essential for reproducibility and assessing the claims. The revised manuscript will include comprehensive descriptions of the model architectures for both feed-forward and recurrent detectors, the training protocols, specific hyperparameters, the loss functions employed, ablation studies on the fusion components, and statistical tests (e.g., paired t-tests) to validate the significance of the performance improvements.
  Revision: yes
- Referee: [Dataset] Dataset description: While the dataset size and annotation format are specified, additional information on annotation process, inter-annotator agreement, and handling of sensor-specific challenges (e.g., DVS event noise, radar clutter) would strengthen the resource's utility.
  Authors: We appreciate this recommendation to enhance the dataset's documentation. We will expand the Dataset section to detail the annotation process, including the software tools used, the number of annotators involved, and the guidelines followed. We will also report inter-annotator agreement using appropriate metrics. Furthermore, we will describe the methods used to handle DVS event noise and radar clutter during the annotation and data preparation stages.
  Revision: yes
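The paired t-test promised in the second response compares two models evaluated on the same runs or splits. A minimal sketch of the test statistic follows, using only the standard library; the per-run mAP values are hypothetical, and the function name is an assumption made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples scores_a and scores_b."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical per-run mAP values: fusion model vs. DVS-only model on the same five runs.
fusion_map = [0.46, 0.47, 0.45, 0.48, 0.44]
dvs_only_map = [0.41, 0.43, 0.42, 0.42, 0.40]
t_value, dof = paired_t_statistic(fusion_map, dvs_only_map)
```

The statistic would then be compared against a t distribution with `dof` degrees of freedom to obtain a p-value; that step is omitted here because it needs a distribution function outside the standard library.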
Circularity Check
No circularity: empirical dataset paper with direct measurements only
The manuscript presents a multi-sensor dataset (DVS, RGB-D, Radar) and reports baseline empirical results for human detection and ranging. No mathematical derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text or abstract. Results such as 47.5% mAP and <1.8 m errors are stated as direct measurements on collected data against an external reference, with no reduction by construction to self-defined quantities or ansatzes. The LiDAR ground-truth reference, while potentially raising separate questions of sensor enumeration, does not create a self-referential loop in any derivation.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Temporal synchronization across all five sensors is accurate enough for frame-level fusion.
- domain assumption: COCO-formatted annotations constitute reliable ground truth for the 16 object categories.
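The synchronization assumption can be probed by pairing frames across sensors by nearest timestamp and counting how many fall outside a tolerance. The sketch below assumes sorted integer timestamps in microseconds; the function name, the tolerance, and the sample timestamps are hypothetical, and the paper's actual alignment procedure is not described in the abstract.

```python
from bisect import bisect_left


def match_nearest(ref_ts, other_ts, tolerance_us):
    """For each reference timestamp, return the index of the nearest entry in the
    sorted list other_ts, or None when the gap exceeds tolerance_us."""
    matches = []
    for t in ref_ts:
        i = bisect_left(other_ts, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_ts)]
        if not candidates:
            matches.append(None)
            continue
        best = min(candidates, key=lambda j: abs(other_ts[j] - t))
        matches.append(best if abs(other_ts[best] - t) <= tolerance_us else None)
    return matches


# Hypothetical timestamps in microseconds.
dvs_ts = [0, 33_000, 66_000, 99_000]
radar_ts = [1_000, 50_000, 98_000]
matches = match_nearest(dvs_ts, radar_ts, tolerance_us=5_000)
unmatched = sum(m is None for m in matches)
```

A high share of unmatched frames at a tolerance comparable to the frame period would indicate that frame-level fusion is not well supported by the alignment.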
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "Baseline experiments using feed-forward and recurrent detectors show that combining DVS with 77GHz Radar consistently improves detection, with recurrent models achieving up to 47.5% mAP and mean absolute Radar distance errors below 1.8m against LiDAR ground truth."
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "We present NERVE ... five sensors: two Dynamic Vision Sensors (DVS), an RGB-D camera, and two Radar units (24GHz and 77GHz)."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.