Enhancing Event-based Object Detection with Monocular Normal Maps
Object detection in autonomous driving is frequently compromised by complex illumination. While event cameras offer a robust solution, they are susceptible to sudden contrast changes, such as reflections, which often trigger dense, misleading event signals. To overcome this, we leverage RGB-derived surface normal maps as explicit geometric constraints. Crucially, even when the RGB input degrades, the normal maps preserve low-frequency structural priors that effectively assist event-based detection. Building on this, we present NRE-Net, a trimodal framework that integrates structural priors from surface Normal maps, appearance context from RGB images, and high-frequency dynamics from Events. The Adaptive Dual-stream Fusion Module (ADFM) first aligns geometric and appearance cues; the Event-modality Aware Fusion Module (EAFM) then selectively integrates event dynamics. Extensive evaluations on DSEC-Det-sub and PKU-DAVIS-SOD demonstrate that incorporating geometric priors yields an additional 3.0% AP50 gain over dual-modal baselines, while our approach consistently outperforms fusion methods such as SFNet (+2.7%) and SODFormer (+7.1%).
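The abstract gives only the order of fusion (normal and RGB cues first, events second), not the internals of ADFM or EAFM. The sketch below illustrates that ordering with a generic per-pixel sigmoid gate, which is a stand-in technique and not the authors' modules. The names `gated_fuse`, `two_stage_fusion`, `w_geo`, and `w_evt`, along with the `(C, H, W)` feature layout, are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fuse(a, b, w, bias=0.0):
    """Blend two (C, H, W) feature maps with a per-pixel gate.

    Hypothetical stand-in for a learned fusion module: the gate is a
    sigmoid over a linear projection of the channel-concatenated inputs.
    """
    stacked = np.concatenate([a, b], axis=0)  # (2C, H, W)
    # Project the 2C channels to one logit per spatial location: (H, W)
    logits = np.tensordot(w, stacked, axes=([0], [0])) + bias
    gate = sigmoid(logits)
    # Convex combination; the (H, W) gate broadcasts over channels
    return gate * a + (1.0 - gate) * b, gate


def two_stage_fusion(normal_feat, rgb_feat, event_feat, w_geo, w_evt):
    """Stage 1 merges geometry with appearance; stage 2 adds events."""
    geo_app, gate_geo = gated_fuse(normal_feat, rgb_feat, w_geo)
    fused, gate_evt = gated_fuse(event_feat, geo_app, w_evt)
    return fused, gate_geo, gate_evt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W = 4, 3, 5
    normal_feat, rgb_feat, event_feat = (
        rng.normal(size=(C, H, W)) for _ in range(3)
    )
    # Random projection weights stand in for learned parameters
    w_geo = rng.normal(size=2 * C)
    w_evt = rng.normal(size=2 * C)
    fused, gate_geo, gate_evt = two_stage_fusion(
        normal_feat, rgb_feat, event_feat, w_geo, w_evt
    )
    print(fused.shape, gate_geo.shape, gate_evt.shape)
```

In this toy version, a gate value near zero in the second stage lets the fused normal/RGB features dominate at that pixel, which is one simple way to picture "selective" integration of event signals. A real detector would learn the gates and operate on multi-scale backbone features.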
Forward citations
Cited by 2 Pith papers
- RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
RE-VLM is the first dual-stream VLM combining RGB and event data with a graph-based pipeline to generate training captions and QA pairs, showing gains over RGB-only and event-only models on new datasets for challengin...
- Sparse Hypergraph-Enhanced Frame-Event Object Detection with Fine-Grained MoE
Hyper-FEOD fuses RGB and event data via sparse hypergraph cross-modal fusion and region-specialized MoE experts to improve accuracy-efficiency in object detection.