
DDD17: End-To-End DAVIS Driving Dataset

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Event cameras, such as dynamic vision sensors (DVS) and dynamic and active-pixel vision sensors (DAVIS), can supplement other autonomous driving sensors by providing a concurrent stream of standard active pixel sensor (APS) images and DVS temporal contrast events. The APS stream is a sequence of standard grayscale global-shutter image sensor frames. The DVS events represent brightness changes occurring at a particular moment, with a jitter of about a millisecond under most lighting conditions. They have a dynamic range of >120 dB and effective frame rates >1 kHz at data rates comparable to 30 fps (frames/second) image sensors. To overcome some of the limitations of current image acquisition technology, in this work we investigate the use of the combined DVS and APS streams in end-to-end driving applications. DDD17, the dataset accompanying this paper, is the first open dataset of annotated DAVIS driving recordings. DDD17 contains over 12 h of recordings from a 346x260 pixel DAVIS sensor, covering highway and city driving in daytime, evening, night, dry, and wet weather conditions, along with vehicle speed, GPS position, driver steering, throttle, and brake captured from the car's on-board diagnostics interface. As an example application, we performed a preliminary end-to-end learning study using a convolutional neural network trained to predict the instantaneous steering angle from DVS and APS visual data.

fields

cs.CV 3

years

2026 3

representative citing papers

RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding

cs.CV · 2026-05-19 · unverdicted · novelty 5.0 · 2 refs

RE-VLM fuses RGB and event data in a dual-stream VLM with a graph-based pipeline for generating training captions and QA pairs, plus two new datasets, showing gains over RGB-only and event-only baselines, especially in challenging conditions.

citing papers explorer

Showing 3 of 3 citing papers.

  • NERVE: A Neuromorphic Vision and Radar Ensemble for Multi-Sensor Fusion Research

    cs.CV · 2026-05-13 · conditional · none · ref 16 · internal anchor

    NERVE is a new 600 GB multi-sensor dataset with DVS, RGB-D, and 24/77 GHz radar plus baselines showing DVS+77 GHz radar fusion improves human detection to 47.5% mAP with sub-1.8 m distance error.

  • Generative Event Pretraining with Foundation Model Alignment

    cs.CV · 2026-03-24 · unverdicted · none · ref 3 · internal anchor

    GEP transfers semantic knowledge from image foundation models to event data via alignment and generative pretraining on mixed sequences to create transferable event-based visual models.

  • RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding

    cs.CV · 2026-05-19 · unverdicted · none · ref 2 · 2 links · internal anchor

    RE-VLM fuses RGB and event data in a dual-stream VLM with a graph-based pipeline for generating training captions and QA pairs, plus two new datasets, showing gains over RGB-only and event-only baselines, especially in challenging conditions.