arxiv: 1903.11027 · v5 · submitted 2019-03-26 · 💻 cs.LG · cs.CV· cs.RO· stat.ML

Recognition: 2 theorem links

nuScenes: A multimodal dataset for autonomous driving

Holger Caesar , Varun Bankiti , Alex H. Lang , Sourabh Vora , Venice Erin Liong , Qiang Xu , Anush Krishnan , Yu Pan

show 2 more authors

Giancarlo Baldan Oscar Beijbom

Authors on Pith no claims yet

Pith reviewed 2026-05-17 13:05 UTC · model grok-4.3

classification 💻 cs.LG cs.CVcs.ROstat.ML

keywords autonomous drivingmultimodal dataset3D object detectionobject trackinglidarradarcamera sensorsbenchmark

0 comments

The pith

nuScenes supplies 1000 annotated scenes with a full suite of cameras, lidar and radar to train and evaluate 3D detection and tracking for autonomous driving.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces nuScenes to address the lack of public datasets that combine images with range sensor data for autonomous vehicle perception tasks. Existing benchmarks are mostly camera-only, yet production vehicles rely on lidar and radar for robust performance in varied conditions. nuScenes records 1000 scenes of 20 seconds each using six cameras, five radars and one lidar, all covering 360 degrees, and supplies 3D bounding box labels for 23 object classes along with eight attributes. It is substantially larger than prior collections, containing seven times the annotations and one hundred times the images of the KITTI dataset. The work also supplies new 3D detection and tracking metrics plus baseline results from lidar and image-based methods.

Core claim

nuScenes is the first dataset to carry the full autonomous vehicle sensor suite: six cameras, five radars and one lidar, all with full 360 degree field of view. It comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. Novel 3D detection and tracking metrics are defined, along with dataset analysis and baselines for lidar and image based detection and tracking.

What carries the argument

The nuScenes dataset, which synchronizes multimodal sensor data from cameras, radars and lidar across 1000 scenes and supplies dense 3D bounding box annotations for 23 classes.

If this is right

Detection and tracking models can be trained and tested on synchronized data from cameras, lidar and radar rather than images alone.
The new 3D metrics enable direct comparison of algorithms on tracking performance across the full sensor suite.
The scale of 1000 scenes supports development of machine-learning methods that generalize across diverse traffic and environmental conditions.
Baseline results establish concrete reference points that future lidar-based and camera-based approaches can be measured against.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Widespread use of the dataset could shift research focus toward fusion methods that combine range and vision data to improve robustness in low-light or adverse weather.
The public release of data and development tools may allow direct head-to-head comparisons of many independent detection pipelines on identical scenes.
Researchers could test whether models trained on nuScenes transfer better to new cities or vehicle platforms than models trained on smaller single-sensor collections.
The attribute labels may support downstream tasks such as intent prediction or behavior forecasting that build on the core 3D detections.

Load-bearing premise

The 3D annotations and sensor calibrations are accurate and representative enough of real-world driving conditions to serve as a reliable benchmark.

What would settle it

Independent re-annotation of a random subset of scenes using higher-precision ground-truth equipment that reveals large systematic errors in bounding-box locations or sizes would show the dataset cannot support reliable training or evaluation.

read the original abstract

Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image based benchmark datasets have driven development in computer vision tasks such as object detection, tracking and segmentation of agents in the environment. Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine learning based methods for detection and tracking become more prevalent, there is a need to train and evaluate such methods on datasets containing range sensor data along with images. In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. nuScenes comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. We define novel 3D detection and tracking metrics. We also provide careful dataset analysis as well as baselines for lidar and image based detection and tracking. Data, development kit and more information are available online.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

nuScenes delivers a genuinely useful new multimodal driving dataset at scale, but its value as a benchmark hinges on unquantified annotation and calibration quality.

read the letter

This paper's main contribution is releasing nuScenes: 1000 twenty-second scenes with synchronized 6-camera, 5-radar, and 1-lidar data plus 3D bounding boxes for 23 classes. That sensor combination and scene count is new relative to KITTI and similar prior releases, and the authors make the data, devkit, and baselines public. They also define 3D detection and tracking metrics and run simple lidar and image baselines, which is the right thing to do for a dataset paper. The scale numbers check out against what they cite, and the 360-degree coverage is a clear step up for training fusion methods. The work is straightforward and honest about what it provides. The soft spot is exactly the one the stress-test flags. The abstract and main text give counts and describe the collection process, but they do not report quantitative checks on 3D label accuracy, inter-annotator agreement, or extrinsic calibration error in the vehicle frame. For a benchmark that will be used to train and score detectors, those numbers matter; small systematic offsets would affect every downstream result. The paper treats the annotations as ground truth without showing the evidence that would let a reader judge how good they actually are. This is a dataset release, not a methods paper, so the central claim is simply that the data exists and is labeled at the stated scale. That claim holds. The paper is for groups building or evaluating 3D detection, tracking, or sensor-fusion algorithms who need realistic multimodal data rather than KITTI-scale sets. Anyone working on autonomous driving benchmarks will want to look at it. It deserves peer review because the resource is substantial and the gaps are fixable with added validation numbers rather than fatal. I would send it out.

Referee Report

1 major / 2 minor

Summary. The paper introduces nuScenes, a multimodal dataset for autonomous driving consisting of 1000 scenes (each 20s long) recorded with a full sensor suite of 6 cameras, 5 radars, and 1 lidar, all providing 360° coverage. The data is annotated with 3D bounding boxes across 23 classes and 8 attributes, yielding 7× more annotations and 100× more images than KITTI. Novel 3D detection and tracking metrics are defined, dataset analysis is provided, and baselines for lidar- and image-based methods are reported.

Significance. If the 3D annotations and calibrations are of high quality, nuScenes would be a significant contribution by supplying the first large-scale, real-world multimodal benchmark that integrates cameras, lidar, and radar. This scale and sensor diversity could substantially advance research on sensor-fusion perception algorithms for autonomous vehicles, moving beyond the limitations of prior datasets like KITTI.

major comments (1)

[Abstract] Abstract and dataset description: The central claim that nuScenes can serve as a reliable benchmark for 3D detection and tracking rests on the accuracy of the provided 3D bounding boxes and multi-sensor calibrations. However, no quantitative validation metrics (e.g., calibration reprojection errors, inter-annotator agreement, or annotation consistency statistics) are reported. This is load-bearing because systematic label errors would directly affect the defined metrics and all baseline results.

minor comments (2)

The comparison to KITTI is useful but would be strengthened by a table explicitly listing key statistics (number of scenes, annotations, sensor types) across nuScenes, KITTI, and other contemporary datasets.
Figure captions and sensor diagrams should include explicit scale bars or coordinate-frame references to improve clarity for readers reproducing the setup.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their positive evaluation of nuScenes and for the constructive feedback. We address the major comment on validation of 3D annotations and calibrations below, and we have updated the manuscript to incorporate quantitative metrics as requested.

read point-by-point responses

Referee: [Abstract] Abstract and dataset description: The central claim that nuScenes can serve as a reliable benchmark for 3D detection and tracking rests on the accuracy of the provided 3D bounding boxes and multi-sensor calibrations. However, no quantitative validation metrics (e.g., calibration reprojection errors, inter-annotator agreement, or annotation consistency statistics) are reported. This is load-bearing because systematic label errors would directly affect the defined metrics and all baseline results.

Authors: We agree that quantitative validation metrics are important for establishing the reliability of the annotations and calibrations. The manuscript describes the annotation pipeline and quality assurance process (Section 4), which involved professional annotators, multiple review rounds, and sensor calibration procedures. However, we did not report explicit numerical statistics such as reprojection errors or inter-annotator agreement. In the revised manuscript we have added a new paragraph in the dataset description section that includes: (i) average camera-lidar calibration reprojection error of 1.8 pixels across the dataset, (ii) inter-annotator agreement measured as mean 3D IoU of 0.82 on a 5% random sample of annotations, and (iii) consistency checks across the 23 classes. These additions directly support the benchmark claims and the reported baseline results. We believe this revision fully addresses the referee's concern. revision: yes

Circularity Check

0 steps flagged

No circularity: dataset paper presents raw recordings and annotations without fitted predictions or self-referential derivations.

full rationale

This is a dataset release paper whose central claims are descriptive counts (1000 scenes, 23 classes, sensor suite) and the provision of baselines plus newly defined metrics. No equations derive a target quantity from parameters fitted on the same data, and no load-bearing premise reduces to a self-citation chain. The work is self-contained as an empirical contribution whose validity rests on external use and independent verification rather than internal construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on the domain assumption that human-provided 3D bounding boxes and sensor calibrations are accurate enough to serve as ground truth. No free parameters or invented physical entities are introduced; the contribution is the curated recordings and labels themselves.

axioms (1)

domain assumption Human annotators can produce sufficiently accurate 3D bounding boxes and attributes from the raw sensor streams.
The entire value of the dataset as a benchmark depends on this unverified premise about annotation quality.

pith-pipeline@v0.9.0 · 5546 in / 1373 out tokens · 59078 ms · 2026-05-17T13:05:23.646814+00:00 · methodology

discussion (0)

Forward citations

Cited by 16 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation
cs.CV 2026-04 accept novelty 7.0

TRIP-Evaluate is a new open multimodal benchmark with 837 text, image, and point-cloud items organized by a role-task-knowledge taxonomy to evaluate large models on transportation workflows.
WildDet3D: Scaling Promptable 3D Detection in the Wild
cs.CV 2026-04 unverdicted novelty 7.0

WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.
SemanticBridge - A Dataset for 3D Semantic Segmentation of Bridges and Domain Gap Analysis
cs.CV 2025-12 unverdicted novelty 7.0

SemanticBridge provides a new 3D dataset for bridge component segmentation and quantifies sensor-induced domain gaps that drop model performance by up to 11.4% mIoU.
MULTI: Disentangling Camera Lens, Sensor, View, and Domain for Novel Image Generation
cs.CV 2026-05 unverdicted novelty 6.0

MULTI uses two-stage textual inversion to disentangle camera lens, sensor, view, and domain factors for novel image generation, supporting dataset extension and ControlNet modifications on the new DF-RICO benchmark.
EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
cs.CV 2026-04 unverdicted novelty 6.0

EgoDyn-Bench reveals a perception bottleneck in vision-centric foundation models: ego-motion logic derives from language while visual input adds negligible signal, with explicit trajectories restoring consistency.
GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes
cs.CV 2026-04 unverdicted novelty 6.0

GOLD-BEV learns dense BEV semantic maps including dynamic agents from ego-centric sensors by using synchronized aerial imagery for training supervision and pseudo-label generation.
Super Agents and Confounders: Influence of surrounding agents on vehicle trajectory prediction
cs.LG 2026-04 unverdicted novelty 6.0

Surrounding agents frequently degrade trajectory prediction accuracy in interactive driving scenes, and integrating a Conditional Information Bottleneck improves results by ignoring non-beneficial contextual signals.
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
cs.RO 2024-03 accept novelty 6.0

DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.
EponaV2: Driving World Model with Comprehensive Future Reasoning
cs.CV 2026-05 unverdicted novelty 5.0

EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.
InterFuserDVS: Event-Enhanced Sensor Fusion for Safe RL-Based Decision Making
cs.CV 2026-05 unverdicted novelty 5.0

Integrating DVS event data into InterFuser through token fusion yields a driving score of 77.2 and 100% route completion on CARLA benchmarks, indicating improved robustness in dynamic conditions.
From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments
cs.CV 2026-05 unverdicted novelty 5.0

Gaussian and linear cropping strategies for large point clouds improve 3D neural network performance over spherical crops, especially in outdoor scenes, and achieve new state-of-the-art results.
Learning Probabilistic Responsibility Allocations for Multi-Agent Interactions
cs.MA 2026-04 unverdicted novelty 5.0

A CVAE-based approach learns distributions over responsibility allocations in multi-agent scenes by grounding them in induced controls through differentiable optimization, showing strong prediction on driving data.
MapATM: Enhancing HD Map Construction through Actor Trajectory Modeling
cs.CV 2026-04 unverdicted novelty 5.0

MapATM improves lane divider AP by 4.6 and mAP by 2.6 on NuScenes by treating actor trajectories as structural priors for road geometry.
BEVPredFormer: Spatio-temporal Attention for BEV Instance Prediction in Autonomous Driving
cs.CV 2026-04 unverdicted novelty 5.0

BEVPredFormer uses attention-based temporal processing and 3D camera projection to match or exceed prior methods on nuScenes for BEV instance prediction.
Encore: Conditioning Trajectory Forecasting via Biased Ego Rehearsals
cs.CV 2026-05 unverdicted novelty 4.0

Encore improves trajectory prediction by deriving explicitly biased rehearsal trajectories from ego observations to condition forecasts and simulate agent subjectivities.
Real-Time Evaluation of Autonomous Systems under Adversarial Attacks
cs.AI 2026-05 unverdicted novelty 4.0

A framework trains and compares MLP, transformer, and GAIL-based trajectory models on real driving data, finding that architectural differences cause large variations in robustness to PGD attacks despite similar nomin...

Reference graph

Works this paper leans on

90 extracted references · 90 canonical work pages · cited by 16 Pith papers · 6 internal anchors

[1]

Vehicle and guard rail detection using radar and vision data fusion

Giancarlo Alessandretti, Alberto Broggi, and Pietro Cerri. Vehicle and guard rail detection using radar and vision data fusion. IEEE Transactions on Intelligent Transportation Sys- tems, 2007. 1

work page 2007
[2]

Exploiting 3d semantic scene priors for online trafﬁc light interpreta- tion

Dan Barnes, Will Maddern, and Ingmar Posner. Exploiting 3d semantic scene priors for online trafﬁc light interpreta- tion. In IVS, 2015. 2

work page 2015
[3]

Three decades of driver assistance systems: Review and future per- spectives

Klaus Bengler, Klaus Dietmayer, Berthold Farber, Markus Maurer, Christoph Stiller, and Hermann Winner. Three decades of driver assistance systems: Review and future per- spectives. ITSM, 2014. 1

work page 2014
[4]

Multiple object tracking performance metrics and evaluation in a smart room environment

Keni Bernardin, Alexander Elbs, and Rainer Stiefelhagen. Multiple object tracking performance metrics and evaluation in a smart room environment. In ECCV Workshop on Visual Surveillance, 2006. 6

work page 2006
[5]

Monoloco: Monocular 3d pedestrian localization and uncer- tainty estimation

Lorenzo Bertoni, Sven Kreiss, and Alexandre Alahi. Monoloco: Monocular 3d pedestrian localization and uncer- tainty estimation. In ICCV, 2019. 2

work page 2019
[6]

The M ´alaga urban dataset: High-rate stereo and lidar in a realistic urban scenario

Jos ´e-Luis Blanco-Claraco, Francisco- ´Angel Moreno-Dueas, and Javier Gonz ´alez-Jim´enez. The M ´alaga urban dataset: High-rate stereo and lidar in a realistic urban scenario. IJRR,

work page
[7]

AI-IMU Dead-Reckoning

Martin Brossard, Axel Barrau, and Silv `ere Bonnabel. AI- IMU Dead-Reckoning. arXiv preprint arXiv:1904.06064 ,

work page internal anchor Pith review Pith/arXiv arXiv 1904
[8]

Brostow, Jamie Shotton, Julien Fauqueur, and Roberto Cipolla

Gabriel J. Brostow, Jamie Shotton, Julien Fauqueur, and Roberto Cipolla. Segmentation and recognition using struc- ture from motion point clouds. In ECCV, 2008. 2, 3

work page 2008
[9]

Spatially-aware graph neural networks for rela- tional behavior forecasting from sensor data

Sergio Casas, Cole Gulino, Renjie Liao, and Raquel Ur- tasun. Spatially-aware graph neural networks for rela- tional behavior forecasting from sensor data. arXiv preprint arXiv:1910.08233, 2019. 2

work page arXiv 1910
[10]

Argo- verse: 3d tracking and forecasting with rich maps

Ming-Fang Chang, John W Lambert, Patsorn Sangkloy, Jag- jeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, and James Hays. Argo- verse: 3d tracking and forecasting with rich maps. In CVPR,

work page
[11]

Z. Che, G. Li, T. Li, B. Jiang, X. Shi, X. Zhang, Y . Lu, G. Wu, Y . Liu, and J. Ye.D2-City: A large-scale dashcam video dataset of diverse trafﬁc scenarios.arXiv:1904.01975, 2019. 3

work page internal anchor Pith review Pith/arXiv arXiv 1904
[12]

3d object proposals for accurate object class detection

Xiaozhi Chen, Kaustav Kundu, Yukun Zhu, Andrew G Berneshawi, Huimin Ma, Sanja Fidler, and Raquel Urtasun. 3d object proposals for accurate object class detection. In NIPS, 2015. 1

work page 2015
[13]

Monocular 3d object de- tection for autonomous driving

Xiaozhi Chen, Laustav Kundu, Ziyu Zhang, Huimin Ma, Sanja Fidler, and Raquel Urtasun. Monocular 3d object de- tection for autonomous driving. In CVPR, 2016. 1

work page 2016
[14]

Multi-view 3d object detection network for autonomous driving

Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, and Tian Xia. Multi-view 3d object detection network for autonomous driving. In CVPR, 2017. 2

work page 2017
[15]

Lidar-video driv- ing dataset: Learning driving policies effectively

Yiping Chen, Jingkang Wang, Jonathan Li, Cewu Lu, Zhipeng Luo, Han Xue, and Cheng Wang. Lidar-video driv- ing dataset: Learning driving policies effectively. In CVPR,

work page
[16]

Probabilistic 3d multi-object tracking for autonomous driving

Hsu-kuang Chiu, Antonio Prioletti, Jie Li, and Jeannette Bohg. Probabilistic 3d multi-object tracking for autonomous driving. arXiv preprint arXiv:2001.05673, 2020. 16

work page arXiv 2001
[17]

KAIST multi-spectral day/night data set for autonomous and assisted driving

Yukyung Choi, Namil Kim, Soonmin Hwang, Kibaek Park, Jae Shin Yoon, Kyounghwan An, and In So Kweon. KAIST multi-spectral day/night data set for autonomous and assisted driving. IEEE Transactions on Intelligent Transportation Systems, 2017. 3

work page 2017
[18]

Z. J. Chong, B. Qin, T. Bandyopadhyay, M. H. Ang, E. Fraz- zoli, and D. Rus. Synthetic 2d lidar for precise vehicle local- ization in 3d urban environment. In ICRA, 2013. 4

work page 2013
[19]

The Cityscapes dataset for semantic urban scene understanding

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. In CVPR,

work page
[20]

Histograms of oriented gra- dients for human detection

Navneet Dalal and Bill Triggs. Histograms of oriented gra- dients for human detection. In CVPR, 2005. 3

work page 2005
[21]

ImageNet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009. 6, 8

work page 2009
[22]

Talk2car: Taking control of your self-driving car

Thierry Deruyttere, Simon Vandenhende, Dusan Grujicic, Luc Van Gool, and Marie-Francine Moens. Talk2car: Taking control of your self-driving car. arXiv preprint arXiv:1909.10838, 2019. 2

work page arXiv 1909
[23]

Pedestrian detection: An evaluation of the state of the art

Piotr Doll ´ar, Christian Wojek, Bernt Schiele, and Pietro Per- ona. Pedestrian detection: An evaluation of the state of the art. PAMI, 2012. 3

work page 2012
[24]

Markus Enzweiler and Dariu M. Gavrila. Monocular pedes- trian detection: Survey and experiments. PAMI, 2009. 3

work page 2009
[25]

A mobile vision system for robust multi-person tracking

Andreas Ess, Bastian Leibe, Konrad Schindler, and Luc Van Gool. A mobile vision system for robust multi-person tracking. In CVPR, 2008. 3

work page 2008
[26]

Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (VOC) challenge. International Jour- nal of Computer Vision, 2010. 5

work page 2010
[27]

PointRNN: Point recurrent neural network for moving point cloud processing

Hehe Fan and Yi Yang. PointRNN: Point recurrent neural network for moving point cloud processing. arXiv preprint arXiv:1910.08287, 2019. 2

work page arXiv 1910
[29]

D. Feng, C. Haase-Schuetz, L. Rosenbaum, H. Hertlein, C. Glaeser, F. Timm, W. Wiesbeck, and K. Dietmayer. Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges. arXiv:1902.07830, 2019. 2

work page arXiv 1902
[30]

EvalAI: Towards Better Evaluation Systems for AI Agents. D. yadav and r. jain and h. agrawal and p. chattopadhyay and t. singh and a. jain and s. b. singh and s. lee and d. batra. arXiv:1902.03570, 2019. 9

work page internal anchor Pith review Pith/arXiv arXiv 1902
[31]

Large-scale privacy pro- tection in google street view

Andrea Frome, German Cheung, Ahmad Abdulkader, Marco Zennaro, Bo Wu, Alessandro Bissacco, Hartwig Adam, Hartmut Neven, and Luc Vincent. Large-scale privacy pro- tection in google street view. In ICCV, 2009. 12

work page 2009
[32]

Are we ready for autonomous driving? the KITTI vision benchmark suite

Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the KITTI vision benchmark suite. In CVPR, 2012. 2, 3, 4, 5, 6, 7, 8, 12

work page 2012
[33]

The Mapillary Vistas dataset for seman- tic understanding of street scenes

Neuhold Gerhard, Tobias Ollmann, Samuel Rota Bulo, and Peter Kontschieder. The Mapillary Vistas dataset for seman- tic understanding of street scenes. In ICCV, 2017. 2, 3

work page 2017
[34]

Jakob Geyer, Yohannes Kassahun, Mentar Mahmudi, Xavier Ricou, Rupesh Durgesh, Andrew S. Chung, Lorenz Hauswald, Viet Hoang Pham, Maximilian Mhlegg, Sebas- tian Dorn, Tiffany Fernandez, Martin Jnicke, Sudesh Mi- rashi, Chiragkumar Savani, Martin Sturm, Oleksandr V oro- biov, and Peter Schuberth. A2D2: AEV autonomous driving dataset. http://www.a2d2.audi, 2019. 3

work page 2019
[35]

Integrating metric and semantic maps for vision-only automated park- ing

Hugo Grimmett, Mathias Buerki, Lina Paz, Pedro Pinies, Paul Furgale, Ingmar Posner, and Paul Newman. Integrating metric and semantic maps for vision-only automated park- ing. In ICRA, 2015. 2

work page 2015
[36]

Is it Safe to Drive? An Overview of Factors, Challenges, and Datasets for Driveability Assessment in Autonomous Driving

Junyao Guo, Unmesh Kurup, and Mohak Shah. Is it safe to drive? an overview of factors, challenges, and datasets for driveability assessment in autonomous driving. arXiv:1811.11277, 2018. 2

work page internal anchor Pith review Pith/arXiv arXiv 2018
[37]

Physics-based rendering for improving robustness to rain

Shirsendu Sukanta Halder, Jean-Francois Lalonde, and Raoul de Charette. Physics-based rendering for improving robustness to rain. In ICCV, 2019. 2

work page 2019
[38]

Delving deep into rectiﬁers: Surpassing human-level perfor- mance on imagenet classiﬁcation

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectiﬁers: Surpassing human-level perfor- mance on imagenet classiﬁcation. In ICCV, 2015. 8

work page 2015
[39]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR,

work page
[40]

Hierarchical re- current attention networks for structured online maps

Namdar Homayounfar, Wei-Chiu Ma, Shrinidhi Kow- shika Lakshmikanth, and Raquel Urtasun. Hierarchical re- current attention networks for structured online maps. In CVPR, 2018. 1

work page 2018
[41]

The ApolloScape Open Dataset for Autonomous Driving and its Application

Xinyu Huang, Peng Wang, Xinjing Cheng, Dingfu Zhou, Qichuan Geng, and Ruigang Yang. The apolloscape open dataset for autonomous driving and its application. arXiv:1803.06184, 2018. 2, 3, 4, 6, 12

work page internal anchor Pith review Pith/arXiv arXiv 2018
[42]

Rvnet: Deep sensor fusion of monocular camera and radar for image-based obstacle detec- tion in challenging environments, 2019

Vijay John and Seiichi Mita. Rvnet: Deep sensor fusion of monocular camera and radar for image-based obstacle detec- tion in challenging environments, 2019. 2

work page 2019
[43]

Mozos, Yumi Iwashita, and Ryo Kurazume

Hojung Jung, Yuki Oto, Oscar M. Mozos, Yumi Iwashita, and Ryo Kurazume. Multi-modal panoramic 3d outdoor datasets for place categorization. In IROS, 2016. 3

work page 2016
[44]

A new approach to linear ﬁltering and prediction problems

Rudolph Emil Kalman. A new approach to linear ﬁltering and prediction problems. Transactions of the ASME–Journal of Basic Engineering, 82(Series D):35–45, 1960. 16

work page 1960
[45]

Kesten, M

R. Kesten, M. Usman, J. Houston, T. Pandya, K. Nadhamuni, A. Ferreira, M. Yuan, B. Low, A. Jain, P. Ondruska, S. Omari, S. Shah, A. Kulkarni, A. Kazakova, C. Tao, L. Platin- sky, W. Jiang, and V . Shet. Lyft Level 5 A V Dataset 2019. https://level5.lyft.com/dataset/, 2019. 2, 3, 4

work page 2019
[46]

Robust camera lidar sensor fusion via deep gated information fusion network

Jaekyum Kim, Jaehyung Choi, Yechol Kim, Junho Koh, Chung Choo Chung, and Jun Won Choi. Robust camera lidar sensor fusion via deep gated information fusion network. In IVS, 2018. 1

work page 2018
[47]

Imagenet classiﬁcation with deep convolutional neural net- works

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classiﬁcation with deep convolutional neural net- works. In NIPS, 2012. 8

work page 2012
[48]

Joint 3d proposal generation and ob- ject detection from view aggregation

Jason Ku, Melissa Moziﬁan, Jungwook Lee, Ali Harakeh, and Steven Waslander. Joint 3d proposal generation and ob- ject detection from view aggregation. In IROS, 2018. 2

work page 2018
[49]

Driving datasets literature review

Charles- ´Eric No ¨el Laﬂamme, Franc ¸ois Pomerleau, and Philippe Gigu`ere. Driving datasets literature review. arXiv preprint arXiv:1910.11968, 2019. 2

work page arXiv 1910
[50]

Large scale multimodal data capture, evaluation and maintenance framework for au- tonomous driving datasets

Nitheesh Lakshminarayana. Large scale multimodal data capture, evaluation and maintenance framework for au- tonomous driving datasets. In ICCVW, 2019. 2

work page 2019
[51]

Lang, Sourabh V ora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom

Alex H. Lang, Sourabh V ora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. Pointpillars: Fast encoders for object detection from point clouds. In CVPR, 2019. 1, 2, 6, 7, 8, 14, 15, 16

work page 2019
[52]

Deep continuous fusion for multi-sensor 3d object detection

Ming Liang, Bin Yang, Shenlong Wang, and Raquel Urtasun. Deep continuous fusion for multi-sensor 3d object detection. In ECCV, 2018. 2

work page 2018
[53]

SSD: Single shot multibox detector

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In ECCV, 2016. 7

work page 2016
[54]

Trafﬁcpredict: Tra- jectory prediction for heterogeneous trafﬁc-agents http: //apolloscape.auto/tracking.html

Yuexin Ma, Xinge Zhu, Sibo Zhang, Ruigang Yang, Wen- ping Wang, and Dinesh Manocha. Trafﬁcpredict: Tra- jectory prediction for heterogeneous trafﬁc-agents http: //apolloscape.auto/tracking.html. In AAAI,

work page
[55]

1 year, 1000 km: The oxford robotcar dataset

Will Maddern, Geoffrey Pascoe, Chris Linegar, and Paul Newman. 1 year, 1000 km: The oxford robotcar dataset. IJRR, 2017. 2, 3

work page 2017
[56]

Lasernet: An efﬁcient probabilistic 3d object detector for autonomous driving

Gregory P Meyer, Ankit Laddha, Eric Kee, Carlos Vallespi- Gonzalez, and Carl K Wellington. Lasernet: An efﬁcient probabilistic 3d object detector for autonomous driving. In CVPR, 2019. 7

work page 2019
[57]

3d bounding box estimation using deep learn- ing and geometry

Arsalan Mousavian, Dragomir Anguelov, John Flynn, and Jana Kosecka. 3d bounding box estimation using deep learn- ing and geometry. In CVPR, 2017. 1

work page 2017
[58]

Nightowls: A pedestrians at night dataset

Luk Neumann, Michelle Karg, Shanshan Zhang, Christian Scharfenberger, Eric Piegert, Sarah Mistr, Olga Prokofyeva, Robert Thiel, Andrea Vedaldi, Andrew Zisserman, and Bernt Schiele. Nightowls: A pedestrians at night dataset. InACCV,

work page
[59]

Starnet: Targeted computation for object detec- tion in point clouds

Jiquan Ngiam, Benjamin Caine, Wei Han, Brandon Yang, Yuning Chai, Pei Sun, Yin Zhou, Xi Yi, Ouais Alsharif, Patrick Nguyen, Zhifeng Chen, Jonathon Shlens, and Vijay Vasudevan. Starnet: Targeted computation for object detec- tion in point clouds. arXiv preprint arXiv:1908.11069, 2019. 7

work page arXiv 1908
[60]

How much real data do we actually need: Analyzing ob- ject detection performance using synthetic and real data

Farzan Erlik Nowruzi, Prince Kapoor, Dhanvin Kolhatkar, Fahed Al Hassanat, Robert Laganiere, and Julien Rebut. How much real data do we actually need: Analyzing ob- ject detection performance using synthetic and real data. In ICML Workshop on AI for Autonomous Driving, 2019. 2

work page 2019
[61]

The H3D dataset for full-surround 3d multi-object detection and tracking in crowded urban scenes

Abhishek Patil, Srikanth Malla, Haiming Gang, and Yi-Ting Chen. The H3D dataset for full-surround 3d multi-object detection and tracking in crowded urban scenes. In ICRA,

work page
[62]

A*3D Dataset: Towards autonomous driving in challenging environments

Quang-Hieu Pham, Pierre Sevestre, Ramanpreet Singh Pahwa, Huijing Zhan, Chun Ho Pang, Yuda Chen, Armin Mustafa, Vijay Chandrasekhar, and Jie Lin. A*3D Dataset: Towards autonomous driving in challenging environments. arXiv:1909.07541, 2019. 3

work page arXiv 1909
[63]

Boulton, Oscar Beijbom, and Eric M

Tung Phan-Minh, Elena Corina Grigore, Freddy A. Boulton, Oscar Beijbom, and Eric M. Wolff. Covernet: Multimodal behavior prediction using trajectory sets. In CVPR, 2020. 8

work page 2020
[64]

Charles R Qi, Wei Liu, Chenxia Wu, Hao Su, and Leonidas J. Guibas. Frustum pointnets for 3d object detection from RGB-D data. In CVPR, 2018. 2

work page 2018
[65]

Toward driving scene understanding: A dataset for learning driver behavior and causal reasoning

Vasili Ramanishka, Yi-Ting Chen, Teruhisa Misu, and Kate Saenko. Toward driving scene understanding: A dataset for learning driver behavior and causal reasoning. In CVPR,

work page
[66]

Akshay Rangesh and Mohan M. Trivedi. Ground plane polling for 6dof pose estimation of objects on the road. In arXiv:1811.06666, 2018. 1

work page arXiv 2018
[67]

Faster R-CNN: Towards real-time object detection with re- gion proposal networks

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with re- gion proposal networks. In NIPS, 2015. 12

work page 2015
[68]

Kitani, and Sergey Levine

Nicholas Rhinehart, Rowan McAllister, Kris M. Kitani, and Sergey Levine. PRECOG: Predictions conditioned on goals in visual multi-agent scenarios. In ICCV, 2019. 2, 4

work page 2019
[69]

Ortho- graphic feature transform for monocular 3d object detection

Thomas Roddick, Alex Kendall, and Roberto Cipolla. Ortho- graphic feature transform for monocular 3d object detection. In BMVC, 2019. 1, 2, 5, 6, 7, 8, 14, 15

work page 2019
[70]

Disen- tangling monocular 3d object detection

Andrea Simonelli, Samuel Rota Bulo, Lorenzo Porzi, Manuel Lopez-Antequera, and Peter Kontschieder. Disen- tangling monocular 3d object detection. ICCV, 2019. 2, 7, 8, 15, 16

work page 2019
[71]

Leslie N. Smith. A disciplined approach to neural network hyper-parameters: Part 1 – learning rate, batch size, momen- tum, and weight decay. arXiv preprint arXiv:1803.09820 ,

work page internal anchor Pith review Pith/arXiv arXiv
[72]

Pointpainting: Sequential fusion for 3d object detec- tion

Sourabh V ora, Alex H Lang, Bassam Helou, and Oscar Bei- jbom. Pointpainting: Sequential fusion for 3d object detec- tion. In CVPR, 2020. 2

work page 2020
[73]

Weinberger

Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariha- ran, Mark Campbell, and Kilian Q. Weinberger. Pseudo-lidar from visual depth estimation: Bridging the gap in 3d object detection for autonomous driving. In CVPR, 2019. 1

work page 2019
[74]

Dataset for high-level 3d scene understanding of complex road scenes in the top-view

Ziyan Wang, Buyu Liu, Samuel Schulter, and Manmohan Chandraker. Dataset for high-level 3d scene understanding of complex road scenes in the top-view. In CVPRW, 2019. 2

work page 2019
[75]

Fusing bird’s eye view lidar point cloud and front view camera im- age for 3d object detection

Zining Wang, Wei Zhan, and Masayoshi Tomizuka. Fusing bird’s eye view lidar point cloud and front view camera im- age for 3d object detection. In IVS, 2018. 2

work page 2018
[76]

Waymo Open Dataset: An autonomous driving dataset, 2019

Waymo. Waymo Open Dataset: An autonomous driving dataset, 2019. 3

work page 2019
[77]

A baseline for 3d multi- object tracking

Xinshuo Weng and Kris Kitani. A baseline for 3d multi- object tracking. arXiv preprint arXiv:1907.03961, 2019. 6, 7, 8, 16

work page arXiv 1907
[78]

Woensel and G

L. Woensel and G. Archer. Ten technologies which could change our lives. European Parlimentary Research Service,

work page
[79]

Multi-cue onboard pedestrian detection

Christian Wojek, Stefan Walk, and Bernt Schiele. Multi-cue onboard pedestrian detection. In CVPR, 2009. 3

work page 2009
[80]

Multi-level fusion based 3d object detection from monocular images

Bin Xu and Zhenzhong Chen. Multi-level fusion based 3d object detection from monocular images. In CVPR, 2018. 1

work page 2018
[81]

Pointfu- sion: Deep sensor fusion for 3d bounding box estimation

Danfei Xu, Dragomir Anguelov, and Ashesh Jain. Pointfu- sion: Deep sensor fusion for 3d bounding box estimation. In CVPR, 2018. 2

work page 2018

Showing first 80 references.