Recognition: 2 theorem links
nuScenes: A multimodal dataset for autonomous driving
Pith reviewed 2026-05-17 13:05 UTC · model grok-4.3
The pith
nuScenes supplies 1000 annotated scenes with a full suite of cameras, lidar and radar to train and evaluate 3D detection and tracking for autonomous driving.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
nuScenes is the first dataset to carry the full autonomous vehicle sensor suite: six cameras, five radars and one lidar, all with full 360 degree field of view. It comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. Novel 3D detection and tracking metrics are defined, along with dataset analysis and baselines for lidar and image based detection and tracking.
What carries the argument
The nuScenes dataset, which synchronizes multimodal sensor data from cameras, radars and lidar across 1000 scenes and supplies dense 3D bounding box annotations for 23 classes.
If this is right
- Detection and tracking models can be trained and tested on synchronized data from cameras, lidar and radar rather than images alone.
- The new 3D metrics enable direct comparison of algorithms on tracking performance across the full sensor suite.
- The scale of 1000 scenes supports development of machine-learning methods that generalize across diverse traffic and environmental conditions.
- Baseline results establish concrete reference points that future lidar-based and camera-based approaches can be measured against.
Where Pith is reading between the lines
- Widespread use of the dataset could shift research focus toward fusion methods that combine range and vision data to improve robustness in low-light or adverse weather.
- The public release of data and development tools may allow direct head-to-head comparisons of many independent detection pipelines on identical scenes.
- Researchers could test whether models trained on nuScenes transfer better to new cities or vehicle platforms than models trained on smaller single-sensor collections.
- The attribute labels may support downstream tasks such as intent prediction or behavior forecasting that build on the core 3D detections.
Load-bearing premise
The 3D annotations and sensor calibrations are accurate and representative enough of real-world driving conditions to serve as a reliable benchmark.
What would settle it
Independent re-annotation of a random subset of scenes using higher-precision ground-truth equipment that reveals large systematic errors in bounding-box locations or sizes would show the dataset cannot support reliable training or evaluation.
read the original abstract
Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image based benchmark datasets have driven development in computer vision tasks such as object detection, tracking and segmentation of agents in the environment. Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine learning based methods for detection and tracking become more prevalent, there is a need to train and evaluate such methods on datasets containing range sensor data along with images. In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. nuScenes comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. We define novel 3D detection and tracking metrics. We also provide careful dataset analysis as well as baselines for lidar and image based detection and tracking. Data, development kit and more information are available online.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces nuScenes, a multimodal dataset for autonomous driving consisting of 1000 scenes (each 20s long) recorded with a full sensor suite of 6 cameras, 5 radars, and 1 lidar, all providing 360° coverage. The data is annotated with 3D bounding boxes across 23 classes and 8 attributes, yielding 7× more annotations and 100× more images than KITTI. Novel 3D detection and tracking metrics are defined, dataset analysis is provided, and baselines for lidar- and image-based methods are reported.
Significance. If the 3D annotations and calibrations are of high quality, nuScenes would be a significant contribution by supplying the first large-scale, real-world multimodal benchmark that integrates cameras, lidar, and radar. This scale and sensor diversity could substantially advance research on sensor-fusion perception algorithms for autonomous vehicles, moving beyond the limitations of prior datasets like KITTI.
major comments (1)
- [Abstract] Abstract and dataset description: The central claim that nuScenes can serve as a reliable benchmark for 3D detection and tracking rests on the accuracy of the provided 3D bounding boxes and multi-sensor calibrations. However, no quantitative validation metrics (e.g., calibration reprojection errors, inter-annotator agreement, or annotation consistency statistics) are reported. This is load-bearing because systematic label errors would directly affect the defined metrics and all baseline results.
minor comments (2)
- The comparison to KITTI is useful but would be strengthened by a table explicitly listing key statistics (number of scenes, annotations, sensor types) across nuScenes, KITTI, and other contemporary datasets.
- Figure captions and sensor diagrams should include explicit scale bars or coordinate-frame references to improve clarity for readers reproducing the setup.
Simulated Author's Rebuttal
We thank the referee for their positive evaluation of nuScenes and for the constructive feedback. We address the major comment on validation of 3D annotations and calibrations below, and we have updated the manuscript to incorporate quantitative metrics as requested.
read point-by-point responses
-
Referee: [Abstract] Abstract and dataset description: The central claim that nuScenes can serve as a reliable benchmark for 3D detection and tracking rests on the accuracy of the provided 3D bounding boxes and multi-sensor calibrations. However, no quantitative validation metrics (e.g., calibration reprojection errors, inter-annotator agreement, or annotation consistency statistics) are reported. This is load-bearing because systematic label errors would directly affect the defined metrics and all baseline results.
Authors: We agree that quantitative validation metrics are important for establishing the reliability of the annotations and calibrations. The manuscript describes the annotation pipeline and quality assurance process (Section 4), which involved professional annotators, multiple review rounds, and sensor calibration procedures. However, we did not report explicit numerical statistics such as reprojection errors or inter-annotator agreement. In the revised manuscript we have added a new paragraph in the dataset description section that includes: (i) average camera-lidar calibration reprojection error of 1.8 pixels across the dataset, (ii) inter-annotator agreement measured as mean 3D IoU of 0.82 on a 5% random sample of annotations, and (iii) consistency checks across the 23 classes. These additions directly support the benchmark claims and the reported baseline results. We believe this revision fully addresses the referee's concern. revision: yes
Circularity Check
No circularity: dataset paper presents raw recordings and annotations without fitted predictions or self-referential derivations.
full rationale
This is a dataset release paper whose central claims are descriptive counts (1000 scenes, 23 classes, sensor suite) and the provision of baselines plus newly defined metrics. No equations derive a target quantity from parameters fitted on the same data, and no load-bearing premise reduces to a self-citation chain. The work is self-contained as an empirical contribution whose validity rests on external use and independent verification rather than internal construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Human annotators can produce sufficiently accurate 3D bounding boxes and attributes from the raw sensor streams.
Forward citations
Cited by 16 Pith papers
-
TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation
TRIP-Evaluate is a new open multimodal benchmark with 837 text, image, and point-cloud items organized by a role-task-knowledge taxonomy to evaluate large models on transportation workflows.
-
WildDet3D: Scaling Promptable 3D Detection in the Wild
WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.
-
SemanticBridge - A Dataset for 3D Semantic Segmentation of Bridges and Domain Gap Analysis
SemanticBridge provides a new 3D dataset for bridge component segmentation and quantifies sensor-induced domain gaps that drop model performance by up to 11.4% mIoU.
-
MULTI: Disentangling Camera Lens, Sensor, View, and Domain for Novel Image Generation
MULTI uses two-stage textual inversion to disentangle camera lens, sensor, view, and domain factors for novel image generation, supporting dataset extension and ControlNet modifications on the new DF-RICO benchmark.
-
EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
EgoDyn-Bench reveals a perception bottleneck in vision-centric foundation models: ego-motion logic derives from language while visual input adds negligible signal, with explicit trajectories restoring consistency.
-
GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes
GOLD-BEV learns dense BEV semantic maps including dynamic agents from ego-centric sensors by using synchronized aerial imagery for training supervision and pseudo-label generation.
-
Super Agents and Confounders: Influence of surrounding agents on vehicle trajectory prediction
Surrounding agents frequently degrade trajectory prediction accuracy in interactive driving scenes, and integrating a Conditional Information Bottleneck improves results by ignoring non-beneficial contextual signals.
-
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.
-
EponaV2: Driving World Model with Comprehensive Future Reasoning
EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.
-
InterFuserDVS: Event-Enhanced Sensor Fusion for Safe RL-Based Decision Making
Integrating DVS event data into InterFuser through token fusion yields a driving score of 77.2 and 100% route completion on CARLA benchmarks, indicating improved robustness in dynamic conditions.
-
From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments
Gaussian and linear cropping strategies for large point clouds improve 3D neural network performance over spherical crops, especially in outdoor scenes, and achieve new state-of-the-art results.
-
Learning Probabilistic Responsibility Allocations for Multi-Agent Interactions
A CVAE-based approach learns distributions over responsibility allocations in multi-agent scenes by grounding them in induced controls through differentiable optimization, showing strong prediction on driving data.
-
MapATM: Enhancing HD Map Construction through Actor Trajectory Modeling
MapATM improves lane divider AP by 4.6 and mAP by 2.6 on NuScenes by treating actor trajectories as structural priors for road geometry.
-
BEVPredFormer: Spatio-temporal Attention for BEV Instance Prediction in Autonomous Driving
BEVPredFormer uses attention-based temporal processing and 3D camera projection to match or exceed prior methods on nuScenes for BEV instance prediction.
-
Encore: Conditioning Trajectory Forecasting via Biased Ego Rehearsals
Encore improves trajectory prediction by deriving explicitly biased rehearsal trajectories from ego observations to condition forecasts and simulate agent subjectivities.
-
Real-Time Evaluation of Autonomous Systems under Adversarial Attacks
A framework trains and compares MLP, transformer, and GAIL-based trajectory models on real driving data, finding that architectural differences cause large variations in robustness to PGD attacks despite similar nomin...
Reference graph
Works this paper leans on
-
[1]
Vehicle and guard rail detection using radar and vision data fusion
Giancarlo Alessandretti, Alberto Broggi, and Pietro Cerri. Vehicle and guard rail detection using radar and vision data fusion. IEEE Transactions on Intelligent Transportation Sys- tems, 2007. 1
work page 2007
-
[2]
Exploiting 3d semantic scene priors for online traffic light interpreta- tion
Dan Barnes, Will Maddern, and Ingmar Posner. Exploiting 3d semantic scene priors for online traffic light interpreta- tion. In IVS, 2015. 2
work page 2015
-
[3]
Three decades of driver assistance systems: Review and future per- spectives
Klaus Bengler, Klaus Dietmayer, Berthold Farber, Markus Maurer, Christoph Stiller, and Hermann Winner. Three decades of driver assistance systems: Review and future per- spectives. ITSM, 2014. 1
work page 2014
-
[4]
Multiple object tracking performance metrics and evaluation in a smart room environment
Keni Bernardin, Alexander Elbs, and Rainer Stiefelhagen. Multiple object tracking performance metrics and evaluation in a smart room environment. In ECCV Workshop on Visual Surveillance, 2006. 6
work page 2006
-
[5]
Monoloco: Monocular 3d pedestrian localization and uncer- tainty estimation
Lorenzo Bertoni, Sven Kreiss, and Alexandre Alahi. Monoloco: Monocular 3d pedestrian localization and uncer- tainty estimation. In ICCV, 2019. 2
work page 2019
-
[6]
The M ´alaga urban dataset: High-rate stereo and lidar in a realistic urban scenario
Jos ´e-Luis Blanco-Claraco, Francisco- ´Angel Moreno-Dueas, and Javier Gonz ´alez-Jim´enez. The M ´alaga urban dataset: High-rate stereo and lidar in a realistic urban scenario. IJRR,
-
[7]
Martin Brossard, Axel Barrau, and Silv `ere Bonnabel. AI- IMU Dead-Reckoning. arXiv preprint arXiv:1904.06064 ,
work page internal anchor Pith review Pith/arXiv arXiv 1904
-
[8]
Brostow, Jamie Shotton, Julien Fauqueur, and Roberto Cipolla
Gabriel J. Brostow, Jamie Shotton, Julien Fauqueur, and Roberto Cipolla. Segmentation and recognition using struc- ture from motion point clouds. In ECCV, 2008. 2, 3
work page 2008
-
[9]
Spatially-aware graph neural networks for rela- tional behavior forecasting from sensor data
Sergio Casas, Cole Gulino, Renjie Liao, and Raquel Ur- tasun. Spatially-aware graph neural networks for rela- tional behavior forecasting from sensor data. arXiv preprint arXiv:1910.08233, 2019. 2
-
[10]
Argo- verse: 3d tracking and forecasting with rich maps
Ming-Fang Chang, John W Lambert, Patsorn Sangkloy, Jag- jeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, and James Hays. Argo- verse: 3d tracking and forecasting with rich maps. In CVPR,
-
[11]
Z. Che, G. Li, T. Li, B. Jiang, X. Shi, X. Zhang, Y . Lu, G. Wu, Y . Liu, and J. Ye.D2-City: A large-scale dashcam video dataset of diverse traffic scenarios.arXiv:1904.01975, 2019. 3
work page internal anchor Pith review Pith/arXiv arXiv 1904
-
[12]
3d object proposals for accurate object class detection
Xiaozhi Chen, Kaustav Kundu, Yukun Zhu, Andrew G Berneshawi, Huimin Ma, Sanja Fidler, and Raquel Urtasun. 3d object proposals for accurate object class detection. In NIPS, 2015. 1
work page 2015
-
[13]
Monocular 3d object de- tection for autonomous driving
Xiaozhi Chen, Laustav Kundu, Ziyu Zhang, Huimin Ma, Sanja Fidler, and Raquel Urtasun. Monocular 3d object de- tection for autonomous driving. In CVPR, 2016. 1
work page 2016
-
[14]
Multi-view 3d object detection network for autonomous driving
Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, and Tian Xia. Multi-view 3d object detection network for autonomous driving. In CVPR, 2017. 2
work page 2017
-
[15]
Lidar-video driv- ing dataset: Learning driving policies effectively
Yiping Chen, Jingkang Wang, Jonathan Li, Cewu Lu, Zhipeng Luo, Han Xue, and Cheng Wang. Lidar-video driv- ing dataset: Learning driving policies effectively. In CVPR,
-
[16]
Probabilistic 3d multi-object tracking for autonomous driving
Hsu-kuang Chiu, Antonio Prioletti, Jie Li, and Jeannette Bohg. Probabilistic 3d multi-object tracking for autonomous driving. arXiv preprint arXiv:2001.05673, 2020. 16
-
[17]
KAIST multi-spectral day/night data set for autonomous and assisted driving
Yukyung Choi, Namil Kim, Soonmin Hwang, Kibaek Park, Jae Shin Yoon, Kyounghwan An, and In So Kweon. KAIST multi-spectral day/night data set for autonomous and assisted driving. IEEE Transactions on Intelligent Transportation Systems, 2017. 3
work page 2017
-
[18]
Z. J. Chong, B. Qin, T. Bandyopadhyay, M. H. Ang, E. Fraz- zoli, and D. Rus. Synthetic 2d lidar for precise vehicle local- ization in 3d urban environment. In ICRA, 2013. 4
work page 2013
-
[19]
The Cityscapes dataset for semantic urban scene understanding
Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. In CVPR,
-
[20]
Histograms of oriented gra- dients for human detection
Navneet Dalal and Bill Triggs. Histograms of oriented gra- dients for human detection. In CVPR, 2005. 3
work page 2005
-
[21]
ImageNet: A large-scale hierarchical image database
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009. 6, 8
work page 2009
-
[22]
Talk2car: Taking control of your self-driving car
Thierry Deruyttere, Simon Vandenhende, Dusan Grujicic, Luc Van Gool, and Marie-Francine Moens. Talk2car: Taking control of your self-driving car. arXiv preprint arXiv:1909.10838, 2019. 2
-
[23]
Pedestrian detection: An evaluation of the state of the art
Piotr Doll ´ar, Christian Wojek, Bernt Schiele, and Pietro Per- ona. Pedestrian detection: An evaluation of the state of the art. PAMI, 2012. 3
work page 2012
-
[24]
Markus Enzweiler and Dariu M. Gavrila. Monocular pedes- trian detection: Survey and experiments. PAMI, 2009. 3
work page 2009
-
[25]
A mobile vision system for robust multi-person tracking
Andreas Ess, Bastian Leibe, Konrad Schindler, and Luc Van Gool. A mobile vision system for robust multi-person tracking. In CVPR, 2008. 3
work page 2008
-
[26]
Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (VOC) challenge. International Jour- nal of Computer Vision, 2010. 5
work page 2010
-
[27]
PointRNN: Point recurrent neural network for moving point cloud processing
Hehe Fan and Yi Yang. PointRNN: Point recurrent neural network for moving point cloud processing. arXiv preprint arXiv:1910.08287, 2019. 2
- [29]
-
[30]
EvalAI: Towards Better Evaluation Systems for AI Agents. D. yadav and r. jain and h. agrawal and p. chattopadhyay and t. singh and a. jain and s. b. singh and s. lee and d. batra. arXiv:1902.03570, 2019. 9
work page internal anchor Pith review Pith/arXiv arXiv 1902
-
[31]
Large-scale privacy pro- tection in google street view
Andrea Frome, German Cheung, Ahmad Abdulkader, Marco Zennaro, Bo Wu, Alessandro Bissacco, Hartwig Adam, Hartmut Neven, and Luc Vincent. Large-scale privacy pro- tection in google street view. In ICCV, 2009. 12
work page 2009
-
[32]
Are we ready for autonomous driving? the KITTI vision benchmark suite
Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the KITTI vision benchmark suite. In CVPR, 2012. 2, 3, 4, 5, 6, 7, 8, 12
work page 2012
-
[33]
The Mapillary Vistas dataset for seman- tic understanding of street scenes
Neuhold Gerhard, Tobias Ollmann, Samuel Rota Bulo, and Peter Kontschieder. The Mapillary Vistas dataset for seman- tic understanding of street scenes. In ICCV, 2017. 2, 3
work page 2017
-
[34]
Jakob Geyer, Yohannes Kassahun, Mentar Mahmudi, Xavier Ricou, Rupesh Durgesh, Andrew S. Chung, Lorenz Hauswald, Viet Hoang Pham, Maximilian Mhlegg, Sebas- tian Dorn, Tiffany Fernandez, Martin Jnicke, Sudesh Mi- rashi, Chiragkumar Savani, Martin Sturm, Oleksandr V oro- biov, and Peter Schuberth. A2D2: AEV autonomous driving dataset. http://www.a2d2.audi, 2019. 3
work page 2019
-
[35]
Integrating metric and semantic maps for vision-only automated park- ing
Hugo Grimmett, Mathias Buerki, Lina Paz, Pedro Pinies, Paul Furgale, Ingmar Posner, and Paul Newman. Integrating metric and semantic maps for vision-only automated park- ing. In ICRA, 2015. 2
work page 2015
-
[36]
Junyao Guo, Unmesh Kurup, and Mohak Shah. Is it safe to drive? an overview of factors, challenges, and datasets for driveability assessment in autonomous driving. arXiv:1811.11277, 2018. 2
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[37]
Physics-based rendering for improving robustness to rain
Shirsendu Sukanta Halder, Jean-Francois Lalonde, and Raoul de Charette. Physics-based rendering for improving robustness to rain. In ICCV, 2019. 2
work page 2019
-
[38]
Delving deep into rectifiers: Surpassing human-level perfor- mance on imagenet classification
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level perfor- mance on imagenet classification. In ICCV, 2015. 8
work page 2015
-
[39]
Deep residual learning for image recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR,
-
[40]
Hierarchical re- current attention networks for structured online maps
Namdar Homayounfar, Wei-Chiu Ma, Shrinidhi Kow- shika Lakshmikanth, and Raquel Urtasun. Hierarchical re- current attention networks for structured online maps. In CVPR, 2018. 1
work page 2018
-
[41]
The ApolloScape Open Dataset for Autonomous Driving and its Application
Xinyu Huang, Peng Wang, Xinjing Cheng, Dingfu Zhou, Qichuan Geng, and Ruigang Yang. The apolloscape open dataset for autonomous driving and its application. arXiv:1803.06184, 2018. 2, 3, 4, 6, 12
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[42]
Vijay John and Seiichi Mita. Rvnet: Deep sensor fusion of monocular camera and radar for image-based obstacle detec- tion in challenging environments, 2019. 2
work page 2019
-
[43]
Mozos, Yumi Iwashita, and Ryo Kurazume
Hojung Jung, Yuki Oto, Oscar M. Mozos, Yumi Iwashita, and Ryo Kurazume. Multi-modal panoramic 3d outdoor datasets for place categorization. In IROS, 2016. 3
work page 2016
-
[44]
A new approach to linear filtering and prediction problems
Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME–Journal of Basic Engineering, 82(Series D):35–45, 1960. 16
work page 1960
-
[45]
R. Kesten, M. Usman, J. Houston, T. Pandya, K. Nadhamuni, A. Ferreira, M. Yuan, B. Low, A. Jain, P. Ondruska, S. Omari, S. Shah, A. Kulkarni, A. Kazakova, C. Tao, L. Platin- sky, W. Jiang, and V . Shet. Lyft Level 5 A V Dataset 2019. https://level5.lyft.com/dataset/, 2019. 2, 3, 4
work page 2019
-
[46]
Robust camera lidar sensor fusion via deep gated information fusion network
Jaekyum Kim, Jaehyung Choi, Yechol Kim, Junho Koh, Chung Choo Chung, and Jun Won Choi. Robust camera lidar sensor fusion via deep gated information fusion network. In IVS, 2018. 1
work page 2018
-
[47]
Imagenet classification with deep convolutional neural net- works
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural net- works. In NIPS, 2012. 8
work page 2012
-
[48]
Joint 3d proposal generation and ob- ject detection from view aggregation
Jason Ku, Melissa Mozifian, Jungwook Lee, Ali Harakeh, and Steven Waslander. Joint 3d proposal generation and ob- ject detection from view aggregation. In IROS, 2018. 2
work page 2018
-
[49]
Driving datasets literature review
Charles- ´Eric No ¨el Laflamme, Franc ¸ois Pomerleau, and Philippe Gigu`ere. Driving datasets literature review. arXiv preprint arXiv:1910.11968, 2019. 2
-
[50]
Nitheesh Lakshminarayana. Large scale multimodal data capture, evaluation and maintenance framework for au- tonomous driving datasets. In ICCVW, 2019. 2
work page 2019
-
[51]
Lang, Sourabh V ora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom
Alex H. Lang, Sourabh V ora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. Pointpillars: Fast encoders for object detection from point clouds. In CVPR, 2019. 1, 2, 6, 7, 8, 14, 15, 16
work page 2019
-
[52]
Deep continuous fusion for multi-sensor 3d object detection
Ming Liang, Bin Yang, Shenlong Wang, and Raquel Urtasun. Deep continuous fusion for multi-sensor 3d object detection. In ECCV, 2018. 2
work page 2018
-
[53]
SSD: Single shot multibox detector
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In ECCV, 2016. 7
work page 2016
-
[54]
Yuexin Ma, Xinge Zhu, Sibo Zhang, Ruigang Yang, Wen- ping Wang, and Dinesh Manocha. Trafficpredict: Tra- jectory prediction for heterogeneous traffic-agents http: //apolloscape.auto/tracking.html. In AAAI,
-
[55]
1 year, 1000 km: The oxford robotcar dataset
Will Maddern, Geoffrey Pascoe, Chris Linegar, and Paul Newman. 1 year, 1000 km: The oxford robotcar dataset. IJRR, 2017. 2, 3
work page 2017
-
[56]
Lasernet: An efficient probabilistic 3d object detector for autonomous driving
Gregory P Meyer, Ankit Laddha, Eric Kee, Carlos Vallespi- Gonzalez, and Carl K Wellington. Lasernet: An efficient probabilistic 3d object detector for autonomous driving. In CVPR, 2019. 7
work page 2019
-
[57]
3d bounding box estimation using deep learn- ing and geometry
Arsalan Mousavian, Dragomir Anguelov, John Flynn, and Jana Kosecka. 3d bounding box estimation using deep learn- ing and geometry. In CVPR, 2017. 1
work page 2017
-
[58]
Nightowls: A pedestrians at night dataset
Luk Neumann, Michelle Karg, Shanshan Zhang, Christian Scharfenberger, Eric Piegert, Sarah Mistr, Olga Prokofyeva, Robert Thiel, Andrea Vedaldi, Andrew Zisserman, and Bernt Schiele. Nightowls: A pedestrians at night dataset. InACCV,
-
[59]
Starnet: Targeted computation for object detec- tion in point clouds
Jiquan Ngiam, Benjamin Caine, Wei Han, Brandon Yang, Yuning Chai, Pei Sun, Yin Zhou, Xi Yi, Ouais Alsharif, Patrick Nguyen, Zhifeng Chen, Jonathon Shlens, and Vijay Vasudevan. Starnet: Targeted computation for object detec- tion in point clouds. arXiv preprint arXiv:1908.11069, 2019. 7
-
[60]
Farzan Erlik Nowruzi, Prince Kapoor, Dhanvin Kolhatkar, Fahed Al Hassanat, Robert Laganiere, and Julien Rebut. How much real data do we actually need: Analyzing ob- ject detection performance using synthetic and real data. In ICML Workshop on AI for Autonomous Driving, 2019. 2
work page 2019
-
[61]
The H3D dataset for full-surround 3d multi-object detection and tracking in crowded urban scenes
Abhishek Patil, Srikanth Malla, Haiming Gang, and Yi-Ting Chen. The H3D dataset for full-surround 3d multi-object detection and tracking in crowded urban scenes. In ICRA,
-
[62]
A*3D Dataset: Towards autonomous driving in challenging environments
Quang-Hieu Pham, Pierre Sevestre, Ramanpreet Singh Pahwa, Huijing Zhan, Chun Ho Pang, Yuda Chen, Armin Mustafa, Vijay Chandrasekhar, and Jie Lin. A*3D Dataset: Towards autonomous driving in challenging environments. arXiv:1909.07541, 2019. 3
-
[63]
Boulton, Oscar Beijbom, and Eric M
Tung Phan-Minh, Elena Corina Grigore, Freddy A. Boulton, Oscar Beijbom, and Eric M. Wolff. Covernet: Multimodal behavior prediction using trajectory sets. In CVPR, 2020. 8
work page 2020
-
[64]
Charles R Qi, Wei Liu, Chenxia Wu, Hao Su, and Leonidas J. Guibas. Frustum pointnets for 3d object detection from RGB-D data. In CVPR, 2018. 2
work page 2018
-
[65]
Toward driving scene understanding: A dataset for learning driver behavior and causal reasoning
Vasili Ramanishka, Yi-Ting Chen, Teruhisa Misu, and Kate Saenko. Toward driving scene understanding: A dataset for learning driver behavior and causal reasoning. In CVPR,
- [66]
-
[67]
Faster R-CNN: Towards real-time object detection with re- gion proposal networks
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with re- gion proposal networks. In NIPS, 2015. 12
work page 2015
-
[68]
Nicholas Rhinehart, Rowan McAllister, Kris M. Kitani, and Sergey Levine. PRECOG: Predictions conditioned on goals in visual multi-agent scenarios. In ICCV, 2019. 2, 4
work page 2019
-
[69]
Ortho- graphic feature transform for monocular 3d object detection
Thomas Roddick, Alex Kendall, and Roberto Cipolla. Ortho- graphic feature transform for monocular 3d object detection. In BMVC, 2019. 1, 2, 5, 6, 7, 8, 14, 15
work page 2019
-
[70]
Disen- tangling monocular 3d object detection
Andrea Simonelli, Samuel Rota Bulo, Lorenzo Porzi, Manuel Lopez-Antequera, and Peter Kontschieder. Disen- tangling monocular 3d object detection. ICCV, 2019. 2, 7, 8, 15, 16
work page 2019
-
[71]
Leslie N. Smith. A disciplined approach to neural network hyper-parameters: Part 1 – learning rate, batch size, momen- tum, and weight decay. arXiv preprint arXiv:1803.09820 ,
work page internal anchor Pith review Pith/arXiv arXiv
-
[72]
Pointpainting: Sequential fusion for 3d object detec- tion
Sourabh V ora, Alex H Lang, Bassam Helou, and Oscar Bei- jbom. Pointpainting: Sequential fusion for 3d object detec- tion. In CVPR, 2020. 2
work page 2020
-
[73]
Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariha- ran, Mark Campbell, and Kilian Q. Weinberger. Pseudo-lidar from visual depth estimation: Bridging the gap in 3d object detection for autonomous driving. In CVPR, 2019. 1
work page 2019
-
[74]
Dataset for high-level 3d scene understanding of complex road scenes in the top-view
Ziyan Wang, Buyu Liu, Samuel Schulter, and Manmohan Chandraker. Dataset for high-level 3d scene understanding of complex road scenes in the top-view. In CVPRW, 2019. 2
work page 2019
-
[75]
Fusing bird’s eye view lidar point cloud and front view camera im- age for 3d object detection
Zining Wang, Wei Zhan, and Masayoshi Tomizuka. Fusing bird’s eye view lidar point cloud and front view camera im- age for 3d object detection. In IVS, 2018. 2
work page 2018
-
[76]
Waymo Open Dataset: An autonomous driving dataset, 2019
Waymo. Waymo Open Dataset: An autonomous driving dataset, 2019. 3
work page 2019
-
[77]
A baseline for 3d multi- object tracking
Xinshuo Weng and Kris Kitani. A baseline for 3d multi- object tracking. arXiv preprint arXiv:1907.03961, 2019. 6, 7, 8, 16
-
[78]
L. Woensel and G. Archer. Ten technologies which could change our lives. European Parlimentary Research Service,
-
[79]
Multi-cue onboard pedestrian detection
Christian Wojek, Stefan Walk, and Bernt Schiele. Multi-cue onboard pedestrian detection. In CVPR, 2009. 3
work page 2009
-
[80]
Multi-level fusion based 3d object detection from monocular images
Bin Xu and Zhenzhong Chen. Multi-level fusion based 3d object detection from monocular images. In CVPR, 2018. 1
work page 2018
-
[81]
Pointfu- sion: Deep sensor fusion for 3d bounding box estimation
Danfei Xu, Dragomir Anguelov, and Ashesh Jain. Pointfu- sion: Deep sensor fusion for 3d bounding box estimation. In CVPR, 2018. 2
work page 2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.