pith. machine review for the scientific record.

arxiv: 2301.00493 · v1 · submitted 2023-01-02 · 💻 cs.CV · cs.AI · cs.LG · cs.RO

Recognition: no theorem link

Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, James Hays


Pith reviewed 2026-05-12 20:10 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG · cs.RO
keywords self-driving, perception, motion forecasting, lidar, datasets, 3D annotations, HD maps, autonomous vehicles

The pith

Argoverse 2 releases three large datasets to support new research in self-driving perception and forecasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Argoverse 2 as a collection of three datasets for perception and forecasting in autonomous driving. The Sensor Dataset includes 1,000 sequences with camera images, lidar, and 3D annotations for 26 object categories. The Lidar Dataset offers 20,000 unlabeled sequences to enable self-supervised learning and point cloud forecasting. The Motion Forecasting Dataset contains 250,000 scenarios focused on interactions, with track histories for predicting future actor movements. In all three datasets, each scenario comes with its own high-definition (HD) map, sourced from data collected in six cities. The authors argue that these resources will support new and existing machine learning research in ways that existing datasets do not.

Core claim

Argoverse 2 comprises three datasets: the annotated Sensor Dataset, with 1,000 multimodal sequences, 6-DOF map-aligned pose, and 3D cuboid labels for 26 categories; the Lidar Dataset, with 20,000 sequences of unlabeled point clouds and map-aligned pose for self-supervised tasks; and the Motion Forecasting Dataset, with 250,000 interaction scenarios providing track histories for future motion prediction. Every scenario in all three datasets includes an HD map with 3D lane and crosswalk geometry, sourced from six cities.

What carries the argument

The three complementary datasets supplying multimodal annotated sensor data, large-scale unlabeled lidar sequences, and detailed motion scenarios with HD maps from multiple cities.

If this is right

  • 3D perception models can be trained and evaluated using annotations for 26 object categories, all of which the authors report as sufficiently sampled.
  • Self-supervised learning and point cloud forecasting can be pursued with what the authors describe as the largest collection of lidar sensor data ever released.
  • Motion forecasting models can predict future locations for scored actors based on track histories of location, heading, velocity, and category in challenging scenarios.
  • Research can leverage HD maps with 3D lane and crosswalk geometry sourced from six distinct cities.
  • The datasets support both new and existing machine learning problems in self-driving that prior collections do not address as effectively.
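To make the third bullet concrete, here is a minimal sketch of a track history and a naive forecaster. The record layout (`TrackState` with `x`, `y`, `heading`, `vx`, `vy`, `category`) is a hypothetical illustration of the fields the abstract lists (location, heading, velocity, category), not the official Argoverse 2 schema or API. The constant-velocity extrapolation is a common sanity-check baseline, not a method from the paper, and the 0.1 s step size is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrackState:
    """One observed timestep of an actor (hypothetical schema, not the AV2 API)."""
    timestep: int
    x: float          # position in metres, map frame
    y: float
    heading: float    # yaw in radians (carried along, unused by this baseline)
    vx: float         # velocity in m/s
    vy: float
    category: str     # e.g. "vehicle", "pedestrian"


def constant_velocity_forecast(
    history: List[TrackState], horizon: int, dt: float = 0.1
) -> List[Tuple[float, float]]:
    """Extrapolate the most recent observed state at constant velocity.

    Returns `horizon` future (x, y) positions spaced `dt` seconds apart.
    The 0.1 s default step is an assumption for illustration.
    """
    if not history:
        raise ValueError("history must contain at least one observed state")
    # Use the latest observation, regardless of list order.
    last = max(history, key=lambda s: s.timestep)
    return [
        (last.x + last.vx * dt * k, last.y + last.vy * dt * k)
        for k in range(1, horizon + 1)
    ]
```

Because it ignores the HD map and other actors, a baseline like this is useful mainly as a reference point against which learned forecasters can be compared.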

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Researchers may combine the sensor annotations with the forecasting scenarios to develop unified models that perform detection and future-motion prediction together.
  • The multi-city coverage could help create models more robust to differences in road layouts and traffic behaviors across urban areas.
  • Pre-training on the large unlabeled lidar sequences might boost performance on other 3D vision tasks beyond self-driving.
  • Emphasis on interaction-rich scenarios could support development of safer prediction systems that handle complex multi-vehicle situations.

Load-bearing premise

The provided annotations are accurate enough and the selected scenarios are sufficiently representative to drive meaningful improvements in deployed self-driving systems.

What would settle it

A test showing that models trained on Argoverse 2 data achieve no measurable gains in accuracy or generalization on independent real-world self-driving benchmarks compared to models trained on smaller prior datasets would falsify the claim.

read the original abstract

We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
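The "6-DOF map-aligned pose" in the abstract is a rigid transform (3D rotation plus 3D translation) from the vehicle frame into a map frame. A minimal sketch of applying such a pose to points follows, assuming the rotation is stored as a unit quaternion (w, x, y, z) and the translation in metres; these storage conventions and the helper names `quat_to_matrix` and `ego_to_map` are illustrative assumptions, not the dataset's documented format or API.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def quat_to_matrix(qw: float, qx: float, qy: float, qz: float) -> List[List[float]]:
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    n = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
    qw, qx, qy, qz = qw / n, qx / n, qy / n, qz / n  # normalise defensively
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ]


def ego_to_map(
    points: Sequence[Vec3],
    quat: Tuple[float, float, float, float],
    translation: Vec3,
) -> List[Vec3]:
    """Apply p_map = R @ p_ego + t to every point (hypothetical helper)."""
    R = quat_to_matrix(*quat)
    out: List[Vec3] = []
    for p in points:
        out.append(tuple(
            sum(R[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return out
```

For example, with a 90° yaw quaternion (w = z = √0.5) and translation (10, 20, 0), the ego-frame point (1, 0, 0) maps to approximately (10, 21, 0).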

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces Argoverse 2, a collection of three datasets for self-driving perception and forecasting research. The Annotated Sensor Dataset provides 1,000 multimodal sequences with high-resolution imagery from seven ring cameras and two stereo cameras, lidar point clouds, 6-DOF map-aligned pose, and 3D cuboid annotations for 26 object categories. The Lidar Dataset offers 20,000 sequences of unlabeled lidar point clouds with map-aligned pose. The Motion Forecasting Dataset contains 250,000 scenarios with track histories for scored actors. Every scenario in all three datasets includes an HD map with 3D lane and crosswalk geometry, and the data are sourced from six cities. The authors state that these datasets will support new and existing machine learning research problems in ways that existing datasets do not, and all are released under CC BY-NC-SA 4.0.

Significance. If the datasets are released with the described scale, annotation quality, and diversity, this work provides a substantial resource for advancing 3D perception, self-supervised point cloud learning, point cloud forecasting, and motion prediction. Strengths include the explicit provision of annotation counts, category coverage, multi-city sourcing, and the combination of labeled and unlabeled data at large scale, which directly addresses limitations in prior collections and enables new research directions as claimed.

minor comments (3)
  1. [Abstract] The statement that all 26 object categories 'are sufficiently-sampled to support training and evaluation of 3D perception models' would be strengthened by including (or referencing) per-category instance counts or a table summarizing annotation statistics, so that readers can assess this claim directly.
  2. The manuscript would benefit from a dedicated comparison section or table (e.g., Table 1) against prior datasets such as nuScenes, Waymo Open Dataset, and the original Argoverse to explicitly quantify differences in scale, number of categories, sensor modalities, and geographic coverage.
  3. [Abstract] Motion Forecasting Dataset description: The criteria used to mine the 250,000 scenarios for 'interesting and challenging interactions' are not detailed in the provided abstract; adding a brief description of the mining process or heuristics would improve reproducibility and clarity.
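Minor comment 1 asks for per-category instance counts. A sketch of how such a summary could be tallied from a flat list of cuboid annotations follows; the input format (one category string per annotation) and the `min_count` threshold for "sufficiently sampled" are hypothetical choices for illustration, not criteria taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def category_counts(labels: Iterable[str]) -> List[Tuple[str, int]]:
    """Count annotations per category, most frequent first (ties broken by name)."""
    counts = Counter(labels)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def undersampled(labels: Iterable[str], min_count: int) -> Dict[str, int]:
    """Return categories whose instance count falls below a chosen threshold."""
    return {cat: n for cat, n in Counter(labels).items() if n < min_count}
```

For example, `category_counts(["car", "car", "pedestrian"])` returns `[("car", 2), ("pedestrian", 1)]`.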

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the Argoverse 2 datasets and for recommending acceptance. We appreciate the recognition that the scale, annotation quality, multi-city diversity, and combination of labeled and unlabeled data address limitations in prior collections and enable new research directions.

Circularity Check

0 steps flagged

No circularity: pure dataset release with no derivations or fitted predictions

full rationale

The paper is a data release describing three datasets (Sensor, Lidar, Motion Forecasting) with explicit counts, categories, sourcing, and annotation details. No equations, derivations, parameters, or predictive claims exist that could reduce to inputs by construction. The central assertion that the datasets enable new research rests on the documented scale, diversity, and annotations rather than any self-referential logic or self-citation chain. This is the standard non-circular structure for dataset papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset release paper containing no mathematical derivations, fitted parameters, or postulated physical entities. The only background assumptions are standard ones about sensor calibration and map accuracy in the self-driving domain.

pith-pipeline@v0.9.0 · 5610 in / 1096 out tokens · 58895 ms · 2026-05-12T20:10:13.850741+00:00 · methodology


Forward citations

Cited by 29 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Unified Modeling of Lane and Lane Topology for Driving Scene Reasoning

    cs.CV 2026-05 unverdicted novelty 7.0

    UniTopo unifies lane detection and topology reasoning into a single perception model, outperforming prior methods on OpenLane-V2 benchmarks with TOP_ll scores of 30.1% and 31.8%.

  2. CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography

    cs.CV 2026-05 conditional novelty 7.0

    CARD is a new multi-modal driving dataset delivering ~500K dense depth pixels per frame from challenging road topographies using stereo cameras and fused LiDARs over 110 km.

  3. TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation

    cs.CV 2026-04 accept novelty 7.0

    TRIP-Evaluate is a new open multimodal benchmark with 837 text, image, and point-cloud items organized by a role-task-knowledge taxonomy to evaluate large models on transportation workflows.

  4. TopoHR: Hierarchical Centerline Representation for Cyclic Topology Reasoning in Driving Scenes with Point-to-Instance Relations

    cs.CV 2026-04 unverdicted novelty 7.0

    TopoHR proposes a hierarchical centerline representation and topology reasoning module with point-to-instance relations and cyclic interactions, achieving new state-of-the-art results on the OpenLane-V2 benchmark for ...

  5. WildDet3D: Scaling Promptable 3D Detection in the Wild

    cs.CV 2026-04 unverdicted novelty 7.0

    WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.

  6. Appearance Decomposition Gaussian Splatting for Multi-Traversal Reconstruction

    cs.CV 2026-04 unverdicted novelty 7.0

    ADM-GS decomposes static background appearance into traversal-invariant material and traversal-dependent illumination via a frequency-separated neural light field, yielding +0.98 dB PSNR gains and better cross-travers...

  7. RayMamba: Ray-Aligned Serialization for Long-Range 3D Object Detection

    cs.CV 2026-04 unverdicted novelty 7.0

    RayMamba improves long-range 3D object detection by ray-aligned serialization of sparse voxels for state space modeling, delivering up to 2.49 mAP gain on nuScenes in the 40-50 m range.

  8. A global dataset of continuous urban dashcam driving

    cs.CV 2026-04 accept novelty 7.0

    CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.

  9. UniDAC: Universal Metric Depth Estimation for Any Camera

    cs.CV 2026-03 unverdicted novelty 7.0

    UniDAC achieves universal metric depth estimation across camera types by decoupling relative depth prediction from spatially varying scale estimation using a depth-guided module and distortion-aware positional embedding.

  10. LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

    cs.CV 2026-03 unverdicted novelty 7.0

    KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.

  11. MUSDA: Multi-source Multi-modality Unsupervised Domain Adaptive 3D Object Detection for Autonomous Driving

    cs.CV 2026-05 unverdicted novelty 6.0

    MUSDA proposes hierarchical domain classifiers for multi-modality feature alignment and a prototype graph strategy for multi-source prediction fusion in unsupervised domain adaptation for 3D object detection.

  12. GSMap: 2D Gaussians for Online HD Mapping

    cs.CV 2026-05 unverdicted novelty 6.0

    GSMap represents HD map elements as sequences of 2D Gaussians to unify geometric precision and topological regularity for online autonomous driving maps.

  13. Unified Map Prior Encoder for Mapping and Planning

    cs.CV 2026-05 unverdicted novelty 6.0

    UMPE fuses any subset of HD/SD vector maps, raster SD maps, and satellite imagery into BEV features via alignment-aware vector and raster branches, raising mapping mAP by 5.3-5.9 points and cutting planning L2 error b...

  14. LIE: LiDAR-only HD Map Construction with Intensity Enhancement via Online Knowledge Distillation

    cs.CV 2026-05 unverdicted novelty 6.0

    LIE delivers LiDAR-only HD map segmentation via online knowledge distillation that fuses intensity maps, beating the best camera-only model by 8.2% mIoU on nuScenes while adapting quickly to new datasets.

  15. VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions

    eess.SY 2026-04 unverdicted novelty 6.0

    VLM-VPI uses Qwen3-VL and GPT-OSS models for pedestrian intent and age reasoning plus a tiered safety controller, reporting 92.3% intent accuracy in CARLA and reduced conflicts versus rule-based and supervised baselines.

  16. EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving

    cs.CV 2026-04 unverdicted novelty 6.0

    EgoDyn-Bench reveals a perception bottleneck in vision-centric foundation models: ego-motion logic derives from language while visual input adds negligible signal, with explicit trajectories restoring consistency.

  17. Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

    cs.CV 2026-04 unverdicted novelty 6.0

    OneVL is the first latent CoT method to exceed explicit CoT accuracy on four driving benchmarks while running at answer-only speed, by supervising latent tokens with a visual world model decoder.

  18. Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

    cs.CV 2026-04 unverdicted novelty 6.0

    OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.

  19. CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras

    cs.CV 2026-04 unverdicted novelty 6.0

    CAM3DNet outperforms prior camera-based 3D detectors on nuScenes, Waymo and Argoverse by using three new modules to better mine multi-scale spatiotemporal features from 2D queries and pyramid maps.

  20. EdgeVTP: Exploration of Latency-efficient Trajectory Prediction for Edge-based Embedded Vision Applications

    cs.CV 2026-04 unverdicted novelty 6.0

    EdgeVTP delivers the lowest measured end-to-end latency on Jetson-class platforms while matching or exceeding state-of-the-art accuracy on highway trajectory benchmarks by using bounded graph interactions and a one-sh...

  21. EagleVision: A Multi-Task Benchmark for Cross-Domain Perception in High-Speed Autonomous Racing

    cs.RO 2026-04 unverdicted novelty 6.0

    EagleVision creates a standardized multi-task benchmark for LiDAR perception in high-speed autonomous racing, with experiments showing that pretraining on racing data improves cross-domain detection and prediction per...

  22. Visually-grounded Humanoid Agents

    cs.CV 2026-04 unverdicted novelty 6.0

    A coupled world-agent framework uses 3D Gaussian reconstruction and first-person RGB-D perception with iterative planning to enable goal-directed, collision-avoiding humanoid behavior in novel reconstructed scenes.

  23. Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection

    cs.CV 2026-04 unverdicted novelty 6.0

    Telescope uses learnable hyperbolic foveation to deliver a 76% relative mAP gain (0.185 to 0.326) for objects beyond 250 meters while keeping overhead low.

  24. HorizonWeaver: Generalizable Multi-Level Semantic Editing for Driving Scenes

    cs.CV 2026-04 unverdicted novelty 6.0

    HorizonWeaver enables photorealistic, instruction-driven multi-level editing of complex driving scenes with improved generalization via a new paired dataset, language-guided masks, and joint training losses.

  25. Goal-Oriented Reactive Simulation for Closed-Loop Trajectory Prediction

    cs.RO 2026-03 conditional novelty 6.0

    Closed-loop on-policy training with a reactive goal-oriented scene decoder cuts collision rates by up to 79.5% in dense traffic compared to standard open-loop baselines.

  26. SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection

    cs.CV 2026-04 unverdicted novelty 5.0

    SemLT3D introduces semantic-guided expert distillation with a language MoE module and CLIP projection to enrich features for long-tailed classes in camera-only 3D detection.

  27. Artificial Intelligence for Modeling and Simulation of Mixed Automated and Human Traffic

    cs.AI 2026-04 unverdicted novelty 5.0

    This survey synthesizes AI techniques for mixed autonomy traffic simulation and introduces a taxonomy spanning agent-level behavior models, environment-level methods, and cognitive/physics-informed approaches.

  28. LEAN-3D: Low-latency Hierarchical Point Cloud Codec for Mobile 3D Streaming

    eess.SP 2026-04 unverdicted novelty 5.0

    LEAN-3D delivers 3-5x lower latency and up to 5.1x lower edge energy for learned point cloud compression on mobile hardware by restricting learned components to shallow hierarchy levels and using deterministic coding ...

  29. AtteConDA: Attention-Based Conflict Suppression in Multi-Condition Diffusion Models and Synthetic Data Augmentation

    cs.CV 2026-05 unverdicted novelty 4.0

    AtteConDA adds attention-based conflict suppression to multi-condition diffusion models so that generated driving-scene images retain richer structural cues from the original annotations.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · cited by 28 Pith papers · 1 internal anchor

  1. [1]

    SemanticKITTI: A dataset for semantic scene understanding of lidar sequences

    Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, and Jurgen Gall. SemanticKITTI: A dataset for semantic scene understanding of lidar sequences. In ICCV, October 2019

  2. [2]

    Range conditioned dilated convolutions for scale invariant 3d object detection

    Alex Bewley, Pei Sun, Thomas Mensink, Drago Anguelov, and Cristian Sminchisescu. Range conditioned dilated convolutions for scale invariant 3d object detection. In Conference on Robot Learning, 2020

  3. [3]

    Language Models are Few-Shot Learners

    Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020

  4. [4]

    nuScenes: A Multimodal Dataset for Autonomous Driving

    Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A Multimodal Dataset for Autonomous Driving. In CVPR, 2020

  5. [5]

    To the point: Efficient 3d object detection in the range image with graph convolution kernels

    Yuning Chai, Pei Sun, Jiquan Ngiam, Weiyue Wang, Benjamin Caine, Vijay Vasudevan, Xiao Zhang, and Dragomir Anguelov. To the point: Efficient 3d object detection in the range image with graph convolution kernels. In CVPR, June 2021

  6. [6]

    Argoverse: 3D Tracking and Forecasting With Rich Maps

    Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, and James Hays. Argoverse: 3D Tracking and Forecasting With Rich Maps. In CVPR, 2019

  7. [7]

    A simple framework for contrastive learning of visual representations

    Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In ICML, 2020

  8. [8]

    GeoSim: Realistic video simulation via geometry-aware composition for self-driving

    Yun Chen, Frieda Rong, Shivam Duggal, Shenlong Wang, Xinchen Yan, Sivabalan Manivasagam, Shangjie Xue, Ersin Yumer, and Raquel Urtasun. GeoSim: Realistic video simulation via geometry-aware composition for self-driving. In CVPR, June 2021

  9. [9]

    Kaist multi-spectral day/night data set for autonomous and assisted driving

    Yukyung Choi, Namil Kim, Soonmin Hwang, Kibaek Park, Jae Shin Yoon, Kyounghwan An, and In So Kweon. Kaist multi-spectral day/night data set for autonomous and assisted driving. IEEE Transactions on Intelligent Transportation Systems, 19(3):934–948, 2018

  10. [10]

    All-day visual place recognition: Benchmark dataset and baseline

    Yukyung Choi, Namil Kim, Kibaek Park, Soonmin Hwang, Jae Shin Yoon, Yoon In, and Inso Kweon. All-day visual place recognition: Benchmark dataset and baseline. In IEEE Conference on Computer Vision and Pattern Recognition Workshops. Workshop on Visual Place Recognition in Changing Environments, 2015

  11. [11]

    GoRela: Go relative for viewpoint-invariant motion forecasting

    Alexander Cui, Sergio Casas, Kelvin Wong, Simon Suo, and Raquel Urtasun. Gorela: Go relative for viewpoint-invariant motion forecasting. arXiv preprint arXiv:2211.02545, 2022

  12. [12]

    Large scale interactive motion forecasting for autonomous driving : The waymo open motion dataset

    Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Benjamin Sapp, Charles Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov. Large scale interactive motion forecasting for autonomous driving : The waymo open motion da...

  13. [13]

    Technical report for cvpr 2022 workshop on autonomous driving argoverse 3d object detection competition, 2022

    Jin Fang, Qinghao Meng, Dingfu Zhou, Chulin Tang, Jianbing Shen, Cheng-Zhong Xu, and Liangjun Zhang. Technical report for cvpr 2022 workshop on autonomous driving argoverse 3d object detection competition, 2022

  14. [14]

    Cityscapes 3d: Dataset and benchmark for 9 dof vehicle detection

    Nils Gählert, Nicolas Jourdan, Marius Cordts, Uwe Franke, and Joachim Denzler. Cityscapes 3d: Dataset and benchmark for 9 dof vehicle detection. CoRR, abs/2006.07864, 2020

  15. [15]

    VectorNet: Encoding hd maps and agent dynamics from vectorized representation

    Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li, and Cordelia Schmid. VectorNet: Encoding hd maps and agent dynamics from vectorized representation. In CVPR, June 2020

  16. [16]

    Afdet: Anchor free one stage 3d object detection

    Runzhou Ge, Zhuangzhuang Ding, Yihan Hu, Yu Wang, Sijia Chen, Li Huang, and Yuan Li. Afdet: Anchor free one stage 3d object detection. In CVPR Workshops, 2020

  17. [17]

    Are we ready for autonomous driving? The KITTI vision benchmark suite

    Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, June 2012

  18. [18]

    Jakob Geyer, Yohannes Kassahun, Mentar Mahmudi, Xavier Ricou, Rupesh Durgesh, Andrew S. Chung, Lorenz Hauswald, Viet Hoang Pham, Maximilian Mühlegg, Sebastian Dorn, Tiffany Fernandez, Martin Jänicke, Sudesh Mirashi, Chiragkumar Savani, Martin Sturm, Oleksandr Vorobiov, Martin Oelker, Sebastian Garreis, and Peter Schuberth. A2D2: audi autonomous driving d...

  19. [19]

    Home: Heatmap output for future motion estimation

    Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, and Fabien Moutarde. Home: Heatmap output for future motion estimation. arXiv preprint arXiv:2105.10968, 2021

  20. [20]

    Thomas: Trajectory heatmap output with learned multi-agent sampling

    Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, and Fabien Moutarde. Thomas: Trajectory heatmap output with learned multi-agent sampling. In ICLR, 2022

  21. [21]

    Streaming object detection for 3-d point clouds

    Wei Han, Zhengdong Zhang, Benjamin Caine, Brandon Yang, Christoph Sprunk, Ouais Alsharif, Jiquan Ngiam, Vijay Vasudevan, Jonathon Shlens, and Zhifeng Chen. Streaming object detection for 3-d point clouds. In ECCV, 2020

  22. [22]

    One Thousand and One Hours: Self-driving Motion Prediction Dataset

    John Houston, Guido Zuidhof, Luca Bergamini, Yawei Ye, Long Chen, Ashesh Jain, Sammy Omari, Vladimir Iglovikov, and Peter Ondruska. One Thousand and One Hours: Self-driving Motion Prediction Dataset. arXiv:2006.14480 [cs], November 2020. Comment: Presented at CoRL 2020

  23. [23]

    Safe local motion planning with self-supervised freespace forecasting

    Peiyun Hu, Aaron Huang, John Dolan, David Held, and Deva Ramanan. Safe local motion planning with self-supervised freespace forecasting. In CVPR, June 2021

  24. [24]

    Lyft level 5 av dataset

    R. Kesten, M. Usman, J. Houston, T. Pandya, K. Nadhamuni, A. Ferreira, M. Yuan, B. Low, A. Jain, P. Ondruska, S. Omari, S. Shah, A. Kulkarni, A. Kazakova, C. Tao, L. Platinsky, W. Jiang, and V. Shet. Lyft level 5 av dataset. arXiv, 2019

  25. [25]

    What-if motion prediction for autonomous driving

    Siddhesh Khandelwal, William Qi, Jagjeet Singh, Andrew Hartnett, and Deva Ramanan. What-if motion prediction for autonomous driving. arXiv preprint arXiv:2008.10587, 2020

  26. [26]

    Trust, but Verify: Cross-modality fusion for hd map change detection

    John W. Lambert and James Hays. Trust, but Verify: Cross-modality fusion for hd map change detection. In Advances in Neural Information Processing Systems Track on Datasets and Benchmarks, 2021

  27. [27]

    PointPillars: Fast encoders for object detection from point clouds

    Alex H. Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. PointPillars: Fast encoders for object detection from point clouds. In CVPR, June 2019

  28. [28]

    Social attention for autonomous decision-making in dense traffic

    Edouard Leurent and Jean Mercat. Social attention for autonomous decision-making in dense traffic. CoRR, abs/1911.12250, 2019

  29. [29]

    Towards streaming perception

    Mengtian Li, Yu-Xiong Wang, and Deva Ramanan. Towards streaming perception. In ECCV, 2020

  30. [30]

    Hdmapnet: An online HD map construction and evaluation framework

    Qi Li, Yue Wang, Yilun Wang, and Hang Zhao. Hdmapnet: An online HD map construction and evaluation framework. CoRR, abs/2107.06307, 2021

  31. [31]

    Learning lane graph representations for motion forecasting

    Ming Liang, Bin Yang, Rui Hu, Yun Chen, Renjie Liao, Song Feng, and Raquel Urtasun. Learning lane graph representations for motion forecasting. In ECCV, 2020

  32. [32]

    Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation

    Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, and Song Han. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. arXiv preprint arXiv:2205.13542, 2022

  33. [33]

    TrafficPredict: Trajectory prediction for heterogeneous traffic-agents

    Yuexin Ma, Xinge Zhu, Sibo Zhang, Ruigang Yang, Wenping Wang, and Dinesh Manocha. TrafficPredict: Trajectory prediction for heterogeneous traffic-agents. CoRR, abs/1811.02146, 2018

  34. [34]

    Shifts: A dataset of real distributional shift across multiple large-scale tasks

    Andrey Malinin, Neil Band, Alexander Ganshin, German Chesnokov, Yarin Gal, Mark J. F. Gales, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, Vyas Raina, Mariya Shmatova, Panos Tigas, and Boris Yangel. Shifts: A dataset of real distributional shift across multiple large-scale tasks. CoRR, abs/2107.07455, 2021

  35. [35]

    LiDARsim: Realistic lidar simulation by leveraging the real world

    Sivabalan Manivasagam, Shenlong Wang, Kelvin Wong, Wenyuan Zeng, Mikita Sazanovich, Shuhan Tan, Bin Yang, Wei-Chiu Ma, and Raquel Urtasun. LiDARsim: Realistic lidar simulation by leveraging the real world. In CVPR, June 2020

  36. [36]

    One Million Scenes for Autonomous Driving: ONCE Dataset

    Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Jie Yu, Hang Xu, and Chunjing Xu. One Million Scenes for Autonomous Driving: ONCE Dataset. arXiv:2106.11037 [cs], August 2021. Comment: Accepted to NeurIPS 2021 Datasets and Benchmarks Track

  37. [37]

    Multi-head attention for multi-modal joint vehicle motion forecasting

    Jean Mercat, Thomas Gilles, Nicole El Zoghby, Guillaume Sandou, Dominique Beauvois, and Guillermo Pita Gil. Multi-head attention for multi-modal joint vehicle motion forecasting. In ICRA. IEEE, 2020

  38. [38]

    Multi-head attention for multi-modal joint vehicle motion forecasting, 2019

    Jean Mercat, Thomas Gilles, Nicole El Zoghby, Guillaume Sandou, Dominique Beauvois, and Guillermo Pita Gil. Multi-head attention for multi-modal joint vehicle motion forecasting, 2019

  39. [39]

    The H3D dataset for full-surround 3d multi-object detection and tracking in crowded urban scenes

    Abhishek Patil, Srikanth Malla, Haiming Gang, and Yi-Ting Chen. The H3D dataset for full-surround 3d multi-object detection and tracking in crowded urban scenes. CoRR, abs/1903.01568, 2019

  40. [40]

    A*3d dataset: Towards autonomous driving in challenging environments

    Quang-Hieu Pham, Pierre Sevestre, Ramanpreet Singh Pahwa, Huijing Zhan, Chun Ho Pang, Yuda Chen, Armin Mustafa, Vijay Chandrasekhar, and Jie Lin. A*3d dataset: Towards autonomous driving in challenging environments. CoRR, abs/1909.07541, 2019

  41. [41]

    Canadian adverse driving conditions dataset

    Matthew Pitropov, Danson Evan Garcia, Jason Rebello, Michael Smart, Carlos Wang, Krzysztof Czarnecki, and Steven Waslander. Canadian adverse driving conditions dataset. The International Journal of Robotics Research, 40(4-5):681–690, Dec 2020

  42. [42]

    Offboard 3d object detection from point cloud sequences

    Charles R. Qi, Yin Zhou, Mahyar Najibi, Pei Sun, Khoa Vo, Boyang Deng, and Dragomir Anguelov. Offboard 3d object detection from point cloud sequences. In CVPR, June 2021

  43. [43]

    Argoverse motion forecasting competition

    Jagjeet Singh, William Qi, Tanmay Agarwal, and Andrew Hartnett. Argoverse motion forecasting competition. https://eval.ai/web/challenges/challenge-page/454/overview. Accessed: 08-27-2021

  44. [44]

    Qml for argoverse 2 motion forecasting challenge, 2022

    Tong Su, Xishun Wang, and Xiaodong Yang. Qml for argoverse 2 motion forecasting challenge, 2022

  45. [45]

    Scalability in Perception for Autonomous Driving: Waymo Open Dataset

    Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. Scalability in Perception...

  46. [46]

    Rsn: Range sparse net for efficient, accurate lidar 3d object detection

    Pei Sun, Weiyue Wang, Yuning Chai, Gamaleldin Elsayed, Alex Bewley, Xiao Zhang, Cristian Sminchisescu, and Dragomir Anguelov. Rsn: Range sparse net for efficient, accurate lidar 3d object detection. In CVPR, June 2021

  47. [47]

    Ganet: Goal area network for motion forecasting, 2022

    Mingkun Wang, Xinge Zhu, Changqian Yu, Wei Li, Yuexin Ma, Ruochun Jin, Xiaoguang Ren, Dongchun Ren, Mingxu Wang, and Wenjing Yang. Ganet: Goal area network for motion forecasting, 2022

  48. [48]

    4d forecasting: Sequential forecasting of 100,000 points

    Xinshuo Weng, Jianren Wang, Sergey Levine, Kris Kitani, and Nicholas Rhinehart. 4d forecasting: Sequential forecasting of 100,000 points. In Proceedings of ECCV ’20 Workshops, August 2020

  49. [49]

    Inverting the forecasting pipeline with spf2: Sequential pointcloud forecasting for sequential pose forecasting

    Xinshuo Weng, Jianren Wang, Sergey Levine, Kris Kitani, and Nick Rhinehart. Inverting the forecasting pipeline with spf2: Sequential pointcloud forecasting for sequential pose forecasting. In Proceedings of (CoRL) Conference on Robot Learning, November 2020

  50. [50]

    Surfelgan: Synthesizing realistic sensor data for autonomous driving

    Zhenpei Yang, Yuning Chai, Dragomir Anguelov, Yin Zhou, Pei Sun, Dumitru Erhan, Sean Rafferty, and Henrik Kretzschmar. Surfelgan: Synthesizing realistic sensor data for autonomous driving. In CVPR, June 2020

  51. [51]

    Center-based 3d object detection and tracking

    Tianwei Yin, Xingyi Zhou, and Philipp Krahenbuhl. Center-based 3d object detection and tracking. In CVPR, June 2021

  52. [52]

    Interaction dataset: An international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps

    Wei Zhan, Liting Sun, Di Wang, Haojie Shi, Aubrey Clausse, Maximilian Naumann, Julius Kummerle, Hendrik Konigshof, Christoph Stiller, Arnaud de La Fortelle, et al. Interaction dataset: An international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps. arXiv preprint arXiv:1910.03088, 2019

  53. [53]

    Banet: Motion forecasting with boundary aware network, 2022

    Chen Zhang, Honglin Sun, Chen Chen, and Yandong Guo. Banet: Motion forecasting with boundary aware network, 2022

  54. [54]

    Lane graph estimation for scene understanding in urban driving

    Jannik Zürn, Johan Vertens, and Wolfram Burgard. Lane graph estimation for scene understanding in urban driving. CoRR, abs/2105.00195, 2021. 6 Appendix 6.1 Additional Information About Sensor Suite In Figure 8, we provide a diagram of the sensor suite used to capture the Argoverse 2 datasets. Figure 9 shows the speed distribution for annotated pedestria...